LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net/wan/fsl_ucc_hdlc: Add MODULE_DESCRIPTION
From: David Miller @ 2020-08-31 19:42 UTC (permalink / raw)
  To: yuehaibing; +Cc: kuba, netdev, linuxppc-dev, linux-kernel, qiang.zhao
In-Reply-To: <20200829115823.10140-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Sat, 29 Aug 2020 19:58:23 +0800

> Add missing MODULE_DESCRIPTION.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* [Bug 209029] kernel 5.9-rc2 fails to boot on a PowerMac G5 11,2 - BUG: Kernel NULL pointer dereference on read at 0x00000020
From: bugzilla-daemon @ 2020-08-31 20:09 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-209029-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=209029

--- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
No change with 5.9-rc3.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [RESEND][PATCH 2/7] alpha: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So "+ 1" would
potentially overflow.

Also, by following other places in the kernel, boundary_size
should align with the PAGE_SIZE before right shifting by the
PAGE_SHIFT. However, passing it to ALIGN() would potentially
overflow too.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/alpha/kernel/pci_iommu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 81037907268d..1ef2c647bd3e 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -141,12 +141,10 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
 	unsigned long boundary_size;
 
 	base = arena->dma_base >> PAGE_SHIFT;
-	if (dev) {
-		boundary_size = dma_get_seg_boundary(dev) + 1;
-		boundary_size >>= PAGE_SHIFT;
-	} else {
-		boundary_size = 1UL << (32 - PAGE_SHIFT);
-	}
+
+	boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (boundary_size >> PAGE_SHIFT) + 1;
 
 	/* Search forward for the first mask-aligned sequence of N free ptes */
 	ptes = arena->ptes;
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 1/7] powerpc/iommu: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/kernel/iommu.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9704f3f76e63..c01ccbf8afdd 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -236,15 +236,14 @@ static unsigned long iommu_range_alloc(struct device *dev,
 		}
 	}
 
-	if (dev)
-		boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-				      1 << tbl->it_page_shift);
-	else
-		boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
 	/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
+	boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
+
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
 
 	n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
-			     boundary_size >> tbl->it_page_shift, align_mask);
+			     boundary_size, align_mask);
 	if (n == -1) {
 		if (likely(pass == 0)) {
 			/* First try the pool from the start */
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 3/7] ia64/sba_iommu: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/hp/common/sba_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 656a4888c300..945954903bb0 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -485,8 +485,8 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
 	ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) == 0);
 	ASSERT(res_ptr < res_end);
 
-	boundary_size = (unsigned long long)dma_get_seg_boundary(dev) + 1;
-	boundary_size = ALIGN(boundary_size, 1ULL << iovp_shift) >> iovp_shift;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (dma_get_seg_boundary(dev) >> iovp_shift) + 1;
 
 	BUG_ON(ioc->ibase & ~iovp_mask);
 	shift = ioc->ibase >> iovp_shift;
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 0/7] Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch

==== For this resend ====
The original series have not been acked at any patch. So I am
resending them, being suggested by Niklas.

==== Coverletter ====
We are expending the default DMA segmentation boundary to its
possible maximum value (ULONG_MAX) to indicate that a device
doesn't specify a boundary limit. So all dma_get_seg_boundary
callers should take a precaution with the return values since
it would easily get overflowed.

I scanned the entire kernel tree for all the existing callers
and found that most of callers may get overflowed in two ways:
either "+ 1" or passing it to ALIGN() that does "+ mask".

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So this series of patches fix the potential overflow with this
overflow-free shortcut.

As I don't have these platforms, testings/comments are welcome.

Thanks
Nic

Nicolin Chen (7):
  powerpc/iommu: Avoid overflow at boundary_size
  alpha: Avoid overflow at boundary_size
  ia64/sba_iommu: Avoid overflow at boundary_size
  s390/pci_dma: Avoid overflow at boundary_size
  sparc: Avoid overflow at boundary_size
  x86/amd_gart: Avoid overflow at boundary_size
  parisc: Avoid overflow at boundary_size

 arch/alpha/kernel/pci_iommu.c    | 10 ++++------
 arch/ia64/hp/common/sba_iommu.c  |  4 ++--
 arch/powerpc/kernel/iommu.c      | 11 +++++------
 arch/s390/pci/pci_dma.c          |  4 ++--
 arch/sparc/kernel/iommu-common.c |  9 +++------
 arch/sparc/kernel/iommu.c        |  4 ++--
 arch/sparc/kernel/pci_sun4v.c    |  4 ++--
 arch/x86/kernel/amd_gart_64.c    |  4 ++--
 drivers/parisc/ccio-dma.c        |  4 ++--
 drivers/parisc/sba_iommu.c       |  4 ++--
 10 files changed, 26 insertions(+), 32 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [RESEND][PATCH 4/7] s390/pci_dma: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/s390/pci/pci_dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 64b1399a73f0..ecb067acc6d5 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -263,8 +263,8 @@ static unsigned long __dma_alloc_iommu(struct device *dev,
 	struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
 	unsigned long boundary_size;
 
-	boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-			      PAGE_SIZE) >> PAGE_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;
 	return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
 				start, size, zdev->start_dma >> PAGE_SHIFT,
 				boundary_size, 0);
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 5/7] sparc: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/sparc/kernel/iommu-common.c | 9 +++------
 arch/sparc/kernel/iommu.c        | 4 ++--
 arch/sparc/kernel/pci_sun4v.c    | 4 ++--
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/kernel/iommu-common.c b/arch/sparc/kernel/iommu-common.c
index 59cb16691322..843e71894d04 100644
--- a/arch/sparc/kernel/iommu-common.c
+++ b/arch/sparc/kernel/iommu-common.c
@@ -166,13 +166,10 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
 		}
 	}
 
-	if (dev)
-		boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-				      1 << iommu->table_shift);
-	else
-		boundary_size = ALIGN(1ULL << 32, 1 << iommu->table_shift);
+	boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
 
-	boundary_size = boundary_size >> iommu->table_shift;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (boundary_size >> iommu->table_shift) + 1;
 	/*
 	 * if the skip_span_boundary_check had been set during init, we set
 	 * things up so that iommu_is_span_boundary() merely checks if the
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 4ae7388b1bff..d981c37305ae 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -472,8 +472,8 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist,
 	outs->dma_length = 0;
 
 	max_seg_size = dma_get_max_seg_size(dev);
-	seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-				  IO_PAGE_SIZE) >> IO_PAGE_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	seg_boundary_size = (dma_get_seg_boundary(dev) >> IO_PAGE_SHIFT) + 1;
 	base_shift = iommu->tbl.table_map_base >> IO_PAGE_SHIFT;
 	for_each_sg(sglist, s, nelems, i) {
 		unsigned long paddr, npages, entry, out_entry = 0, slen;
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 14b93c5564e3..233fe8a20cec 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -508,8 +508,8 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist,
 	iommu_batch_start(dev, prot, ~0UL);
 
 	max_seg_size = dma_get_max_seg_size(dev);
-	seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-				  IO_PAGE_SIZE) >> IO_PAGE_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	seg_boundary_size = (dma_get_seg_boundary(dev) >> IO_PAGE_SHIFT) + 1;
 
 	mask = *dev->dma_mask;
 	if (!iommu_use_atu(iommu, mask))
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 6/7] x86/amd_gart: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 arch/x86/kernel/amd_gart_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index e89031e9c847..7fa0bb490065 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -96,8 +96,8 @@ static unsigned long alloc_iommu(struct device *dev, int size,
 
 	base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev),
 			   PAGE_SIZE) >> PAGE_SHIFT;
-	boundary_size = ALIGN((u64)dma_get_seg_boundary(dev) + 1,
-			      PAGE_SIZE) >> PAGE_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;
 
 	spin_lock_irqsave(&iommu_bitmap_lock, flags);
 	offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, next_bit,
-- 
2.17.1


^ permalink raw reply related

* [RESEND][PATCH 7/7] parisc: Avoid overflow at boundary_size
From: Nicolin Chen @ 2020-08-31 20:38 UTC (permalink / raw)
  To: mpe, benh, paulus, rth, ink, mattst88, tony.luck, fenghua.yu,
	schnelle, gerald.schaefer, hca, gor, borntraeger, davem, tglx,
	mingo, bp, x86, hpa, James.Bottomley, deller
  Cc: sfr, linux-ia64, linux-parisc, linux-s390, linux-kernel,
	linux-alpha, sparclinux, linuxppc-dev, hch
In-Reply-To: <20200831203811.8494-1-nicoleotsuka@gmail.com>

The boundary_size might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1"
or passing it to ALIGN() would potentially overflow.

According to kernel defines:
    #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
    #define ALIGN(x, a)	ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

So fixing a potential overflow with the safer shortcut.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/parisc/ccio-dma.c  | 4 ++--
 drivers/parisc/sba_iommu.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index a5507f75b524..c667d6aba764 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -356,8 +356,8 @@ ccio_alloc_range(struct ioc *ioc, struct device *dev, size_t size)
 	** ggg sacrifices another 710 to the computer gods.
 	*/
 
-	boundary_size = ALIGN((unsigned long long)dma_get_seg_boundary(dev) + 1,
-			      1ULL << IOVP_SHIFT) >> IOVP_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (dma_get_seg_boundary(dev) >> IOVP_SHIFT) + 1;
 
 	if (pages_needed <= 8) {
 		/*
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index d4314fba0269..96bc2c617cbd 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -342,8 +342,8 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
 	unsigned long shift;
 	int ret;
 
-	boundary_size = ALIGN((unsigned long long)dma_get_seg_boundary(dev) + 1,
-			      1ULL << IOVP_SHIFT) >> IOVP_SHIFT;
+	/* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
+	boundary_size = (dma_get_seg_boundary(dev) >> IOVP_SHIFT) + 1;
 
 #if defined(ZX1_SUPPORT)
 	BUG_ON(ioc->ibase & ~IOVP_MASK);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] scsi: ibmvfc: interface updates for future FPIN and MQ support
From: kernel test robot @ 2020-08-31 23:57 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: Tyrel Datwyler, kbuild-all, martin.petersen, linux-scsi,
	linux-kernel, brking, linuxppc-dev
In-Reply-To: <20200831171844.635729-1-tyreld@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 15835 bytes --]

Hi Tyrel,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on mkp-scsi/for-next scsi/for-next v5.9-rc3 next-20200828]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tyrel-Datwyler/scsi-ibmvfc-interface-updates-for-future-FPIN-and-MQ-support/20200901-012005
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/scsi/ibmvscsi/ibmvfc.c:32:
>> drivers/scsi/ibmvscsi/ibmvfc.h:500:13: error: expected ':', ',', ';', '}' or '__attribute__' before 'nvmeof_vas_channels'
     500 |  __be32 num nvmeof_vas_channels;
         |             ^~~~~~~~~~~~~~~~~~~
>> drivers/scsi/ibmvscsi/ibmvfc.h:506:17: error: expected declaration specifiers or '...' before '(' token
     506 | }__attrribute__((packed, aligned (8)));
         |                 ^
>> drivers/scsi/ibmvscsi/ibmvfc.h:526:28: error: field 'common' has incomplete type
     526 |  struct ibmvfc_madd_common common;
         |                            ^~~~~~
>> drivers/scsi/ibmvscsi/ibmvfc.h:627:2: error: expected ':', ',', ';', '}' or '__attribute__' before 'u8'
     627 |  u8 pad;
         |  ^~
   In file included from include/linux/byteorder/big_endian.h:5,
                    from arch/powerpc/include/uapi/asm/byteorder.h:14,
                    from include/asm-generic/bitops/le.h:6,
                    from arch/powerpc/include/asm/bitops.h:246,
                    from include/linux/bitops.h:29,
                    from include/linux/kernel.h:12,
                    from include/linux/list.h:9,
                    from include/linux/module.h:12,
                    from drivers/scsi/ibmvscsi/ibmvfc.c:10:
   drivers/scsi/ibmvscsi/ibmvfc.c: In function 'ibmvfc_handle_async':
>> drivers/scsi/ibmvscsi/ibmvfc.c:2665:75: error: 'struct ibmvfc_async_crq' has no member named 'event'
    2665 |  const struct ibmvfc_async_desc *desc = ibmvfc_get_ae_desc(be64_to_cpu(crq->event));
         |                                                                           ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.c:2665:60: note: in expansion of macro 'be64_to_cpu'
    2665 |  const struct ibmvfc_async_desc *desc = ibmvfc_get_ae_desc(be64_to_cpu(crq->event));
         |                                                            ^~~~~~~~~~~
>> drivers/scsi/ibmvscsi/ibmvfc.c:2669:57: error: 'struct ibmvfc_async_crq' has no member named 'scsi_id'
    2669 |      " node_name: %llx%s\n", desc->desc, be64_to_cpu(crq->scsi_id),
         |                                                         ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.h:818:4: note: in expansion of macro 'dev_err'
     818 |    dev_err((vhost)->dev, ##__VA_ARGS__); \
         |    ^~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2668:2: note: in expansion of macro 'ibmvfc_log'
    2668 |  ibmvfc_log(vhost, desc->log_level, "%s event received. scsi_id: %llx, wwpn: %llx,"
         |  ^~~~~~~~~~
>> drivers/scsi/ibmvscsi/ibmvfc.c:2670:21: error: 'struct ibmvfc_async_crq' has no member named 'wwpn'
    2670 |      be64_to_cpu(crq->wwpn), be64_to_cpu(crq->node_name),
         |                     ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.h:818:4: note: in expansion of macro 'dev_err'
     818 |    dev_err((vhost)->dev, ##__VA_ARGS__); \
         |    ^~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2668:2: note: in expansion of macro 'ibmvfc_log'
    2668 |  ibmvfc_log(vhost, desc->log_level, "%s event received. scsi_id: %llx, wwpn: %llx,"
         |  ^~~~~~~~~~
>> drivers/scsi/ibmvscsi/ibmvfc.c:2670:45: error: 'struct ibmvfc_async_crq' has no member named 'node_name'
    2670 |      be64_to_cpu(crq->wwpn), be64_to_cpu(crq->node_name),
         |                                             ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.h:818:4: note: in expansion of macro 'dev_err'
     818 |    dev_err((vhost)->dev, ##__VA_ARGS__); \
         |    ^~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2668:2: note: in expansion of macro 'ibmvfc_log'
    2668 |  ibmvfc_log(vhost, desc->log_level, "%s event received. scsi_id: %llx, wwpn: %llx,"
         |  ^~~~~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2673:25: error: 'struct ibmvfc_async_crq' has no member named 'event'
    2673 |  switch (be64_to_cpu(crq->event)) {
         |                         ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.c:2673:10: note: in expansion of macro 'be64_to_cpu'
    2673 |  switch (be64_to_cpu(crq->event)) {
         |          ^~~~~~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2714:12: error: 'struct ibmvfc_async_crq' has no member named 'scsi_id'
    2714 |    if (!crq->scsi_id && !crq->wwpn && !crq->node_name)
         |            ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2714:29: error: 'struct ibmvfc_async_crq' has no member named 'wwpn'
    2714 |    if (!crq->scsi_id && !crq->wwpn && !crq->node_name)
         |                             ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2714:43: error: 'struct ibmvfc_async_crq' has no member named 'node_name'
    2714 |    if (!crq->scsi_id && !crq->wwpn && !crq->node_name)
         |                                           ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2716:11: error: 'struct ibmvfc_async_crq' has no member named 'scsi_id'
    2716 |    if (crq->scsi_id && cpu_to_be64(tgt->scsi_id) != crq->scsi_id)
         |           ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2716:56: error: 'struct ibmvfc_async_crq' has no member named 'scsi_id'
    2716 |    if (crq->scsi_id && cpu_to_be64(tgt->scsi_id) != crq->scsi_id)
         |                                                        ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2718:11: error: 'struct ibmvfc_async_crq' has no member named 'wwpn'
    2718 |    if (crq->wwpn && cpu_to_be64(tgt->ids.port_name) != crq->wwpn)
         |           ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2718:59: error: 'struct ibmvfc_async_crq' has no member named 'wwpn'
    2718 |    if (crq->wwpn && cpu_to_be64(tgt->ids.port_name) != crq->wwpn)
         |                                                           ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2720:11: error: 'struct ibmvfc_async_crq' has no member named 'node_name'
    2720 |    if (crq->node_name && cpu_to_be64(tgt->ids.node_name) != crq->node_name)
         |           ^~
   drivers/scsi/ibmvscsi/ibmvfc.c:2720:64: error: 'struct ibmvfc_async_crq' has no member named 'node_name'
    2720 |    if (crq->node_name && cpu_to_be64(tgt->ids.node_name) != crq->node_name)
         |                                                                ^~
   In file included from include/linux/byteorder/big_endian.h:5,
                    from arch/powerpc/include/uapi/asm/byteorder.h:14,
                    from include/asm-generic/bitops/le.h:6,
                    from arch/powerpc/include/asm/bitops.h:246,
                    from include/linux/bitops.h:29,
                    from include/linux/kernel.h:12,
                    from include/linux/list.h:9,
                    from include/linux/module.h:12,
                    from drivers/scsi/ibmvscsi/ibmvfc.c:10:
   drivers/scsi/ibmvscsi/ibmvfc.c:2722:42: error: 'struct ibmvfc_async_crq' has no member named 'event'
    2722 |    if (tgt->need_login && be64_to_cpu(crq->event) == IBMVFC_AE_ELS_LOGO)
         |                                          ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.c:2722:27: note: in expansion of macro 'be64_to_cpu'
    2722 |    if (tgt->need_login && be64_to_cpu(crq->event) == IBMVFC_AE_ELS_LOGO)
         |                           ^~~~~~~~~~~
   drivers/scsi/ibmvscsi/ibmvfc.c:2724:43: error: 'struct ibmvfc_async_crq' has no member named 'event'
    2724 |    if (!tgt->need_login || be64_to_cpu(crq->event) == IBMVFC_AE_ELS_PLOGI) {
         |                                           ^~
   include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
      38 | #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
         |                                                   ^
   drivers/scsi/ibmvscsi/ibmvfc.c:2724:28: note: in expansion of macro 'be64_to_cpu'
    2724 |    if (!tgt->need_login || be64_to_cpu(crq->event) == IBMVFC_AE_ELS_PLOGI) {
         |                            ^~~~~~~~~~~
   In file included from include/linux/device.h:15,
                    from include/linux/dma-mapping.h:7,
                    from drivers/scsi/ibmvscsi/ibmvfc.c:12:
   drivers/scsi/ibmvscsi/ibmvfc.c:2741:66: error: 'struct ibmvfc_async_crq' has no member named 'event'
    2741 |   dev_err(vhost->dev, "Unknown async event received: %lld\n", crq->event);
         |                                                                  ^~
   include/linux/dev_printk.h:104:32: note: in definition of macro 'dev_err'
     104 |  _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
         |                                ^~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/b2aced49faf50075e9a74e7a253d1ad77cce1c0c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Tyrel-Datwyler/scsi-ibmvfc-interface-updates-for-future-FPIN-and-MQ-support/20200901-012005
git checkout b2aced49faf50075e9a74e7a253d1ad77cce1c0c
vim +500 drivers/scsi/ibmvscsi/ibmvfc.h

   490	
   491	struct ibmvfc_channel_enquiry {
   492		struct ibmvfc_mad_common common;
   493		__be32 flags;
   494	#define IBMVFC_NO_CHANNELS_TO_CRQ_SUPPORT	0x01
   495	#define IBMVFC_SUPPORT_VARIABLE_SUBQ_MSG	0x02
   496	#define IBMVFC_NO_N_TO_M_CHANNELS_SUPPORT	0x04
   497		__be32 num_scsi_subq_channels;
   498		__be32 num_nvmeof_subq_channels;
   499		__be32 num_scsi_vas_channels;
 > 500		__be32 num nvmeof_vas_channels;
   501	}__attribute__((packed, aligned (8)));
   502	
   503	struct ibmvfc_channel_setup_mad {
   504		struct ibmvfc_mad_common common;
   505		struct srp_direct_buf buffer;
 > 506	}__attrribute__((packed, aligned (8)));
   507	
   508	#define IBMVFC_MAX_CHANNELS	502
   509	
   510	struct ibmvfc_channel_setup {
   511		__be32 flags;
   512	#define IBMVFC_CANCEL_CHANNELS		0x01
   513	#define IBMVFC_USE_BUFFER		0x02
   514	#define IBMVFC_CHANNELS_CANCELED	0x04
   515		__be32 reserved;
   516		__be32 num_scsi_subq_channels;
   517		__be32 num_nvmeof_subq_channels;
   518		__be32 num_scsi_vas_channels;
   519		__be32 num_nvmeof_vas_channels;
   520		struct srp_direct_buf buffer;
   521		__be64 reserved2[5];
   522		__be64 channel_handles[IBMVFC_MAX_CHANNELS];
   523	}__attribute__((packed, aligned (8)));
   524	
   525	struct ibmvfc_connection_info {
 > 526		struct ibmvfc_madd_common common;
   527		__be64 information_bits;
   528	#define IBMVFC_NO_FC_IO_CHANNEL		0x01
   529	#define IBMVFC_NO_PHYP_VAS		0x02
   530	#define IBMVFC_NO_PHYP_SUBQ		0x04
   531	#define IBMVFC_PHYP_DEPRECATED_SUBQ	0x08
   532	#define IBMVFC_PHYP_PRESERVED_SUBQ	0x10
   533	#define IBMVFC_PHYP_FULL_SUBQ		0x20
   534		__be64 reserved[16];
   535	}__attribute__((packed, aligned (8)));
   536	
   537	struct ibmvfc_trace_start_entry {
   538		u32 xfer_len;
   539	}__attribute__((packed));
   540	
   541	struct ibmvfc_trace_end_entry {
   542		u16 status;
   543		u16 error;
   544		u8 fcp_rsp_flags;
   545		u8 rsp_code;
   546		u8 scsi_status;
   547		u8 reserved;
   548	}__attribute__((packed));
   549	
   550	struct ibmvfc_trace_entry {
   551		struct ibmvfc_event *evt;
   552		u32 time;
   553		u32 scsi_id;
   554		u32 lun;
   555		u8 fmt;
   556		u8 op_code;
   557		u8 tmf_flags;
   558		u8 type;
   559	#define IBMVFC_TRC_START	0x00
   560	#define IBMVFC_TRC_END		0xff
   561		union {
   562			struct ibmvfc_trace_start_entry start;
   563			struct ibmvfc_trace_end_entry end;
   564		} u;
   565	}__attribute__((packed, aligned (8)));
   566	
   567	enum ibmvfc_crq_formats {
   568		IBMVFC_CMD_FORMAT		= 0x01,
   569		IBMVFC_ASYNC_EVENT	= 0x02,
   570		IBMVFC_MAD_FORMAT		= 0x04,
   571	};
   572	
   573	enum ibmvfc_async_event {
   574		IBMVFC_AE_ELS_PLOGI		= 0x0001,
   575		IBMVFC_AE_ELS_LOGO		= 0x0002,
   576		IBMVFC_AE_ELS_PRLO		= 0x0004,
   577		IBMVFC_AE_SCN_NPORT		= 0x0008,
   578		IBMVFC_AE_SCN_GROUP		= 0x0010,
   579		IBMVFC_AE_SCN_DOMAIN		= 0x0020,
   580		IBMVFC_AE_SCN_FABRIC		= 0x0040,
   581		IBMVFC_AE_LINK_UP			= 0x0080,
   582		IBMVFC_AE_LINK_DOWN		= 0x0100,
   583		IBMVFC_AE_LINK_DEAD		= 0x0200,
   584		IBMVFC_AE_HALT			= 0x0400,
   585		IBMVFC_AE_RESUME			= 0x0800,
   586		IBMVFC_AE_ADAPTER_FAILED	= 0x1000,
   587		IBMVFC_AE_FPIN			= 0x2000,
   588	};
   589	
   590	struct ibmvfc_async_desc {
   591		const char *desc;
   592		enum ibmvfc_async_event ae;
   593		int log_level;
   594	};
   595	
   596	struct ibmvfc_crq {
   597		volatile u8 valid;
   598		volatile u8 format;
   599		u8 reserved[6];
   600		volatile __be64 ioba;
   601	}__attribute__((packed, aligned (8)));
   602	
   603	struct ibmvfc_crq_queue {
   604		struct ibmvfc_crq *msgs;
   605		int size, cur;
   606		dma_addr_t msg_token;
   607	};
   608	
   609	enum ibmvfc_ae_link_state {
   610		IBMVFC_AE_LS_LINK_UP		= 0x01,
   611		IBMVFC_AE_LS_LINK_BOUNCED	= 0x02,
   612		IBMVFC_AE_LS_LINK_DOWN		= 0x04,
   613		IBMVFC_AE_LS_LINK_DEAD		= 0x08,
   614	};
   615	
   616	enum ibmvfc_ae_fpin_status {
   617		IBMVFC_AE_FPIN_LINK_CONGESTED	= 0x1,
   618		IBMVFC_AE_FPIN_PORT_CONGESTED	= 0x2,
   619		IBMVFC_AE_FPIN_PORT_CLEARED	= 0x3,
   620		IBMVFC_AE_FPIN_PORT_DEGRADED	= 0x4,
   621	};
   622	
   623	struct ibmvfc_async_crq {
   624		volatile u8 valid;
   625		u8 link_state;
   626		u8 fpin_status
 > 627		u8 pad;
   628		__be32 pad2;
   629		volatile __be64 event;
   630		volatile __be64 scsi_id;
   631		volatile __be64 wwpn;
   632		volatile __be64 node_name;
   633		__be64 reserved;
   634	}__attribute__((packed, aligned (8)));
   635	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26306 bytes --]

^ permalink raw reply

* [PATCH v2] scsi: ibmvfc: interface updates for future FPIN and MQ support
From: Tyrel Datwyler @ 2020-09-01  0:24 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
	linuxppc-dev

VIOS partitions with SLI-4 enabled Emulex adapters will be capable of
driving IO in parallel through mulitple work queues or channels, and
with new hyperviosr firmware that supports multiple interrupt sources
an ibmvfc NPIV single initiator can be modified to exploit end to end
channelization in a PowerVM environment.

VIOS hosts will also be able to expose fabric perfromance impact
notifications (FPIN) via a new asynchronous event to ibmvfc clients that
advertise support via IBMVFC_CAN_HANDLE_FPIN in their capabilities flag
during NPIV_LOGIN.

This patch introduces three new Managment Datagrams (MADs) for
channelization support negotiation as well as the FPIN asynchronous
event and FPIN status flags. Follow up work is required to plumb the
ibmvfc client driver to use these new interfaces.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
v1 -> v2:
	Fixup complier errors from neglected commit --amend

---
 drivers/scsi/ibmvscsi/ibmvfc.h | 66 +++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 907889f1fa9d..fe55cfc79421 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -124,6 +124,9 @@ enum ibmvfc_mad_types {
 	IBMVFC_PASSTHRU		= 0x0200,
 	IBMVFC_TMF_MAD		= 0x0100,
 	IBMVFC_NPIV_LOGOUT	= 0x0800,
+	IBMVFC_CHANNEL_ENQUIRY	= 0x1000,
+	IBMVFC_CHANNEL_SETUP	= 0x2000,
+	IBMVFC_CONNECTION_INFO	= 0x4000,
 };
 
 struct ibmvfc_mad_common {
@@ -162,6 +165,8 @@ struct ibmvfc_npiv_login {
 	__be32 max_cmds;
 	__be64 capabilities;
 #define IBMVFC_CAN_MIGRATE		0x01
+#define IBMVFC_CAN_USE_CHANNELS		0x02
+#define IBMVFC_CAN_HANDLE_FPIN		0x04
 	__be64 node_name;
 	struct srp_direct_buf async;
 	u8 partition_name[IBMVFC_MAX_NAME];
@@ -204,6 +209,7 @@ struct ibmvfc_npiv_login_resp {
 	__be64 capabilities;
 #define IBMVFC_CAN_FLUSH_ON_HALT	0x08
 #define IBMVFC_CAN_SUPPRESS_ABTS	0x10
+#define IBMVFC_CAN_SUPPORT_CHANNELS	0x20
 	__be32 max_cmds;
 	__be32 scsi_id_sz;
 	__be64 max_dma_len;
@@ -482,6 +488,52 @@ struct ibmvfc_passthru_mad {
 	struct ibmvfc_passthru_fc_iu fc_iu;
 }__attribute__((packed, aligned (8)));
 
+struct ibmvfc_channel_enquiry {
+	struct ibmvfc_mad_common common;
+	__be32 flags;
+#define IBMVFC_NO_CHANNELS_TO_CRQ_SUPPORT	0x01
+#define IBMVFC_SUPPORT_VARIABLE_SUBQ_MSG	0x02
+#define IBMVFC_NO_N_TO_M_CHANNELS_SUPPORT	0x04
+	__be32 num_scsi_subq_channels;
+	__be32 num_nvmeof_subq_channels;
+	__be32 num_scsi_vas_channels;
+	__be32 num_nvmeof_vas_channels;
+}__attribute__((packed, aligned (8)));
+
+struct ibmvfc_channel_setup_mad {
+	struct ibmvfc_mad_common common;
+	struct srp_direct_buf buffer;
+}__attribute__((packed, aligned (8)));
+
+#define IBMVFC_MAX_CHANNELS	502
+
+struct ibmvfc_channel_setup {
+	__be32 flags;
+#define IBMVFC_CANCEL_CHANNELS		0x01
+#define IBMVFC_USE_BUFFER		0x02
+#define IBMVFC_CHANNELS_CANCELED	0x04
+	__be32 reserved;
+	__be32 num_scsi_subq_channels;
+	__be32 num_nvmeof_subq_channels;
+	__be32 num_scsi_vas_channels;
+	__be32 num_nvmeof_vas_channels;
+	struct srp_direct_buf buffer;
+	__be64 reserved2[5];
+	__be64 channel_handles[IBMVFC_MAX_CHANNELS];
+}__attribute__((packed, aligned (8)));
+
+struct ibmvfc_connection_info {
+	struct ibmvfc_mad_common common;
+	__be64 information_bits;
+#define IBMVFC_NO_FC_IO_CHANNEL		0x01
+#define IBMVFC_NO_PHYP_VAS		0x02
+#define IBMVFC_NO_PHYP_SUBQ		0x04
+#define IBMVFC_PHYP_DEPRECATED_SUBQ	0x08
+#define IBMVFC_PHYP_PRESERVED_SUBQ	0x10
+#define IBMVFC_PHYP_FULL_SUBQ		0x20
+	__be64 reserved[16];
+}__attribute__((packed, aligned (8)));
+
 struct ibmvfc_trace_start_entry {
 	u32 xfer_len;
 }__attribute__((packed));
@@ -532,6 +584,7 @@ enum ibmvfc_async_event {
 	IBMVFC_AE_HALT			= 0x0400,
 	IBMVFC_AE_RESUME			= 0x0800,
 	IBMVFC_AE_ADAPTER_FAILED	= 0x1000,
+	IBMVFC_AE_FPIN			= 0x2000,
 };
 
 struct ibmvfc_async_desc {
@@ -560,10 +613,18 @@ enum ibmvfc_ae_link_state {
 	IBMVFC_AE_LS_LINK_DEAD		= 0x08,
 };
 
+enum ibmvfc_ae_fpin_status {
+	IBMVFC_AE_FPIN_LINK_CONGESTED	= 0x1,
+	IBMVFC_AE_FPIN_PORT_CONGESTED	= 0x2,
+	IBMVFC_AE_FPIN_PORT_CLEARED	= 0x3,
+	IBMVFC_AE_FPIN_PORT_DEGRADED	= 0x4,
+};
+
 struct ibmvfc_async_crq {
 	volatile u8 valid;
 	u8 link_state;
-	u8 pad[2];
+	u8 fpin_status;
+	u8 pad;
 	__be32 pad2;
 	volatile __be64 event;
 	volatile __be64 scsi_id;
@@ -590,6 +651,9 @@ union ibmvfc_iu {
 	struct ibmvfc_tmf tmf;
 	struct ibmvfc_cmd cmd;
 	struct ibmvfc_passthru_mad passthru;
+	struct ibmvfc_channel_enquiry channel_enquiry;
+	struct ibmvfc_channel_setup_mad channel_setup;
+	struct ibmvfc_connection_info connection_info;
 }__attribute__((packed, aligned (8)));
 
 enum ibmvfc_target_action {
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h
From: Herbert Xu @ 2020-09-01  0:33 UTC (permalink / raw)
  To: Li Yang, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200730125259.GA8948@gondor.apana.org.au>

On Thu, Jul 30, 2020 at 10:52:59PM +1000, Herbert Xu wrote:
> There are two __packed attributes in qman.h that are both unnecessary
> and causing compiler warnings because they're conflicting with
> explicit alignment requirements set on members within the structure.
> 
> This patch removes them both.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Ping.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: fsl_espi errors on v5.7.15
From: Chris Packham @ 2020-09-01  1:25 UTC (permalink / raw)
  To: Heiner Kallweit, Nicholas Piggin, benh@kernel.crashing.org,
	broonie@kernel.org, mpe@ellerman.id.au, paulus@samba.org
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org
In-Reply-To: <5bf05188-a52e-c8c0-9d2d-c87eff6f1588@gmail.com>


On 1/09/20 12:33 am, Heiner Kallweit wrote:
> On 30.08.2020 23:59, Chris Packham wrote:
>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:00, Chris Packham wrote:
>>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>>>> <snip>
>>>>
>>>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>>>>>>>>
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>>
>>>>>>>>>> With my current workaround of emptying the RX FIFO. It seems
>>>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>>>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>>>>>>>>
>>>>>>>>>> fsl_espi ffe110000.spi: tx 70
>>>>>>>>>> fsl_espi ffe110000.spi: rx 03
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>>>>
>>>>>>>>>>     From all the Micron SPI-NOR datasheets I've got access to it is
>>>>>>>>>> possible to continually read the SR/FSR. But I've no idea why it
>>>>>>>>>> happens some times and not others.
>>>>>>>>> So I think I've got a reproduction and I think I've bisected the problem
>>>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>>>>>>>>> C"). My day is just finishing now so I haven't applied too much scrutiny
>>>>>>>>> to this result. Given the various rabbit holes I've been down on this
>>>>>>>>> issue already I'd take this information with a good degree of skepticism.
>>>>>>>>>
>>>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>>>>>>> It doesn't have yet the change you're referring to, and the fsl-espi driver
>>>>>>>> is basically the same as in 5.7 (just two small changes in 5.7).
>>>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>>>>>> around this time that could affect book E, so it's good if that exact
>>>>>>> patch is confirmed.
>>>>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>>>>> 5.9-rc2 by reverting that one commit.
>>>>>>
>>>>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>>>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>>>>> the issue) and the commit just before (which does not).
>>>>> Thanks for testing, that confirms it well.
>>>>>
>>>>> [snip patch]
>>>>>
>>>>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>>>>> didn't report anything (either with or without the change above).
>>>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>>>> else has changed.
>>>>>
>>>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>>>> higher interrupt latency?
>>>>>
>>>>> I don't think the patch should cause significantly worse latency,
>>>>> (it's supposed to be a bit better if anything because it doesn't set
>>>>> up the full interrupt frame). But it's possible.
>>>> My working theory is that the SPI_DON indication is all about the TX
>>>> direction an now that the interrupts are faster we're hitting an error
>>>> because there is still RX activity going on. Heiner disagrees with my
>>>> interpretation of the SPI_DON indication and the fact that it doesn't
>>>> happen every time does throw doubt on it.
>>>>
>>> It's right that the eSPI spec can be interpreted that SPI_DON refers to
>>> TX only. However this wouldn't really make sense, because also for RX
>>> we program the frame length, and therefore want to be notified once the
>>> full frame was received. Also practical experience shows that SPI_DON
>>> is set also after RX-only transfers.
>>> Typical SPI NOR use case is that you write read command + start address,
>>> followed by a longer read. If the TX-only interpretation would be right,
>>> we'd always end up with SPI_DON not being set.
>>>
>>>> I can't really explain the extra RX byte in the fifo. We know how many
>>>> bytes to expect and we pull that many from the fifo so it's not as if
>>>> we're missing an interrupt causing us to skip the last byte. I've been
>>>> looking for some kind of off-by-one calculation but again if it were
>>>> something like that it'd happen all the time.
>>>>
>>> Maybe it helps to know what value this extra byte in the FIFO has. Is it:
>>> - a duplicate of the last read byte
>>> - or the next byte (at <end address> + 1)
>>> - or a fixed value, e.g. always 0x00 or 0xff
>> The values were up thread a bit but I'll repeat them here
>>
>> fsl_espi ffe110000.spi: tx 70
>> fsl_espi ffe110000.spi: rx 03
>> fsl_espi ffe110000.spi: Extra RX 00
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>>
>>
>> The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them with
>> either a READ_SR or a READ_FSR. Never a data read.
>>
> Just remembered something about SPIE_DON:
> Transfers are always full duplex, therefore in case of a read the chip
> sends dummy zero's. Having said that in case of a read SPIE_DON means
> that the last dummy zero was shifted out.
>
> READ_SR and READ_FSR are the shortest transfers, 1 byte out and 1 byte in.
> So the issue may have a dependency on the length of the transfer.
> However I see no good explanation so far. You can try adding a delay of
> a few miroseconds between the following to commands in fsl_espi_bufs().
>
> 	fsl_espi_write_reg(espi, ESPI_SPIM, mask);
>
> 	/* Prevent filling the fifo from getting interrupted */
> 	spin_lock_irq(&espi->lock);
>
> Maybe enabling interrupts and seeing the SPIE_DON interrupt are too close.
I think this might be heading in the right direction. Playing about with 
a delay does seem to make the two symptoms less likely. Although I have 
to set it quite high (i.e. msleep(100)) to completely avoid any 
possibility of seeing either message.

^ permalink raw reply

* RE: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h
From: Leo Li @ 2020-09-01  1:50 UTC (permalink / raw)
  To: Herbert Xu, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200730125259.GA8948@gondor.apana.org.au>



> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Thursday, July 30, 2020 7:53 AM
> To: Leo Li <leoyang.li@nxp.com>; linuxppc-dev@lists.ozlabs.org; linux-arm-
> kernel@lists.infradead.org
> Subject: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h
> 
> There are two __packed attributes in qman.h that are both unnecessary
> and causing compiler warnings because they're conflicting with
> explicit alignment requirements set on members within the structure.

Sorry for the late response.  I missed this email previously.

These structures are descriptors used by hardware, we cannot have _ANY_ padding from the compiler.  The compiled result might be the same with or without the __packed attribute for now, but I think keep it there probably is safer for dealing with unexpected alignment requirements from the compiler in the future.

Having conflicting alignment requirements warning might means something is wrong with the structure in certain scenario.  I just tried a ARM64 build but didn't see the warnings.  Could you share the warning you got and the build setup?  Thanks.

Regards,
Leo 
> 
> This patch removes them both.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
> index cfe00e08e85b..d81ff185dc0b 100644
> --- a/include/soc/fsl/qman.h
> +++ b/include/soc/fsl/qman.h
> @@ -256,7 +256,7 @@ struct qm_dqrr_entry {
>  	__be32 context_b;
>  	struct qm_fd fd;
>  	u8 __reserved4[32];
> -} __packed;
> +};
>  #define QM_DQRR_VERB_VBIT		0x80
>  #define QM_DQRR_VERB_MASK		0x7f	/* where the verb
> contains; */
>  #define QM_DQRR_VERB_FRAME_DEQUEUE	0x60	/* "this format" */
> @@ -289,7 +289,7 @@ union qm_mr_entry {
>  		__be32 tag;
>  		struct qm_fd fd;
>  		u8 __reserved1[32];
> -	} __packed ern;
> +	} ern;
>  	struct {
>  		u8 verb;
>  		u8 fqs;		/* Frame Queue Status */
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page:
> https://eur01.safelinks.protection.outlook.com/?url=http:%2F%2Fgondor.ap
> ana.org.au%2F~herbert%2F&amp;data=02%7C01%7Cleoyang.li%40nxp.com
> %7Cb69aca8dc53a4030b14b08d8348783a9%7C686ea1d3bc2b4c6fa92cd99c5c3
> 01635%7C0%7C0%7C637317103931120730&amp;sdata=g3%2FJfa%2FNcuhLD5
> SYhbmhno65O1bxisVt2xltu2IMPjQ%3D&amp;reserved=0
> PGP Key:
> https://eur01.safelinks.protection.outlook.com/?url=http:%2F%2Fgondor.ap
> ana.org.au%2F~herbert%2Fpubkey.txt&amp;data=02%7C01%7Cleoyang.li%4
> 0nxp.com%7Cb69aca8dc53a4030b14b08d8348783a9%7C686ea1d3bc2b4c6fa9
> 2cd99c5c301635%7C0%7C0%7C637317103931120730&amp;sdata=uSS2C4cuAL
> XcCgIhpIORK4EZ1BHHj%2BqAW2Pu%2FLrFKPM%3D&amp;reserved=0

^ permalink raw reply

* Re: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h
From: Herbert Xu @ 2020-09-01  1:56 UTC (permalink / raw)
  To: Leo Li
  Cc: linuxppc-dev@lists.ozlabs.org, Linux Kernel Mailing List,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <VE1PR04MB6687FB075B9A6A0923F576978F2E0@VE1PR04MB6687.eurprd04.prod.outlook.com>

On Tue, Sep 01, 2020 at 01:50:38AM +0000, Leo Li wrote:
>
> Sorry for the late response.  I missed this email previously.
> 
> These structures are descriptors used by hardware, we cannot have _ANY_ padding from the compiler.  The compiled result might be the same with or without the __packed attribute for now, but I think keep it there probably is safer for dealing with unexpected alignment requirements from the compiler in the future.
> 
> Having conflicting alignment requirements warning might means something is wrong with the structure in certain scenario.  I just tried a ARM64 build but didn't see the warnings.  Could you share the warning you got and the build setup?  Thanks.

Just do a COMPILE_TEST build on x86-64:

In file included from ../drivers/crypto/caam/qi.c:12:
../include/soc/fsl/qman.h:259:1: warning: alignment 1 of ‘struct qm_dqrr_entry’ is less than 8 [-Wpacked-not-aligned]
 } __packed;
 ^
../include/soc/fsl/qman.h:292:2: warning: alignment 1 of ‘struct <anonymous>’ is less than 8 [-Wpacked-not-aligned]
  } __packed ern;
  ^

In any case, those packed markers are completely unnecessary because
those structs contain no holes.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: fsl_espi errors on v5.7.15
From: Chris Packham @ 2020-09-01  2:52 UTC (permalink / raw)
  To: Heiner Kallweit, Nicholas Piggin, benh@kernel.crashing.org,
	broonie@kernel.org, mpe@ellerman.id.au, paulus@samba.org
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org
In-Reply-To: <5bf05188-a52e-c8c0-9d2d-c87eff6f1588@gmail.com>


On 1/09/20 12:33 am, Heiner Kallweit wrote:
> On 30.08.2020 23:59, Chris Packham wrote:
>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:00, Chris Packham wrote:
>>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>>>> <snip>
>>>>
>>>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>>>>>>>>
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>>
>>>>>>>>>> With my current workaround of emptying the RX FIFO. It seems
>>>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>>>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>>>>>>>>
>>>>>>>>>> fsl_espi ffe110000.spi: tx 70
>>>>>>>>>> fsl_espi ffe110000.spi: rx 03
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>>>>>>>>>>
>>>>>>>>>>     From all the Micron SPI-NOR datasheets I've got access to it is
>>>>>>>>>> possible to continually read the SR/FSR. But I've no idea why it
>>>>>>>>>> happens some times and not others.
>>>>>>>>> So I think I've got a reproduction and I think I've bisected the problem
>>>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>>>>>>>>> C"). My day is just finishing now so I haven't applied too much scrutiny
>>>>>>>>> to this result. Given the various rabbit holes I've been down on this
>>>>>>>>> issue already I'd take this information with a good degree of skepticism.
>>>>>>>>>
>>>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>>>>>>> It doesn't have yet the change you're referring to, and the fsl-espi driver
>>>>>>>> is basically the same as in 5.7 (just two small changes in 5.7).
>>>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>>>>>> around this time that could affect book E, so it's good if that exact
>>>>>>> patch is confirmed.
>>>>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>>>>> 5.9-rc2 by reverting that one commit.
>>>>>>
>>>>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>>>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>>>>> the issue) and the commit just before (which does not).
>>>>> Thanks for testing, that confirms it well.
>>>>>
>>>>> [snip patch]
>>>>>
>>>>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>>>>> didn't report anything (either with or without the change above).
>>>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>>>> else has changed.
>>>>>
>>>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>>>> higher interrupt latency?
>>>>>
>>>>> I don't think the patch should cause significantly worse latency,
>>>>> (it's supposed to be a bit better if anything because it doesn't set
>>>>> up the full interrupt frame). But it's possible.
>>>> My working theory is that the SPI_DON indication is all about the TX
>>>> direction an now that the interrupts are faster we're hitting an error
>>>> because there is still RX activity going on. Heiner disagrees with my
>>>> interpretation of the SPI_DON indication and the fact that it doesn't
>>>> happen every time does throw doubt on it.
>>>>
>>> It's right that the eSPI spec can be interpreted that SPI_DON refers to
>>> TX only. However this wouldn't really make sense, because also for RX
>>> we program the frame length, and therefore want to be notified once the
>>> full frame was received. Also practical experience shows that SPI_DON
>>> is set also after RX-only transfers.
>>> Typical SPI NOR use case is that you write read command + start address,
>>> followed by a longer read. If the TX-only interpretation would be right,
>>> we'd always end up with SPI_DON not being set.
>>>
>>>> I can't really explain the extra RX byte in the fifo. We know how many
>>>> bytes to expect and we pull that many from the fifo so it's not as if
>>>> we're missing an interrupt causing us to skip the last byte. I've been
>>>> looking for some kind of off-by-one calculation but again if it were
>>>> something like that it'd happen all the time.
>>>>
>>> Maybe it helps to know what value this extra byte in the FIFO has. Is it:
>>> - a duplicate of the last read byte
>>> - or the next byte (at <end address> + 1)
>>> - or a fixed value, e.g. always 0x00 or 0xff
>> The values were up thread a bit but I'll repeat them here
>>
>> fsl_espi ffe110000.spi: tx 70
>> fsl_espi ffe110000.spi: rx 03
>> fsl_espi ffe110000.spi: Extra RX 00
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>>
>>
>> The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them with
>> either a READ_SR or a READ_FSR. Never a data read.
>>
> Just remembered something about SPIE_DON:
> Transfers are always full duplex,

That's not true in rxskip mode. Setting rxskip forces the eSPI into 
half-duplex mode. So far all the instances of "extra" rx bytes have been 
when rxskip == 1.

I think that could be where our experience with SPIE_DON differ. In 
full-duplex mode yes DON is always set. In half duplex mode we can end 
up with DON set or not set depending on the interrupt timing.

> therefore in case of a read the chip
> sends dummy zero's. Having said that in case of a read SPIE_DON means
> that the last dummy zero was shifted out.
>
> READ_SR and READ_FSR are the shortest transfers, 1 byte out and 1 byte in.
> So the issue may have a dependency on the length of the transfer.
> However I see no good explanation so far. You can try adding a delay of
> a few miroseconds between the following to commands in fsl_espi_bufs().
>
> 	fsl_espi_write_reg(espi, ESPI_SPIM, mask);
>
> 	/* Prevent filling the fifo from getting interrupted */
> 	spin_lock_irq(&espi->lock);
>
> Maybe enabling interrupts and seeing the SPIE_DON interrupt are too close.

^ permalink raw reply

* Re: [PATCH v3 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear
From: Anshuman Khandual @ 2020-09-01  3:12 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-2-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> With the hash page table, the kernel should not use pmd_clear for clearing
> huge pte entries. Add a DEBUG_VM WARN to catch the wrong usage.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 6de56c3b33c4..079211968987 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
>  
>  static inline void pmd_clear(pmd_t *pmdp)
>  {
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
> +		/*
> +		 * Don't use this if we can possibly have a hash page table
> +		 * entry mapping this.
> +		 */
> +		WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
> +	}
>  	*pmdp = __pmd(0);
>  }
>  
> @@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
>  
>  static inline void pud_clear(pud_t *pudp)
>  {
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
> +		/*
> +		 * Don't use this if we can possibly have a hash page table
> +		 * entry mapping this.
> +		 */
> +		WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
> +	}
>  	*pudp = __pud(0);
>  }

There are two checkpatch.pl warnings for this patch.

WARNING: line length of 105 exceeds 100 columns
#27: FILE: arch/powerpc/include/asm/book3s/64/pgtable.h:876:
+               WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));

WARNING: line length of 105 exceeds 100 columns
#41: FILE: arch/powerpc/include/asm/book3s/64/pgtable.h:931:
+               WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));

total: 0 errors, 2 warnings, 26 lines checked

^ permalink raw reply

* Re: [PATCH v3 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte
From: Anshuman Khandual @ 2020-09-01  3:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-3-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> powerpc used to set the pte specific flags in set_pte_at(). This is different
> from other architectures. To be consistent with other architecture update
> pfn_pte to set _PAGE_PTE on ppc64. Also, drop now unused pte_mkpte.
> 
> We add a VM_WARN_ON() to catch the usage of calling set_pte_at() without setting
> _PAGE_PTE bit. We will remove that after a few releases.
> 
> With respect to huge pmd entries, pmd_mkhuge() takes care of adding the
> _PAGE_PTE bit.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +++++++++------
>  arch/powerpc/include/asm/nohash/pgtable.h    |  5 -----
>  arch/powerpc/mm/pgtable.c                    |  5 -----
>  3 files changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 079211968987..2382fd516f6b 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
>  	VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
>  	VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
>  
> -	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
> +	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
>  }
>  
>  static inline unsigned long pte_pfn(pte_t pte)
> @@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
>  	return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
>  }
>  
> -static inline pte_t pte_mkpte(pte_t pte)
> -{
> -	return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
> -}
> -
>  static inline pte_t pte_mkwrite(pte_t pte)
>  {
>  	/*
> @@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
>  static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  				pte_t *ptep, pte_t pte, int percpu)
>  {
> +
> +	VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
> +	/*
> +	 * Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
> +	 * in all the callers.
> +	 */
> +	 pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
> +
>  	if (radix_enabled())
>  		return radix__set_pte_at(mm, addr, ptep, pte, percpu);
>  	return hash__set_pte_at(mm, addr, ptep, pte, percpu);
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index 4b7c3472eab1..6277e7596ae5 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
>  	return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
>  }
>  
> -static inline pte_t pte_mkpte(pte_t pte)
> -{
> -	return pte;
> -}
> -
>  static inline pte_t pte_mkspecial(pte_t pte)
>  {
>  	return __pte(pte_val(pte) | _PAGE_SPECIAL);
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 9c0547d77af3..ab57b07ef39a 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>  	 */
>  	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>  
> -	/* Add the pte bit when trying to set a pte */
> -	pte = pte_mkpte(pte);
> -
>  	/* Note: mm->context.id might not yet have been assigned as
>  	 * this context might not have been activated yet when this
>  	 * is called.
> @@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
>  	 */
>  	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>  
> -	pte = pte_mkpte(pte);
> -
>  	pte = set_pte_filter(pte);
>  
>  	val = pte_val(pte);
> 

There is one checkpatch.pl warning for this patch.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#6: 
powerpc used to set the pte specific flags in set_pte_at(). This is different

total: 0 errors, 1 warnings, 61 lines checked

^ permalink raw reply

* Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value
From: Anshuman Khandual @ 2020-09-01  3:15 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-4-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
> random value.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 086309fb9b6f..bbf9df0e64c6 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -44,10 +44,17 @@
>   * entry type. But these bits might affect the ability to clear entries with
>   * pxx_clear() because of how dynamic page table folding works on s390. So
>   * while loading up the entries do not change the lower 4 bits. It does not
> - * have affect any other platform.
> + * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
> + * used to mark a pte entry.
>   */
> -#define S390_MASK_BITS	4
> -#define RANDOM_ORVALUE	GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
> +#define S390_SKIP_MASK		GENMASK(3, 0)
> +#ifdef CONFIG_PPC_BOOK3S_64
> +#define PPC64_SKIP_MASK		GENMASK(62, 62)
> +#else
> +#define PPC64_SKIP_MASK		0x0
> +#endif

Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.

> +#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
> +#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
>  #define RANDOM_NZVALUE	GENMASK(7, 0)

Please fix the alignments here. Feel free to consider following changes after
this patch.

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 122416464e0f..f969031152bb 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -48,14 +48,11 @@
  * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
  * used to mark a pte entry.
  */
-#define S390_SKIP_MASK         GENMASK(3, 0)
-#ifdef CONFIG_PPC_BOOK3S_64
-#define PPC64_SKIP_MASK                GENMASK(62, 62)
-#else
-#define PPC64_SKIP_MASK                0x0
-#endif
-#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
-#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#define PPC64_SKIP_MASK        GENMASK(62, 62)
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
+
 #define RANDOM_NZVALUE GENMASK(7, 0)
 
>  
>  static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> 

Besides, there is also one checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7: 
ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in

total: 0 errors, 1 warnings, 20 lines checked

^ permalink raw reply related

* Re: [PATCH v3 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
From: Anshuman Khandual @ 2020-09-01  3:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-5-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> ppc64 supports huge vmap only with radix translation. Hence use arch helper
> to determine the huge vmap support.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index bbf9df0e64c6..28f9d0558c20 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -28,6 +28,7 @@
>  #include <linux/swapops.h>
>  #include <linux/start_kernel.h>
>  #include <linux/sched/mm.h>
> +#include <linux/io.h>
>  #include <asm/pgalloc.h>
>  #include <asm/tlbflush.h>
>  
> @@ -206,11 +207,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
>  	WARN_ON(!pmd_leaf(pmd));
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>  static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>  {
>  	pmd_t pmd;
>  
> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
> +	if (!arch_ioremap_pmd_supported())
>  		return;
>  
>  	pr_debug("Validating PMD huge\n");
> @@ -224,6 +226,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>  	pmd = READ_ONCE(*pmdp);
>  	WARN_ON(!pmd_none(pmd));
>  }
> +#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { }
> +#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> +

Some small nits here.

- s/!CONFIG_HAVE_ARCH_HUGE_VMAP/CONFIG_HAVE_ARCH_HUGE_VMAP
- Drop the extra line at the end
- Move the { } down just to be consistent with existing stub for pmd_huge_tests()

>  
>  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>  {
> @@ -320,11 +326,12 @@ static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
>  	WARN_ON(!pud_leaf(pud));
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>  static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>  {
>  	pud_t pud;
>  
> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
> +	if (!arch_ioremap_pud_supported())
>  		return;
>  
>  	pr_debug("Validating PUD huge\n");
> @@ -338,6 +345,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>  	pud = READ_ONCE(*pudp);
>  	WARN_ON(!pud_none(pud));
>  }
> +#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) { }
> +#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
> +

Some small nits here.

- s/!CONFIG_HAVE_ARCH_HUGE_VMAP/CONFIG_HAVE_ARCH_HUGE_VMAP
- Drop the extra line at the end
- Move the { } down just to be consistent with existing stub for pud_huge_tests()

>  #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>  static void __init pud_advanced_tests(struct mm_struct *mm,
>

^ permalink raw reply

* Re: [PATCH v3 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING
From: Anshuman Khandual @ 2020-09-01  3:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-6-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> Saved write support was added to track the write bit of a pte after marking the
> pte protnone. This was done so that AUTONUMA can convert a write pte to protnone
> and still track the old write bit. When converting it back we set the pte write
> bit correctly thereby avoiding a write fault again. Hence enable the test only
> when CONFIG_NUMA_BALANCING is enabled and use protnone protflags.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 28f9d0558c20..5c0680836fe9 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -119,10 +119,14 @@ static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>  {
>  	pte_t pte = pfn_pte(pfn, prot);
>  
> +	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
> +		return;
> +
>  	pr_debug("Validating PTE saved write\n");
>  	WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
>  	WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
>  }
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>  {
> @@ -235,6 +239,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>  {
>  	pmd_t pmd = pfn_pmd(pfn, prot);
>  
> +	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
> +		return;
> +
>  	pr_debug("Validating PMD saved write\n");
>  	WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
>  	WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
> @@ -1020,8 +1027,8 @@ static int __init debug_vm_pgtable(void)
>  	pmd_huge_tests(pmdp, pmd_aligned, prot);
>  	pud_huge_tests(pudp, pud_aligned, prot);
>  
> -	pte_savedwrite_tests(pte_aligned, prot);
> -	pmd_savedwrite_tests(pmd_aligned, prot);
> +	pte_savedwrite_tests(pte_aligned, protnone);
> +	pmd_savedwrite_tests(pmd_aligned, protnone);
>  
>  	pte_unmap_unlock(ptep, ptl);
>  
> 

There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7: 
Saved write support was added to track the write bit of a pte after marking the

total: 0 errors, 1 warnings, 33 lines checked

Otherwise this looks good. With the checkpatch.pl warning fixed

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

^ permalink raw reply

* Re: [PATCH v3 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at
From: Anshuman Khandual @ 2020-09-01  3:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-7-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> kernel expects entries to be marked huge before we use set_pmd_at()/set_pud_at().
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 5c0680836fe9..de83a20c1b30 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -155,7 +155,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>  				      unsigned long pfn, unsigned long vaddr,
>  				      pgprot_t prot)
>  {
> -	pmd_t pmd = pfn_pmd(pfn, prot);
> +	pmd_t pmd;
>  
>  	if (!has_transparent_hugepage())
>  		return;
> @@ -164,19 +164,19 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>  	/* Align the address wrt HPAGE_PMD_SIZE */
>  	vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
>  
> -	pmd = pfn_pmd(pfn, prot);
> +	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>  	set_pmd_at(mm, vaddr, pmdp, pmd);
>  	pmdp_set_wrprotect(mm, vaddr, pmdp);
>  	pmd = READ_ONCE(*pmdp);
>  	WARN_ON(pmd_write(pmd));
>  
> -	pmd = pfn_pmd(pfn, prot);
> +	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>  	set_pmd_at(mm, vaddr, pmdp, pmd);
>  	pmdp_huge_get_and_clear(mm, vaddr, pmdp);
>  	pmd = READ_ONCE(*pmdp);
>  	WARN_ON(!pmd_none(pmd));
>  
> -	pmd = pfn_pmd(pfn, prot);
> +	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>  	pmd = pmd_wrprotect(pmd);
>  	pmd = pmd_mkclean(pmd);
>  	set_pmd_at(mm, vaddr, pmdp, pmd);
> @@ -237,7 +237,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>  
>  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>  {
> -	pmd_t pmd = pfn_pmd(pfn, prot);
> +	pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));

There is no set_pmd_at() in this particular test, why change ?

>  
>  	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>  		return;
> @@ -277,7 +277,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
>  				      unsigned long pfn, unsigned long vaddr,
>  				      pgprot_t prot)
>  {
> -	pud_t pud = pfn_pud(pfn, prot);
> +	pud_t pud;
>  
>  	if (!has_transparent_hugepage())
>  		return;
> @@ -286,25 +286,28 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
>  	/* Align the address wrt HPAGE_PUD_SIZE */
>  	vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
>  
> +	pud = pud_mkhuge(pfn_pud(pfn, prot));
>  	set_pud_at(mm, vaddr, pudp, pud);
>  	pudp_set_wrprotect(mm, vaddr, pudp);
>  	pud = READ_ONCE(*pudp);
>  	WARN_ON(pud_write(pud));
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
> -	pud = pfn_pud(pfn, prot);
> +

Please drop the extra line here.

> +	pud = pud_mkhuge(pfn_pud(pfn, prot));
>  	set_pud_at(mm, vaddr, pudp, pud);
>  	pudp_huge_get_and_clear(mm, vaddr, pudp);
>  	pud = READ_ONCE(*pudp);
>  	WARN_ON(!pud_none(pud));
>  
> -	pud = pfn_pud(pfn, prot);
> +	pud = pud_mkhuge(pfn_pud(pfn, prot));
>  	set_pud_at(mm, vaddr, pudp, pud);
>  	pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
>  	pud = READ_ONCE(*pudp);
>  	WARN_ON(!pud_none(pud));
>  #endif /* __PAGETABLE_PMD_FOLDED */
> -	pud = pfn_pud(pfn, prot);
> +

Please drop the extra line here.

> +	pud = pud_mkhuge(pfn_pud(pfn, prot));
>  	pud = pud_wrprotect(pud);
>  	pud = pud_mkclean(pud);
>  	set_pud_at(mm, vaddr, pudp, pud);
>

There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7: 
kernel expects entries to be marked huge before we use set_pmd_at()/set_pud_at().

total: 0 errors, 1 warnings, 77 lines checked

^ permalink raw reply

* Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
From: Anshuman Khandual @ 2020-09-01  3:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-9-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> Architectures like ppc64 use deposited page table while updating the huge pte
> entries.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index f9f6358899a8..0ce5c6a24c5b 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -154,7 +154,7 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>  static void __init pmd_advanced_tests(struct mm_struct *mm,
>  				      struct vm_area_struct *vma, pmd_t *pmdp,
>  				      unsigned long pfn, unsigned long vaddr,
> -				      pgprot_t prot)
> +				      pgprot_t prot, pgtable_t pgtable)
>  {
>  	pmd_t pmd;
>  
> @@ -165,6 +165,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>  	/* Align the address wrt HPAGE_PMD_SIZE */
>  	vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
>  
> +	pgtable_trans_huge_deposit(mm, pmdp, pgtable);
> +
>  	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>  	set_pmd_at(mm, vaddr, pmdp, pmd);
>  	pmdp_set_wrprotect(mm, vaddr, pmdp);
> @@ -193,6 +195,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>  	pmdp_test_and_clear_young(vma, vaddr, pmdp);
>  	pmd = READ_ONCE(*pmdp);
>  	WARN_ON(pmd_young(pmd));
> +
> +	pgtable = pgtable_trans_huge_withdraw(mm, pmdp);

Should the call sites here be wrapped with __HAVE_ARCH_PGTABLE_DEPOSIT and
__HAVE_ARCH_PGTABLE_WITHDRAW respectively. Though there are generic fallback
definitions, wondering whether they are indeed essential for all platforms.

>  }
>  
>  static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
> @@ -373,7 +377,7 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>  static void __init pmd_advanced_tests(struct mm_struct *mm,
>  				      struct vm_area_struct *vma, pmd_t *pmdp,
>  				      unsigned long pfn, unsigned long vaddr,
> -				      pgprot_t prot)
> +				      pgprot_t prot, pgtable_t pgtable)
>  {
>  }
>  static void __init pud_advanced_tests(struct mm_struct *mm,
> @@ -1015,7 +1019,7 @@ static int __init debug_vm_pgtable(void)
>  	pgd_clear_tests(mm, pgdp);
>  
>  	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> -	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
> +	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
>  	pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
>  	hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>  
> 

There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7: 
Architectures like ppc64 use deposited page table while updating the huge pte

total: 0 errors, 1 warnings, 40 lines checked

^ permalink raw reply

* Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it
From: Anshuman Khandual @ 2020-09-01  3:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-arch, linux-s390, Christophe Leroy, x86, Mike Rapoport,
	Qian Cai, Gerald Schaefer, Vineet Gupta, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200827080438.315345-14-aneesh.kumar@linux.ibm.com>



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> pte_clear_tests operate on an existing pte entry. Make sure that is not a none
> pte entry.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/debug_vm_pgtable.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 21329c7d672f..8527ebb75f2c 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
>  static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>  				   unsigned long vaddr)
>  {
> -	pte_t pte = ptep_get(ptep);
> +	pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);

Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at() which otherwise would have been a problem on
ppc64 as you had pointed out earlier i.e set_pte_at() should not update an
existing valid entry. So the commit message here is bit misleading.

>  
>  	pr_debug("Validating PTE clear\n");
>  	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> @@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
>  	p4d_t *p4dp, *saved_p4dp;
>  	pud_t *pudp, *saved_pudp;
>  	pmd_t *pmdp, *saved_pmdp, pmd;
> -	pte_t *ptep;
> +	pte_t *ptep, pte;
>  	pgtable_t saved_ptep;
>  	pgprot_t prot, protnone;
>  	phys_addr_t paddr;
> @@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
>  	 */
>  
>  	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
> +	pte = pfn_pte(pte_aligned, prot);
> +	set_pte_at(mm, vaddr, ptep, pte);

Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()
- Creates and populates the PTEP with set_pte_at()
- Clears with pte_clear()
- Checks for pte_none()

If the objective is to clear the PTEP entry before calling set_pte_at(),
then only the first chunk is required and it should also be merged with
a previous patch. 

[PATCH v3 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry 


>  	pte_clear_tests(mm, ptep, vaddr);
>  	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>  	pte_unmap_unlock(ptep, ptl);
> 

There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7: 
pte_clear_tests operate on an existing pte entry. Make sure that is not a none

total: 0 errors, 1 warnings, 24 lines checked

There is also a build failure on x86 reported from kernel test robot.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox