LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH v3 0/7] Avoid cache trashing on clearing huge/gigantic page
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels of CPU
caches.  To avoid this, clear only the 4K area around the fault address
through the cache and use cache-avoiding clears for the rest of the 2MB
area.

This patchset implements a cache-avoiding version of clear_page only for
x86. If an architecture wants to provide a cache-avoiding version of
clear_page, it should define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().

v3:
  - Rebased to current Linus' tree. kmap_atomic() build issue is fixed;
  - Pass fault address to clear_huge_page(). v2 had a problem with clearing
    for sizes other than HPAGE_SIZE;
  - x86: fix 32bit variant. Fallback version of clear_page_nocache() has
    been added for non-SSE2 systems;
  - x86: clear_page_nocache() moved to clear_page_{32,64}.S;
  - x86: use pushq_cfi/popq_cfi instead of push/pop;
v2:
  - No code change. Only commit messages are updated.
  - RFC mark is dropped.

Andi Kleen (5):
  THP: Use real address for NUMA policy
  THP: Pass fault address to __do_huge_pmd_anonymous_page()
  x86: Add clear_page_nocache
  mm: make clear_huge_page cache clear only around the fault address
  x86: switch the 64bit uncached page clear to SSE/AVX v2

Kirill A. Shutemov (2):
  hugetlb: pass fault address to hugetlb_no_page()
  mm: pass fault address to clear_huge_page()

 arch/x86/include/asm/page.h      |    2 +
 arch/x86/include/asm/string_32.h |    5 ++
 arch/x86/include/asm/string_64.h |    5 ++
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/clear_page_32.S     |   72 +++++++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S     |   78 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c              |    7 +++
 include/linux/mm.h               |    2 +-
 mm/huge_memory.c                 |   17 ++++----
 mm/hugetlb.c                     |   39 ++++++++++---------
 mm/memory.c                      |   37 +++++++++++++++---
 11 files changed, 232 insertions(+), 35 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_32.S

-- 
1.7.7.6
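
The idea in the cover letter above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: the two helpers
are hypothetical stand-ins for clear_page() and clear_page_nocache(),
both implemented with memset() here, and the actual mm/memory.c change
from the series is not reproduced. The point is the selection logic:
only the 4K subpage that contains the fault address is cleared through
the cache.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Counters so a caller can see which path each subpage took. */
static unsigned int cached_clears, nocache_clears;

/* Hypothetical stand-in for clear_page(): ordinary cached stores. */
static void clear_page_cached(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	cached_clears++;
}

/*
 * Hypothetical stand-in for clear_page_nocache(). A real version would
 * use non-temporal stores; memset() keeps this sketch portable.
 */
static void clear_page_nocache_stub(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	nocache_clears++;
}

static void clear_huge_page_sketch(unsigned char *hpage, unsigned long haddr,
				   unsigned long fault_address,
				   unsigned int pages_per_huge_page)
{
	/* Index of the 4K subpage that contains the faulting address. */
	unsigned long target = (fault_address - haddr) / SKETCH_PAGE_SIZE;
	unsigned int i;

	for (i = 0; i < pages_per_huge_page; i++) {
		unsigned char *sub = hpage + i * SKETCH_PAGE_SIZE;

		if (i == target)
			clear_page_cached(sub);		/* likely to be touched soon */
		else
			clear_page_nocache_stub(sub);	/* avoid polluting the caches */
	}
}
```

For a 2MB page, pages_per_huge_page is 512, so one subpage goes through
the cache and the other 511 take the cache-avoiding path.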


* [PATCH v3 4/7] mm: pass fault address to clear_huge_page()
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345130154-9602-1-git-send-email-kirill.shutemov@linux.intel.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |    2 +-
 mm/huge_memory.c   |    2 +-
 mm/hugetlb.c       |    3 ++-
 mm/memory.c        |    7 ++++---
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..2858723 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1638,7 +1638,7 @@ extern void dump_page(struct page *page);
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
-			    unsigned long addr,
+			    unsigned long haddr, unsigned long fault_address,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
 				unsigned long addr, struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f0825b611..070bf89 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -644,7 +644,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	if (unlikely(!pgtable))
 		return VM_FAULT_OOM;
 
-	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	clear_huge_page(page, haddr, address, HPAGE_PMD_NR);
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c86d3d..5182192 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2718,7 +2718,8 @@ retry:
 				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
-		clear_huge_page(page, haddr, pages_per_huge_page(h));
+		clear_huge_page(page, haddr, fault_address,
+				pages_per_huge_page(h));
 		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
diff --git a/mm/memory.c b/mm/memory.c
index 5736170..dfc179b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3984,19 +3984,20 @@ static void clear_gigantic_page(struct page *page,
 	}
 }
 void clear_huge_page(struct page *page,
-		     unsigned long addr, unsigned int pages_per_huge_page)
+		     unsigned long haddr, unsigned long fault_address,
+		     unsigned int pages_per_huge_page)
 {
 	int i;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, addr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
 	}
 }
 
-- 
1.7.7.6


* [PATCH v3 7/7] x86: switch the 64bit uncached page clear to SSE/AVX v2
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345130154-9602-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

With multiple threads vector stores are more efficient, so use them.
This will cause the page clear to run non-preemptible and add some
overhead. However, on 32bit it was already non-preemptible (due to
kmap_atomic) and there is a preemption opportunity every 4K unit.

On an NPB (NASA Parallel Benchmark) 128GB run on a Westmere this reduces
the performance regression of enabling transparent huge pages
by ~2% (2.81% to 0.81%), near the runtime variability now.
On a system with AVX support more is expected.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[kirill.shutemov@linux.intel.com: Properly save/restore arguments]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/lib/clear_page_64.S |   79 ++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 9d2f3c2..b302cff 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -73,30 +73,79 @@ ENDPROC(clear_page)
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
 
+#define SSE_UNROLL 128
+
 /*
  * Zero a page avoiding the caches
  * rdi	page
  */
 ENTRY(clear_page_nocache)
 	CFI_STARTPROC
-	xorl   %eax,%eax
-	movl   $4096/64,%ecx
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $16,%rsp
+	CFI_ADJUST_CFA_OFFSET 16
+	movdqu %xmm0,(%rsp)
+	xorpd  %xmm0,%xmm0
+	movl   $4096/SSE_UNROLL,%ecx
 	.p2align 4
 .Lloop_nocache:
 	decl	%ecx
-#define PUT(x) movnti %rax,x*8(%rdi)
-	movnti %rax,(%rdi)
-	PUT(1)
-	PUT(2)
-	PUT(3)
-	PUT(4)
-	PUT(5)
-	PUT(6)
-	PUT(7)
-#undef PUT
-	leaq	64(%rdi),%rdi
+	.set x,0
+	.rept SSE_UNROLL/16
+	movntdq %xmm0,x(%rdi)
+	.set x,x+16
+	.endr
+	leaq	SSE_UNROLL(%rdi),%rdi
 	jnz	.Lloop_nocache
-	nop
-	ret
+	movdqu (%rsp),%xmm0
+	addq   $16,%rsp
+	CFI_ADJUST_CFA_OFFSET -16
+	jmp   kernel_fpu_end
 	CFI_ENDPROC
 ENDPROC(clear_page_nocache)
+
+#ifdef CONFIG_AS_AVX
+
+	.section .altinstr_replacement,"ax"
+1:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_nocache_avx - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_AVX,\
+	                     16, 2b-1b
+	.previous
+
+#define AVX_UNROLL 256 /* TUNE ME */
+
+ENTRY(clear_page_nocache_avx)
+	CFI_STARTPROC
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $32,%rsp
+	CFI_ADJUST_CFA_OFFSET 32
+	vmovdqu %ymm0,(%rsp)
+	vxorpd  %ymm0,%ymm0,%ymm0
+	movl   $4096/AVX_UNROLL,%ecx
+	.p2align 4
+.Lloop_avx:
+	decl	%ecx
+	.set x,0
+	.rept AVX_UNROLL/32
+	vmovntdq %ymm0,x(%rdi)
+	.set x,x+32
+	.endr
+	leaq	AVX_UNROLL(%rdi),%rdi
+	jnz	.Lloop_avx
+	vmovdqu (%rsp),%ymm0
+	addq   $32,%rsp
+	CFI_ADJUST_CFA_OFFSET -32
+	jmp   kernel_fpu_end
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_avx)
+
+#endif
-- 
1.7.7.6
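
For readers less familiar with the assembly, the SSE2 loop in the patch
above has a rough userspace analogue using compiler intrinsics. This is
a sketch, not the kernel code: the function name is made up, there is no
kernel_fpu_begin()/kernel_fpu_end() bracketing because userspace does
not need it, and the memset() fallback for non-SSE2 builds is an
addition of this sketch.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define PAGE_BYTES 4096

/* page must be 16-byte aligned; a page-aligned buffer always is. */
static void clear_page_nocache_user(void *page)
{
#if defined(__SSE2__)
	__m128i zero = _mm_setzero_si128();
	__m128i *p = page;
	size_t i;

	/* Each call compiles to movntdq, a store that bypasses the caches. */
	for (i = 0; i < PAGE_BYTES / sizeof(__m128i); i++)
		_mm_stream_si128(&p[i], zero);

	/* Streaming stores are weakly ordered; fence before others read. */
	_mm_sfence();
#else
	/* Portable fallback for non-SSE2 builds; this goes through the cache. */
	memset(page, 0, PAGE_BYTES);
#endif
}
```

A buffer obtained with aligned_alloc(4096, 4096) satisfies the alignment
requirement.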


* [PATCH v3 1/7] THP: Use real address for NUMA policy
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345130154-9602-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

Use the fault address, not the rounded down hpage address for NUMA
policy purposes. In some circumstances this can give more exact
NUMA policy.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..70737ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -681,11 +681,11 @@ static inline gfp_t alloc_hugepage_gfpmask(int defrag, gfp_t extra_gfp)
 
 static inline struct page *alloc_hugepage_vma(int defrag,
 					      struct vm_area_struct *vma,
-					      unsigned long haddr, int nd,
+					      unsigned long address, int nd,
 					      gfp_t extra_gfp)
 {
 	return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
-			       HPAGE_PMD_ORDER, vma, haddr, nd);
+			       HPAGE_PMD_ORDER, vma, address, nd);
 }
 
 #ifndef CONFIG_NUMA
@@ -710,7 +710,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					  vma, haddr, numa_node_id(), 0);
+					  vma, address, numa_node_id(), 0);
 		if (unlikely(!page)) {
 			count_vm_event(THP_FAULT_FALLBACK);
 			goto out;
@@ -944,7 +944,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow())
 		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					      vma, haddr, numa_node_id(), 0);
+					      vma, address, numa_node_id(), 0);
 	else
 		new_page = NULL;
 
-- 
1.7.7.6


* Re: [PATCH] Powerpc 8xx CPM_UART delay in receive
From: Alan Cox @ 2012-08-16 15:21 UTC (permalink / raw)
  To: leroy christophe
  Cc: Marcelo Tosatti, linux-kernel, linux-serial, linuxppc-dev,
	Alan Cox
In-Reply-To: <502D054B.3010606@c-s.fr>

> MAX_IDL: Maximum idle characters. When a character is received, the 
> receiver begins counting idle characters. If MAX_IDL idle characters
> are received before the next data character, an idle timeout occurs
> and the buffer is closed,
> generating a maskable interrupt request to the core to receive the
> data from the buffer. Thus, MAX_IDL offers a way to demarcate frames.
> To disable the feature, clear MAX_IDL. The bit length of an idle
> character is calculated as follows: 1 + data length (5–9) + 1 (if
> parity is used) 
> + number of stop bits (1–2). For 8 data bits, no parity, and 1 stop
> bit, the character length is 10 bits


So if you have slightly bursty high-speed data, as is quite typical,
before your change you would get one interrupt per buffer of 32 bytes;
with it you'll get a lot more interrupts.

You have two available hints about the way to set this - one of them is
the baud rate (low baud rates mean the fifo isn't a big win and the
latency is high), the other is the low_latency flag if the driver
supports the low latency feature (and arguably you can still use a
request for it as a hint even if you refuse the actual feature).

So I think a reasonable approach would be to set the idle timeout down
for low baud rates or if low_latency is requested.

> generated if there is at least one word in the FIFO and for a time 
> equivalent to the transmission of four characters

Which is a bit more reasonable than one, although problematic at low
speed (hence the fifo on/off).
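
The policy suggested above could look roughly like the helper below.
It is a hypothetical sketch: the function name and the 9600 baud cut-off
are assumptions made for illustration and come neither from the thread
nor from the cpm_uart driver; only the 32-byte buffer size and the two
hints (baud rate and low_latency) are taken from the discussion.

```c
#include <stdbool.h>

#define RX_BUF_SIZE	32	/* receive buffer size mentioned in the thread */
#define LOW_BAUD_LIMIT	9600	/* hypothetical cut-off, not from the driver */

/*
 * Pick a MAX_IDL value: a short idle timeout when latency matters,
 * otherwise a full buffer's worth of idle characters so that bursty
 * high-speed traffic still produces about one interrupt per buffer.
 */
static unsigned int pick_max_idl(unsigned int baud, bool low_latency)
{
	if (low_latency || baud < LOW_BAUD_LIMIT)
		return 1;
	return RX_BUF_SIZE;
}
```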


* Re: [PATCH] Powerpc 8xx CPM_UART delay in receive
From: leroy christophe @ 2012-08-16 14:35 UTC (permalink / raw)
  To: Alan Cox
  Cc: Marcelo Tosatti, linux-kernel, linux-serial, linuxppc-dev,
	Alan Cox
In-Reply-To: <20120816152918.5ed2649f@pyramind.ukuu.org.uk>

On 16/08/2012 16:29, Alan Cox wrote:
>> The PowerPC CPM is working differently. It doesn't use a fifo but
>> buffers. Buffers are handed to the microprocessor only when they are
>> full or after a timeout period which is adjustable. In the driver, the
> Which is different how - remembering we empty the FIFO on an IRQ
>
>> buffers are configured with a size of 32 bytes. And the timeout is set
>> to the size of the buffer. That is this timeout that I'm reducing to 1
>> byte in my proposed patch. I can't see what it would break for high
>> speed I/O.
> How can a timeout be measured in "bytes". Can we have a bit more clarity
> on how the hardware works and take it from there ?
>
> Alan
>
The reference manual of MPC885 says the following about the MAX_IDL 
parameter:

MAX_IDL: Maximum idle characters. When a character is received, the 
receiver begins counting idle characters. If MAX_IDL idle characters are 
received before the next data character, an idle timeout occurs and the 
buffer is closed,
generating a maskable interrupt request to the core to receive the data 
from the buffer. Thus, MAX_IDL offers a way to demarcate frames. To 
disable the feature, clear MAX_IDL. The bit length of an idle character 
is calculated as follows: 1 + data length (5–9) + 1 (if parity is used) 
+ number of stop bits (1–2). For 8 data bits, no parity, and 1 stop bit, 
the character length is 10 bits

If the UART is receiving data and gets an idle character (all ones), the 
channel begins counting consecutive idle characters received. If MAX_IDL 
is reached, the buffer is closed and an RX interrupt is generated if not 
masked. If no buffer is open, this event does not generate an interrupt 
or any status information. The internal idle counter (IDLC) is reset 
every time a character is received. To disable the idle sequence 
function, clear MAX_IDL


The datasheet of the 16550 UART says:

Besides, for FIFO mode operation a time out mechanism is implemented. 
Independently of the trigger level of the FIFO, an interrupt will be 
generated if there is at least one word in the FIFO and for a time 
equivalent to the transmission of four characters
- no new character has been received and
- the microprocessor has not read the RHR
To compute the time out, the current total number of bits (start, data, 
parity and stop(s)) is used, together with the current baud rate (i.e., 
it depends on the contents of the LCR, DLL, DLM and PSD registers).


Christophe
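
The arithmetic in the two manual excerpts above can be written out as a
short C sketch. The function names are made up for illustration; the
formula for the character length is the one quoted from the MPC885
manual, and the time per character is simply bits divided by baud rate.

```c
#include <stdint.h>

/* 1 start bit + data bits + optional parity bit + stop bits. */
static unsigned int idle_char_bits(unsigned int data_bits, int parity,
				   unsigned int stop_bits)
{
	return 1 + data_bits + (parity ? 1 : 0) + stop_bits;
}

/* Time, in microseconds, taken by idle_chars characters at a given baud rate. */
static uint64_t idle_timeout_us(unsigned int baud, unsigned int idle_chars,
				unsigned int bits_per_char)
{
	return (uint64_t)idle_chars * bits_per_char * 1000000ULL / baud;
}
```

With 8 data bits, no parity and 1 stop bit (10 bits per character) at
300 baud, idle_timeout_us(300, 32, 10) gives about 1.07 seconds for
MAX_IDL = 32, about 33 ms for MAX_IDL = 1, and about 133 ms for the
four-character timeout of the 16550.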


* Re: [PATCH] Powerpc 8xx CPM_UART delay in receive
From: Alan Cox @ 2012-08-16 14:29 UTC (permalink / raw)
  To: leroy christophe
  Cc: Marcelo Tosatti, linux-kernel, linux-serial, linuxppc-dev,
	Alan Cox
In-Reply-To: <502CF2A0.8080109@c-s.fr>

> The PowerPC CPM is working differently. It doesn't use a fifo but 
> buffers. Buffers are handed to the microprocessor only when they are 
> full or after a timeout period which is adjustable. In the driver, the 

Which is different how - remembering we empty the FIFO on an IRQ

> buffers are configured with a size of 32 bytes. And the timeout is set 
> to the size of the buffer. That is this timeout that I'm reducing to 1 
> byte in my proposed patch. I can't see what it would break for high 
> speed I/O.

How can a timeout be measured in "bytes". Can we have a bit more clarity
on how the hardware works and take it from there ?

Alan


* Re: powerpc/perf: hw breakpoints return ENOSPC
From: Peter Zijlstra @ 2012-08-16 14:15 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Michael Neuling, Frederic Weisbecker, linux-kernel, linuxppc-dev,
	K Prasad, Ingo Molnar
In-Reply-To: <1345125747.20062.12.camel@concordia>

On Fri, 2012-08-17 at 00:02 +1000, Michael Ellerman wrote:
> You do want to guarantee that the task will always be subject to the
> breakpoint, even if it moves cpus. So is there any way to guarantee that
> other than reserving a breakpoint slot on every cpu ahead of time?

That's not how regular perf works.. regular perf can overload hw
resources at will and stuff is strictly per-cpu.

So the regular perf record has perf_event_attr::inherit enabled by
default, this will result in it creating a per-task-per-cpu event for
each cpu and this will succeed because there's no strict reservation to
avoid/detect starvation against perf_event_attr::pinned events.

For regular (!pinned) events, we'll RR the created events on the
available hardware resources.

HWBP does things completely differently and reserves a slot over all CPUs
for everything, thus stuff completely falls apart.


* Re: powerpc/perf: hw breakpoints return ENOSPC
From: Michael Ellerman @ 2012-08-16 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Neuling, Frederic Weisbecker, linux-kernel, linuxppc-dev,
	K Prasad, Ingo Molnar
In-Reply-To: <1345117498.29668.23.camel@twins>

On Thu, 2012-08-16 at 13:44 +0200, Peter Zijlstra wrote:
> On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> > Peter,
> > 
> > > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > > despite there being no breakpoint on this CPU.  This is because the call
> > > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > > return ENOSPC.
> > > 
> > > I think this comes from the ptrace legacy, we register a breakpoint on
> > > all cpus because when we migrate a task it cannot fail to migrate the
> > > breakpoint.
> > > 
> > > Its one of the things I hate most about the hwbp stuff as it relates to
> > > perf.
> > > 
> > > Frederic knows more...
> > 
> > Maybe I should wait for Frederic to respond but I'm not sure I
> > understand what you're saying.
> > 
> > I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> > same time could be a problem, but I'm not sure how this would stop it.
> 
> ptrace uses perf for hwbp support so we're stuck with all kinds of
> stupid ptrace constraints.. or somesuch.
> 
> > Are you saying that we need to keep at least 1 slot free at all times,
> > so that we can use it for ptrace?
> 
> No, I'm saying perf-hwbp is weird because of ptrace, maybe the ptrace
> weirdness shouldn't live in perf-hwbp but in the ptrace-perf glue
> however..

But how else would it work, even if ptrace wasn't in the picture?

You do want to guarantee that the task will always be subject to the
breakpoint, even if it moves cpus. So is there any way to guarantee that
other than reserving a breakpoint slot on every cpu ahead of time?

Or can a hwbp event go into error state if it can't be installed on the
new cpu, like a pinned event does? I can't see any code that does that.

cheers


* Re: [PATCH] Powerpc 8xx CPM_UART delay in receive
From: leroy christophe @ 2012-08-16 13:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: Marcelo Tosatti, linux-kernel, linux-serial, linuxppc-dev,
	Alan Cox
In-Reply-To: <20120814155227.018988da@pyramind.ukuu.org.uk>


On 14/08/2012 16:52, Alan Cox wrote:
> On Tue, 14 Aug 2012 16:26:28 +0200
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>
>> Hello,
>>
>> I'm not sure who to address this Patch to either
>>
>> It fixes a delay issue with CPM UART driver on Powerpc MPC8xx.
>> The problem is that with the actual code, the driver waits 32 IDLE patterns before returning the received data to the upper level. It means for instance about 1 second at 300 bauds.
>> This fix limits to one byte the waiting period.
> Take a look how the 8250 does it - I think you want to set the value
> based upon the data rate. Your patch will break it for everyone doing
> high speed I/O.
>
> Alan
>
I'm not sure I understand what you mean. As far as I can see, the
8250/16550 works a bit differently, as it is based on a fifo and triggers
an interrupt as soon as a given number of bytes is received. I also see
that in case this amount is not reached, there is a receive timeout
which fires after no byte is received for a duration of more than 4 bytes.

The PowerPC CPM works differently. It doesn't use a fifo but
buffers. Buffers are handed to the microprocessor only when they are
full or after a timeout period which is adjustable. In the driver, the
buffers are configured with a size of 32 bytes, and the timeout is set
to the size of the buffer. It is this timeout that I'm reducing to 1
byte in my proposed patch. I can't see what it would break for high
speed I/O.

Christophe


* Re: powerpc/perf: hw breakpoints return ENOSPC
From: Peter Zijlstra @ 2012-08-16 11:44 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Ingo Molnar, Frederic Weisbecker, K Prasad, linux-kernel,
	linuxppc-dev
In-Reply-To: <20344.1345115849@neuling.org>

On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> Peter,
>
> > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > despite there being no breakpoint on this CPU.  This is because the call
> > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > return ENOSPC.
> >
> > I think this comes from the ptrace legacy, we register a breakpoint on
> > all cpus because when we migrate a task it cannot fail to migrate the
> > breakpoint.
> >
> > Its one of the things I hate most about the hwbp stuff as it relates to
> > perf.
> >
> > Frederic knows more...
>
> Maybe I should wait for Frederic to respond but I'm not sure I
> understand what you're saying.
>
> I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> same time could be a problem, but I'm not sure how this would stop it.

ptrace uses perf for hwbp support so we're stuck with all kinds of
stupid ptrace constraints.. or somesuch.

> Are you saying that we need to keep at least 1 slot free at all times,
> so that we can use it for ptrace?

No, I'm saying perf-hwbp is weird because of ptrace, maybe the ptrace
weirdness shouldn't live in perf-hwbp but in the ptrace-perf glue
however..

> Is "perf record -e mem:0x10000000 true" ever going to be able to work on
> POWER7 with only one hw breakpoint resource per CPU?

I think it should work... but I'm fairly sure it currently doesn't
because of how things are done. 'perf record -ie mem:0x100... true'
might just work.

I always forget all the ptrace details but I am forever annoyed at the
mess that is perf-hwbp.. Frederic is there really nothing we can do
about this?

The fact that ptrace hwbp semantics are different per architecture
doesn't help of course.


* Re: powerpc/perf: hw breakpoints return ENOSPC
From: Michael Neuling @ 2012-08-16 11:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Frederic Weisbecker, K Prasad, linux-kernel,
	linuxppc-dev
In-Reply-To: <1345102812.31459.114.camel@twins>

Peter,

> > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > despite there being no breakpoint on this CPU.  This is because the call
> > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > return ENOSPC.
> 
> I think this comes from the ptrace legacy, we register a breakpoint on
> all cpus because when we migrate a task it cannot fail to migrate the
> breakpoint.
> 
> Its one of the things I hate most about the hwbp stuff as it relates to
> perf.
> 
> Frederic knows more...

Maybe I should wait for Frederic to respond but I'm not sure I
understand what you're saying.

I can see how using ptrace hw breakpoints and perf hw breakpoints at the
same time could be a problem, but I'm not sure how this would stop it.

Are you saying that we need to keep at least 1 slot free at all times,
so that we can use it for ptrace?

Is "perf record -e mem:0x10000000 true" ever going to be able to work on
POWER7 with only one hw breakpoint resource per CPU?  

Thanks,
Mikey


* Re: powerpc/perf: hw breakpoints return ENOSPC
From: Peter Zijlstra @ 2012-08-16  7:40 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Ingo Molnar, Frederic Weisbecker, K Prasad, linux-kernel,
	linuxppc-dev
In-Reply-To: <28857.1345091034@neuling.org>

On Thu, 2012-08-16 at 14:23 +1000, Michael Neuling wrote:
>
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> despite there being no breakpoint on this CPU.  This is because the call
> the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.

I think this comes from the ptrace legacy, we register a breakpoint on
all cpus because when we migrate a task it cannot fail to migrate the
breakpoint.

Its one of the things I hate most about the hwbp stuff as it relates to
perf.

Frederic knows more...


* Re: [PATCH v3 2/2] powerpc: Uprobes port to powerpc
From: Ananth N Mavinakayanahalli @ 2012-08-16  5:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Srikar Dronamraju, peterz, Oleg Nesterov, lkml, Paul Mackerras,
	Anton Blanchard, Ingo Molnar, linuxppc-dev
In-Reply-To: <1345066913.11751.4.camel@pasglop>

On Thu, Aug 16, 2012 at 07:41:53AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-08-15 at 18:59 +0200, Oleg Nesterov wrote:
> > On 07/26, Ananth N Mavinakayanahalli wrote:
> > >
> > > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > >
> > > This is the port of uprobes to powerpc. Usage is similar to x86.
> > 
> > I am just curious why this series was ignored by powerpc maintainers...
> 
> Because it arrived too late for the previous merge window considering my
> limited bandwidth for reviewing things and that nobody else seems to
> have reviewed it :-)
> 
> It's still on track for the next one, and I'm hoping to dedicate most of
> next week going through patches & doing a powerpc -next.

Thanks Ben!

> > Of course I can not review this code, I know nothing about powerpc,
> > but the patches look simple/straightforward.
> > 
> > Paul, Benjamin?
> > 
> > Just one question... Shouldn't arch_uprobe_pre_xol() forbid to probe
> > UPROBE_SWBP_INSN (at least) ?
> > 
> > (I assume that emulate_step() can't handle this case but of course I
> >  do not understand arch/powerpc/lib/sstep.c)
> > 
> > Note that uprobe_pre_sstep_notifier() sets utask->state = UTASK_BP_HIT
> > without any checks. This doesn't look right if it was UTASK_SSTEP...
> > 
> > But again, I do not know what powepc will actually do if we try to
> > single-step over UPROBE_SWBP_INSN.
> 
> Ananth ?

set_swbp() will return -EEXIST to install_breakpoint if we are trying to
put a breakpoint on UPROBE_SWBP_INSN. So, the arch agnostic code itself
takes care of this case... or am I missing something?

However, I see that we need a powerpc specific is_swbp_insn()
implementation since we will have to take care of all the trap variants.

I will need to update the patches based on changes being made by Oleg
and Sebastien for the single-step issues. Will incorporate the powerpc
specific is_swbp_insn() change along with the changes required for the
single-step part and send out the next version.

Ananth
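
As an illustration of what "all the trap variants" means for such a
check, here is a standalone sketch. It is not the code from the series:
the function name is hypothetical, and the opcode values (primary
opcodes 2 and 3 for tdi and twi, primary opcode 31 with extended opcodes
4 and 68 for tw and td) are stated from memory of the Power ISA and
should be verified against it before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if the 32-bit instruction word is any trap variant. */
static bool ppc_insn_is_trap(uint32_t insn)
{
	uint32_t primary = insn >> 26;		/* top 6 bits: primary opcode */
	uint32_t xo = (insn >> 1) & 0x3ff;	/* X-form extended opcode */

	if (primary == 2 || primary == 3)	/* tdi, twi */
		return true;
	if (primary == 31 && (xo == 4 || xo == 68))	/* tw, td */
		return true;
	return false;
}
```

The unconditional "trap" mnemonic, encoded as 0x7fe00008, is tw 31,0,0
and so falls under the tw case.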


* powerpc/perf: hw breakpoints return ENOSPC
From: Michael Neuling @ 2012-08-16  4:23 UTC (permalink / raw)
  To: K Prasad
  Cc: Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, linux-kernel,
	linuxppc-dev

Hi,

I've been trying to get hardware breakpoints with perf to work on POWER7
but I'm getting the following:

  % perf record -e mem:0x10000000 true

    Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.

    Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

  true: Terminated

(FWIW, adding -a makes it work fine)

Debugging it, it seems that __reserve_bp_slot() is returning ENOSPC because
it thinks there are no free breakpoint slots on this CPU.

I have 2 CPUs, so perf userspace is doing two perf_event_open syscalls
to add a counter to each CPU [1].  The first syscall succeeds but the
second is failing.

On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
despite there being no breakpoint on this CPU.  This is because the call
to task_bp_pinned() checks all CPUs, rather than just the current CPU.
POWER7 only has one hardware breakpoint per CPU (i.e. HBP_NUM=1), so we
return ENOSPC.

The following patch fixes this by checking the associated CPU for each
breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
provided as a reference to the above issue.

Mikey

1. not sure why it doesn't just do one syscall and specify all CPUs, but
that's another issue.  Using two syscalls should work.

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index bb38c4d..e092daa 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  * Count the number of breakpoints of the same type and same task.
  * The given event must be not on the list.
  */
-static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
+static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 {
 	struct task_struct *tsk = bp->hw.bp_target;
 	struct perf_event *iter;
 	int count = 0;
 
 	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
-		if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
+		if (iter->hw.bp_target == tsk &&
+		    find_slot_idx(iter) == type &&
+		    cpu == iter->cpu)
 			count += hw_breakpoint_weight(iter);
 	}
 
@@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			slots->pinned += max_task_bp_pinned(cpu, type);
 		else
-			slots->pinned += task_bp_pinned(bp, type);
+			slots->pinned += task_bp_pinned(cpu, bp, type);
 		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
 
 		return;
@@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(bp, type);
+			nr += task_bp_pinned(cpu, bp, type);
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
@@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
 	int old_idx = 0;
 	int idx = 0;
 
-	old_count = task_bp_pinned(bp, type);
+	old_count = task_bp_pinned(cpu, bp, type);
 	old_idx = old_count - 1;
 	idx = old_idx + weight;
 

^ permalink raw reply related

* [PATCH] powerpc/mpc85xx: Add new ext fields to Integrated Flash Controller
From: Prabhakar Kushwaha @ 2012-08-16  3:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Prabhakar Kushwaha, York Sun

Freescale's Integrated Flash Controller (IFC) v1.1.0 supports a 40-bit
address bus. When an address wider than 32 bits is used, the EXT
registers must be set.

Add support for the EXT registers.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: York Sun <yorksun@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Base upon git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git
Branch next

 arch/powerpc/include/asm/fsl_ifc.h |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fsl_ifc.h b/arch/powerpc/include/asm/fsl_ifc.h
index b955012..b8a4b9b 100644
--- a/arch/powerpc/include/asm/fsl_ifc.h
+++ b/arch/powerpc/include/asm/fsl_ifc.h
@@ -768,22 +768,24 @@ struct fsl_ifc_gpcm {
  */
 struct fsl_ifc_regs {
 	__be32 ifc_rev;
-	u32 res1[0x3];
+	u32 res1[0x2];
 	struct {
+		__be32 cspr_ext;
 		__be32 cspr;
-		u32 res2[0x2];
+		u32 res2;
 	} cspr_cs[FSL_IFC_BANK_COUNT];
-	u32 res3[0x18];
+	u32 res3[0x19];
 	struct {
 		__be32 amask;
 		u32 res4[0x2];
 	} amask_cs[FSL_IFC_BANK_COUNT];
-	u32 res5[0x18];
+	u32 res5[0x17];
 	struct {
+		__be32 csor_ext;
 		__be32 csor;
-		u32 res6[0x2];
+		u32 res6;
 	} csor_cs[FSL_IFC_BANK_COUNT];
-	u32 res7[0x18];
+	u32 res7[0x19];
 	struct {
 		__be32 ftim[4];
 		u32 res8[0x8];
-- 
1.7.9.5

^ permalink raw reply related

* RE: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not zero
From: Jia Hongtao-B38951 @ 2012-08-16  3:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Wood Scott-B07421
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <1345070575.11751.8.camel@pasglop>

> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> Sent: Thursday, August 16, 2012 6:43 AM
> To: Wood Scott-B07421
> Cc: Jia Hongtao-B38951; linuxppc-dev@lists.ozlabs.org;
> galak@kernel.crashing.org; Li Yang-R58472; Wood Scott-B07421
> Subject: Re: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not
> zero
>
> On Wed, 2012-08-15 at 16:57 -0500, Scott Wood wrote:
> > Is there no lasting remnant of that temporary wrong isa_io_base?  We
> > won't have I/O resources that were calculated relative to that, which
> > stop working once isa_io_base changes?  Or does that happen later,
> > after this function has been called on all buses (and would that
> > continue to be the case if we change the PCI bus to a platform device)?
>
> If you continue creating your PCI busses all at once early on you'll be
> fine. The platform device business is going to break that (and other
> things as well btw, such as pci_final_fixup).

I have already done some investigation, and the sequence of fixups (early,
header, final) will not be changed by the platform driver.

We register and init PCI controllers as platform devices at the
arch_initcall stage, and PCI scanning (pcibios_init) is at the
subsys_initcall stage, in which the early and header fixups are done in the
right sequence. The final fixup starts at the rootfs_initcall stage, which
is later than the early and header fixups.

- Hongtao.

>
> Maybe it's time to contemplate doing something more like ppc64 and
> reserve a piece of virtual address space (I know there isn't much, so
> make it 64k per bus max) and just map the busses in there with the first
> 64k being reserved for the ISA stuff if it exists ?
>
> Cheers,
> Ben.

^ permalink raw reply

* RE: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not zero
From: Jia Hongtao-B38951 @ 2012-08-16  3:11 UTC (permalink / raw)
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <502BDC6B.4090800@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, August 16, 2012 1:29 AM
> To: Jia Hongtao-B38951
> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org;
> benh@kernel.crashing.org; Li Yang-R58472; Wood Scott-B07421
> Subject: Re: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not
> zero
>
> On 08/15/2012 03:57 AM, Jia Hongtao wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >
> > Some platforms like QEMU treat 0 as an invalid address for ISA IO base.
> > So we make sure that ISA IO base will never be zero. By functionality
> > this is equivalent to assigning the first pci bus detected as a primary
> > bus.
> >
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Signed-off-by: Jia Hongtao <B38951@freescale.com>
>
> When did Ben post this?
>
> Suggesting a temporary workaround in an e-mail is *not* the same as
> posting a patch, and definitely not the same as providing a signed-off-by
> which AFAICT you forged.  Don't do that.
>
> > ---
> >  arch/powerpc/kernel/pci-common.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > index 0f75bd5..2a09aa5 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -734,7 +734,7 @@ void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose,
> >  			hose->io_base_virt = ioremap(cpu_addr, size);
> >
> >  			/* Expect trouble if pci_addr is not 0 */
> > -			if (primary)
> > +			if (primary || !isa_io_base)
> >  				isa_io_base =
> >  					(unsigned long)hose->io_base_virt;
> >  #endif /* CONFIG_PPC32 */
> >
>
> Didn't I already point out that this has problems when the primary bus is
> not the first to be probed?  If your answer is that you fix that in a
> later patch, that breaks bisectability.
>
> -Scott

Sorry, my answer is not that I fix that in a later patch.
My answer is that, without this patch, there is also a problem when the
primary bus is not the first to be probed.
That is to say, the bisectability problem is already there.
The problem is not introduced by this patch.

- Hongtao.

^ permalink raw reply

* Re: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not zero
From: Benjamin Herrenschmidt @ 2012-08-15 22:42 UTC (permalink / raw)
  To: Scott Wood; +Cc: B07421, linuxppc-dev, Jia Hongtao
In-Reply-To: <502C1B64.8050505@freescale.com>

On Wed, 2012-08-15 at 16:57 -0500, Scott Wood wrote:
> Is there no lasting remnant of that temporary wrong isa_io_base?  We
> won't have I/O resources that were calculated relative to that, which
> stop working once isa_io_base changes?  Or does that happen later, after
> this function has been called on all buses (and would that continue to
> be the case if we change the PCI bus to a platform device)?

If you continue creating your PCI busses all at once early on you'll be
fine. The platform device business is going to break that (and other
things as well btw, such as pci_final_fixup).

Maybe it's time to contemplate doing something more like ppc64 and
reserve a piece of virtual address space (I know there isn't much, so
make it 64k per bus max) and just map the busses in there with the first
64k being reserved for the ISA stuff if it exists ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not zero
From: Scott Wood @ 2012-08-15 21:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: B07421, linuxppc-dev, Jia Hongtao
In-Reply-To: <1345066379.11596.1.camel@pasglop>

On 08/15/2012 04:32 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2012-08-15 at 12:29 -0500, Scott Wood wrote:
>>> ---
>>>  arch/powerpc/kernel/pci-common.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>> index 0f75bd5..2a09aa5 100644
>>> --- a/arch/powerpc/kernel/pci-common.c
>>> +++ b/arch/powerpc/kernel/pci-common.c
>>> @@ -734,7 +734,7 @@ void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose,
>>>  			hose->io_base_virt = ioremap(cpu_addr, size);
>>>  
>>>  			/* Expect trouble if pci_addr is not 0 */
>>> -			if (primary)
>>> +			if (primary || !isa_io_base)
>>>  				isa_io_base =
>>>  					(unsigned long)hose->io_base_virt;
>>>  #endif /* CONFIG_PPC32 */
>>>
>>
>> Didn't I already point out that this has problems when the primary bus
>> is not the first to be probed?  If your answer is that you fix that in a
>> later patch, that breaks bisectability.
> 
> Is it though ? ie, we will override it with the real primary in the
> above test, so it will only very temporarily be set to the "wrong" bus
> no ? IE, the test will still trip on the actual "primary" if there's
> one

Is there no lasting remnant of that temporary wrong isa_io_base?  We
won't have I/O resources that were calculated relative to that, which
stop working once isa_io_base changes?  Or does that happen later, after
this function has been called on all buses (and would that continue to
be the case if we change the PCI bus to a platform device)?

-Scott

^ permalink raw reply

* Re: [PATCH v3 2/2] powerpc: Uprobes port to powerpc
From: Benjamin Herrenschmidt @ 2012-08-15 21:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: peterz, lkml, Paul Mackerras, Anton Blanchard, Ingo Molnar,
	linuxppc-dev, Srikar Dronamraju
In-Reply-To: <20120815165931.GA10059@redhat.com>

On Wed, 2012-08-15 at 18:59 +0200, Oleg Nesterov wrote:
> On 07/26, Ananth N Mavinakayanahalli wrote:
> >
> > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> >
> > This is the port of uprobes to powerpc. Usage is similar to x86.
> 
> I am just curious why this series was ignored by powerpc maintainers...

Because it arrived too late for the previous merge window considering my
limited bandwidth for reviewing things and that nobody else seems to
have reviewed it :-)

It's still on track for the next one, and I'm hoping to dedicate most of
next week going through patches & doing a powerpc -next.

> Of course I can not review this code, I know nothing about powerpc,
> but the patches look simple/straightforward.
> 
> Paul, Benjamin?
> 
> Just one question... Shouldn't arch_uprobe_pre_xol() forbid to probe
> UPROBE_SWBP_INSN (at least) ?
> 
> (I assume that emulate_step() can't handle this case but of course I
>  do not understand arch/powerpc/lib/sstep.c)
> 
> Note that uprobe_pre_sstep_notifier() sets utask->state = UTASK_BP_HIT
> without any checks. This doesn't look right if it was UTASK_SSTEP...
> 
> But again, I do not know what powerpc will actually do if we try to
> single-step over UPROBE_SWBP_INSN.

Ananth ?

Cheers,
Ben.

^ permalink raw reply

* Re: therm_pm72 units, interface
From: Benjamin Herrenschmidt @ 2012-08-15 21:36 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linuxppc-dev
In-Reply-To: <alpine.LNX.2.01.1208151941010.22695@frira.zrqbmnf.qr>

BTW... On a somewhat related note ... if you happen to have a spare
Xserve G5 PSU I'm interested :-) Mine died (well I -think- it's the
PSU ... it just won't power up) which means I can't test on these things
anymore (the new Windfarm RackMac driver is totally untested for
example).

Cheers,
Ben.

On Wed, 2012-08-15 at 19:53 +0200, Jan Engelhardt wrote:
> About a week ago, an XServe G5 of mine started powering off more or less 
> randomly (after 1 hour, chances were good for it to occur). A 
> problematic UPS has already been cut from the loop, and today I cleaned 
> the machine inside out with pressurized air. So far it runs, for now at 
> least, with a load >= 2.0, but I am keeping an eye on whether this is a 
> thermal issue.
> 
> To that end, I wanted to obtain some statistics. Although 
> sensors-detect(8) loads lm87.ko for me, running sensors(1) shows no 
> sensors. Oddly enough, I found a "kfand" process running that seems to 
> stem from therm_pm72.ko, which brings me to the sysfs files.
> 
> Is there a reason sensors(1) is not supported for Rackmac3,1?
> 
> Certain sysfs files have a value with an unknown unit.
> "current" is likely in Ampere, temperature must be in Celsius
> (because there's no way the server room is 54°F=12°C cold).
> 
> Is there a way to obtain the trip points for the hardware?
> 
> $ cd /sys/devices/temperature; grep '' *;
> backside_fan_pwm:32
> backside_temperature:54.000
> cpu0_current:34.423
> cpu0_exhaust_fan_rpm:5340
> cpu0_intake_fan_rpm:5340
> cpu0_temperature:72.889
> cpu0_voltage:1.252
> cpu1_current:34.179
> cpu1_exhaust_fan_rpm:4584
> cpu1_intake_fan_rpm:4584
> cpu1_temperature:68.526
> cpu1_voltage:1.259
> dimms_temperature:53.000
> grep: driver: Is a directory
> modalias:platform:temperature
> grep: power: Is a directory
> slots_fan_pwm:20
> slots_temperature:38.500
> grep: subsystem: Is a directory
> uevent:DRIVER=temperature
> uevent:OF_NAME=fan
> uevent:OF_FULLNAME=/u3@0,f8000000/i2c@f8001000/fan@15e
> uevent:OF_TYPE=fcu
> uevent:OF_COMPATIBLE_0=fcu
> uevent:OF_COMPATIBLE_N=1
> uevent:MODALIAS=of:NfanTfcuCfcu

^ permalink raw reply

* Re: therm_pm72 units, interface
From: Benjamin Herrenschmidt @ 2012-08-15 21:35 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linuxppc-dev
In-Reply-To: <alpine.LNX.2.01.1208151941010.22695@frira.zrqbmnf.qr>

On Wed, 2012-08-15 at 19:53 +0200, Jan Engelhardt wrote:
> About a week ago, an XServe G5 of mine started powering off more or less 
> randomly (after 1 hour, chances were good for it to occur). A 
> problematic UPS has already been cut from the loop, and today I cleaned 
> the machine inside out with pressurized air. So far it runs, for now at 
> least, with a load >= 2.0, but I am keeping an eye on whether this is a 
> thermal issue.
> 
> To that end, I wanted to obtain some statistics. Although 
> sensors-detect(8) loads lm87.ko for me, running sensors(1) shows no 
> sensors. Oddly enough, I found a "kfand" process running that seems to 
> stem from therm_pm72.ko, which brings me to the sysfs files.
> 
> Is there a reason sensors(1) is not supported for Rackmac3,1?
> 
> Certain sysfs files have a value with an unknown unit.
> "current" is likely in Ampere, temperature must be in Celsius
> (because there's no way the server room is 54°F=12°C cold).
> 
> Is there a way to obtain the trip points for the hardware?

BTW. There's a new windfarm driver for these in recent kernels...

Apart from that, the trip points come from a calibration EEPROM. You
may want to tweak the driver to warn a bit earlier, or something along
those lines? (Or just to print more things out?)

Cheers,
Ben.

> $ cd /sys/devices/temperature; grep '' *;
> backside_fan_pwm:32
> backside_temperature:54.000
> cpu0_current:34.423
> cpu0_exhaust_fan_rpm:5340
> cpu0_intake_fan_rpm:5340
> cpu0_temperature:72.889
> cpu0_voltage:1.252
> cpu1_current:34.179
> cpu1_exhaust_fan_rpm:4584
> cpu1_intake_fan_rpm:4584
> cpu1_temperature:68.526
> cpu1_voltage:1.259
> dimms_temperature:53.000
> grep: driver: Is a directory
> modalias:platform:temperature
> grep: power: Is a directory
> slots_fan_pwm:20
> slots_temperature:38.500
> grep: subsystem: Is a directory
> uevent:DRIVER=temperature
> uevent:OF_NAME=fan
> uevent:OF_FULLNAME=/u3@0,f8000000/i2c@f8001000/fan@15e
> uevent:OF_TYPE=fcu
> uevent:OF_COMPATIBLE_0=fcu
> uevent:OF_COMPATIBLE_N=1
> uevent:MODALIAS=of:NfanTfcuCfcu

^ permalink raw reply

* Re: [PATCH V7 1/3] powerpc/pci: Make sure ISA IO base is not zero
From: Benjamin Herrenschmidt @ 2012-08-15 21:32 UTC (permalink / raw)
  To: Scott Wood; +Cc: B07421, linuxppc-dev, Jia Hongtao
In-Reply-To: <502BDC6B.4090800@freescale.com>

On Wed, 2012-08-15 at 12:29 -0500, Scott Wood wrote:
> > ---
> >  arch/powerpc/kernel/pci-common.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > index 0f75bd5..2a09aa5 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -734,7 +734,7 @@ void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose,
> >  			hose->io_base_virt = ioremap(cpu_addr, size);
> >  
> >  			/* Expect trouble if pci_addr is not 0 */
> > -			if (primary)
> > +			if (primary || !isa_io_base)
> >  				isa_io_base =
> >  					(unsigned long)hose->io_base_virt;
> >  #endif /* CONFIG_PPC32 */
> > 
> 
> Didn't I already point out that this has problems when the primary bus
> is not the first to be probed?  If your answer is that you fix that in a
> later patch, that breaks bisectability.

Is it though? I.e., we will override it with the real primary in the
above test, so it will only very temporarily be set to the "wrong" bus,
no? The test will still trip on the actual "primary" if there's
one.

Cheers,
Ben.

^ permalink raw reply

* GE IMP3a
From: Kumar Gala @ 2012-08-15 21:32 UTC (permalink / raw)
  To: Martyn Welch; +Cc: linuxppc-dev@lists.ozlabs.org list

Martyn,

Do you know why ge_imp3a.c has 0x9000 as the 'primary' PCIe bus on the board?

- k

^ permalink raw reply

