LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v9 00/13] KVM: PPC: IOMMU in-kernel handling of VFIO
From: Gleb Natapov @ 2013-08-30 17:58 UTC
  To: Alexey Kardashevskiy
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, linux-mm,
	Alex Williamson, Paul Mackerras, Paolo Bonzini, linuxppc-dev,
	David Gibson
In-Reply-To: <5220736E.5050503@ozlabs.ru>

On Fri, Aug 30, 2013 at 08:26:54PM +1000, Alexey Kardashevskiy wrote:
> On 08/28/2013 06:37 PM, Alexey Kardashevskiy wrote:
> > This accelerates VFIO DMA operations on POWER by moving them
> > into kernel.
> > 
> > This depends on VFIO external API patch which is on its way to upstream.
> > 
> > Changes:
> > v9:
> > * replaced the "link logical bus number to IOMMU group" ioctl to KVM
> > with a KVM device doing the same thing, i.e. the actual changes are in
> > these 3 patches:
> >   KVM: PPC: reserve a capability and KVM device type for realmode VFIO
> >   KVM: PPC: remove warning from kvmppc_core_destroy_vm
> >   KVM: PPC: Add support for IOMMU in-kernel handling
> > 
> > * moved some VFIO external API bits to a separate patch to reduce the size
> > of the "KVM: PPC: Add support for IOMMU in-kernel handling" patch
> > 
> > * fixed code style problems reported by checkpatch.pl.
> > 
> > v8:
> > * fixed comments about capabilities numbers
> > 
> > v7:
> > * rebased on v3.11-rc3.
> > * VFIO external user API will go through VFIO tree so it is
> > excluded from this series.
> > * As nobody ever reacted on "hashtable: add hash_for_each_possible_rcu_notrace()",
> > Ben suggested to push it via his tree so I included it to the series.
> > * realmode_(get|put)_page is reworked.
> > 
> > More details in the individual patch comments.
> > 
> > Alexey Kardashevskiy (13):
> >   KVM: PPC: POWERNV: move iommu_add_device earlier
> >   hashtable: add hash_for_each_possible_rcu_notrace()
> >   KVM: PPC: reserve a capability number for multitce support
> >   KVM: PPC: reserve a capability and KVM device type for realmode VFIO
> 
> 
> Hi Gleb!
> 
> Could you please review and pick (if they are ok) the two "capability"
> patches from above?
> 
> It would be cool if you also looked at "KVM: PPC: Add support for IOMMU
> in-kernel handling", the part about KVM device for SPAPR TCE IOMMU table.
> 
> Thanks!
Will do it next week.

> 
> 
> 
> >   powerpc: Prepare to support kernel handling of IOMMU map/unmap
> >   powerpc: add real mode support for dma operations on powernv
> >   KVM: PPC: enable IOMMU_API for KVM_BOOK3S_64 permanently
> >   KVM: PPC: Add support for multiple-TCE hcalls
> >   powerpc/iommu: rework to support realmode
> >   KVM: PPC: remove warning from kvmppc_core_destroy_vm
> >   KVM: PPC: add trampolines for VFIO external API
> >   KVM: PPC: Add support for IOMMU in-kernel handling
> >   KVM: PPC: Add hugepage support for IOMMU in-kernel handling
> > 
> >  Documentation/virtual/kvm/api.txt                  |  26 +
> >  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 ++
> >  arch/powerpc/include/asm/iommu.h                   |  18 +-
> >  arch/powerpc/include/asm/kvm_host.h                |  38 ++
> >  arch/powerpc/include/asm/kvm_ppc.h                 |  16 +-
> >  arch/powerpc/include/asm/machdep.h                 |  12 +
> >  arch/powerpc/include/asm/pgtable-ppc64.h           |   2 +
> >  arch/powerpc/include/uapi/asm/kvm.h                |   8 +
> >  arch/powerpc/kernel/iommu.c                        | 243 +++++----
> >  arch/powerpc/kvm/Kconfig                           |   1 +
> >  arch/powerpc/kvm/book3s_64_vio.c                   | 597 ++++++++++++++++++++-
> >  arch/powerpc/kvm/book3s_64_vio_hv.c                | 408 +++++++++++++-
> >  arch/powerpc/kvm/book3s_hv.c                       |  42 +-
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S            |   8 +-
> >  arch/powerpc/kvm/book3s_pr_papr.c                  |  35 ++
> >  arch/powerpc/kvm/powerpc.c                         |   4 +
> >  arch/powerpc/mm/init_64.c                          |  50 +-
> >  arch/powerpc/platforms/powernv/pci-ioda.c          |  57 +-
> >  arch/powerpc/platforms/powernv/pci-p5ioc2.c        |   2 +-
> >  arch/powerpc/platforms/powernv/pci.c               |  75 ++-
> >  arch/powerpc/platforms/powernv/pci.h               |   3 +-
> >  arch/powerpc/platforms/pseries/iommu.c             |   8 +-
> >  include/linux/hashtable.h                          |  15 +
> >  include/linux/kvm_host.h                           |   1 +
> >  include/linux/mm.h                                 |  14 +
> >  include/linux/page-flags.h                         |   4 +-
> >  include/uapi/linux/kvm.h                           |   3 +
> >  virt/kvm/kvm_main.c                                |   5 +
> >  28 files changed, 1564 insertions(+), 168 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> > 
> 
> 
> -- 
> Alexey

--
			Gleb.

* Re: Ethernet over PCIe driver for Inter-Processor Communication
From: Saravanan S @ 2013-08-30 17:37 UTC
  To: David Hawkins, Michael George, naishab
  Cc: linuxppc-dev@lists.ozlabs.org, Ira W. Snyder
In-Reply-To: <521A8782.60807@ovro.caltech.edu>

Hi All,


On Mon, Aug 26, 2013 at 4:08 AM, David Hawkins <dwh@ovro.caltech.edu> wrote:

> Hi S.Saravanan,
>
>
>>> Root complexes would normally interrupt a device via a PCIe write
>>> to a register in a BAR on the end-point (or in extended configuration
>>> space registers depending on the hardware implementation).
>>>
>>
>> MPC8640 End point implements only the Type 0 header (Page 1116). The
>> header implements five BARs (Page 1165).
>>
>
> One of those BARs typically provides access to the PowerPC memory
> mapped registers (or at least a 1MB window onto those registers).
> This is how your root complex can write to some form of messaging
> register.
>
>>> PCIe drivers need some way to interrupt the processor, so there must
>>> be an option somewhere ... for example, what are the message register
>>> interrupts intended for? See p479
>>>
>>> http://cache.freescale.com/files/32bit/doc/ref_manual/MPC8641DRM.pdf
>>>
>>> (Ira and myself have not used the MPC8640 so are not familiar with
>>> its user manual).
>>>
>>
>> Message registers are for interrupting the processor. A write to
>> them sends an interrupt to the processor. Actually, message registers
>> are used by the RC to enable interrupts to the processor when an EP
>> sends an MSI transaction to the RC. In the RC driver I register
>> separately for the MSI interrupts from all three EPs.
>>
>
> This is pretty much what you are looking for then right?



I successfully mapped the Programmable Interrupt Controller registers in
the EP to the PCI space. I can now write the shared message interrupt
registers in the EP from the RC over PCI, but I am facing the following
problems.

1) In my driver at the EP, to register for this interrupt I need to know the
hardware irq number, but I can't find any interrupt number assigned by the
PIC for the message interrupt sources (Page 451, MPC8641DRM manual).
2) Otherwise I need to get the virtual irq number assigned by the kernel
for the message interrupt. I am unable to find a method to get this
either.

In the RC side driver I get the virtual irq number after calling
pci_enable_msi(), which is straightforward.
I studied the RC code which sets up shared message interrupts (Page 481,
MPC manual) for PCI MSI interrupts. When MSI is enabled,
arch_setup_msi_irqs() is called, leading to fsl_setup_msi_irqs() (
http://lxr.free-electrons.com/source/arch/powerpc/sysdev/fsl_msi.c?v=3.7#L151).
In this function the virtual irq number is obtained as below:

virq = irq_create_mapping(msi_data->irqhost, hwirq);

In the above function the hardware irq number is the same as the value
written into the Shared Message Signaled Interrupt Index Register (Page
482), which is strange. Further, these functions are called in the RC during
PCI probe at boot time or when pci_enable_msi() is called, so there is
always a PCI slave device context to them. However, I need to do this in the
EP, which has no PCI probing nor any PCI device reference whatsoever, as it
is a slave. Is this approach right?

> The end-points interrupt the root-complex using PCIe MSI interrupts,
> whereas the root-complex interrupts an end-point by writing directly
> to its MSI interrupt.
>
>
>> To access them in the EP from the RC I will have to set an inbound
>> window mapping the PIC register space in the EP to the PCI mem space
>> assigned to it. An inbound window maps a PCI address on the bus
>> received by the PCIe controller to a platform address. I will try that
>> and let you know.
>>
>
> Right, as I comment above, one of the BARs typically exposes the PowerPC
> internal registers.
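
The inbound-window mapping discussed in this part of the thread (a PCI bus
address received by the PCIe controller is turned into a platform address)
can be modeled in a few lines of C. This is only a user-space sketch under
stated assumptions, not driver code: the struct, its field names, and the
function inbound_translate() are hypothetical, and the real MPC8641 window
registers are programmed as described in the reference manual.

```c
#include <stdint.h>

/* Hypothetical model of one inbound window. The field names are
 * illustrative and do not correspond to MPC8641 register names. */
struct inbound_window {
    uint64_t pci_base;   /* first PCI bus address claimed by the window */
    uint64_t size;       /* window length in bytes */
    uint64_t local_base; /* platform address that pci_base maps to */
};

/* Translate a PCI bus address into a platform address.
 * Returns 0 and stores the result on a hit, -1 if the address
 * falls outside the window. */
static int inbound_translate(const struct inbound_window *w,
                             uint64_t pci_addr, uint64_t *local_addr)
{
    if (pci_addr < w->pci_base || pci_addr - w->pci_base >= w->size)
        return -1;

    /* The offset into the window is preserved by the translation. */
    *local_addr = w->local_base + (pci_addr - w->pci_base);
    return 0;
}
```

With made-up values pci_base = 0x80000000, size = 0x100000 and
local_base = 0xf8000000, PCI address 0x80040000 translates to 0xf8040000,
while 0x80100000 lies just past the end of the window and is rejected.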
>
>
>>> Feel free to discuss your ideas for your PCIe driver (eg., why start
>>> with rionet rather than Ira's driver), either on-list, or email Ira
>>> and myself directly
>>>
>>
>> To be frank with you there was no particular reason in starting with
>> rionet. Maybe because our board also had SRIO interface and we are using
>> rionet driver successfully. I had looked at Ira's driver later. I will
>> study that also and try to come back with a skeleton for my driver.
>>
>
> It's always a good idea to discuss different options, and to stub out
> drivers or create minimal (but functional) drivers. That way you'll
> be able to see how similar your new driver is to other drivers, and
> you'll quickly discover if there is a hardware feature in the
> existing driver that you cannot emulate (eg., some SRIO feature
> used by the rionet driver).
>

Right now I am trying a very primitive driver just to check the feasibility
of bi-directional communication between the RC and the EP. Once this is
established I will be in a better position to get inputs on making it
more effective.


>>> One further note. You might want to look at rproc/rpmsg and their virtio
>>> driver support. That seems to be where the Linux world is moving for
>>> inter-processor communications. See for example the ARM CPUs interfacing
>>> with DSPs.
>>>
>>
>> I will study that as I am not familiar with virtio.
>>
>
> Follow Ira's advice. Talk to the guys working on virtio, tell them what
> you are trying to do. They'll likely have good advice for you.
>
> Good luck!
>
> Cheers,
> Dave
>
>
>
Warm Regards,

S.Saravanan

* Re: [PATCH 1/5] jump_label: factor out the base part of jump_label.h to a separate file
From: Radim Krčmář @ 2013-08-30 16:37 UTC
  To: Kevin Hao; +Cc: linuxppc, linux-kernel
In-Reply-To: <1377414952-15995-2-git-send-email-haokexin@gmail.com>

2013-08-25 15:15+0800, Kevin Hao:
> We plan to use the jump label in the cpu/mmu feature check on ppc.
> This will need to include jump_label.h in several very basic header
> files of ppc which seem to be included by most of the other header
> files implicitly or explicitly. But the current jump_label.h
> also includes "linux/workqueue.h" and this will cause recursive
> inclusion. In order to fix this, we choose to factor out the base
> part of jump_label.h to a separate header file and we can include
> that file instead of jump_label.h to avoid the recursive inclusion.
> No functional change.

"linux/workqueue.h" was included because of deferred keys and they are
split into "linux/jump_label_ratelimit.h" to solve the same problem in
paravirt ticket spinlock series.
(still in -next: 851cf6e7 jump_label: Split jumplabel ratelimit)

* Re: [PATCH v2 0/4] Unify CPU hotplug lock interface
From: Toshi Kani @ 2013-08-30 14:35 UTC
  To: Rafael J. Wysocki
  Cc: fenghua.yu, bp, gregkh, x86, linux-kernel, linux-acpi,
	Yasuaki Ishimatsu, mingo, srivatsa.bhat, nfont, tglx, hpa,
	linuxppc-dev
In-Reply-To: <4448049.EdPFQ3JMc0@vostro.rjw.lan>

On Fri, 2013-08-30 at 14:14 +0200, Rafael J. Wysocki wrote:
> On Friday, August 30, 2013 11:44:06 AM Yasuaki Ishimatsu wrote:
> > (2013/08/30 9:22), Toshi Kani wrote:
> > > lock_device_hotplug() was recently introduced to serialize CPU & Memory
> > > online/offline and hotplug operations, along with sysfs online interface
> > > restructure (commit 4f3549d7).  With this new locking scheme,
> > > cpu_hotplug_driver_lock() is redundant and is no longer necessary.
> > > 
> > > This patchset makes sure that lock_device_hotplug() covers all CPU online/
> > > offline interfaces, and then removes cpu_hotplug_driver_lock().
> > > 
> > > v2:
> > >   - Rebased to the pm tree, bleeding-edge.
> > >   - Changed patch 2/4 to use lock_device_hotplug_sysfs().
> > > 
> > > ---
> > > Toshi Kani (4):
> > >    hotplug, x86: Fix online state in cpu0 debug interface
> > >    hotplug, x86: Add hotplug lock to missing places
> > >    hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
> > >    hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
> > > 
> > > ---
> > The patch-set looks good to me.
> > 
> > Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> Thanks!  I'll tentatively queue up this series for 3.12 (for the second
> half of the merge window).

Thanks Rafael!
-Toshi

* Re: [PATCH v2 0/4] Unify CPU hotplug lock interface
From: Toshi Kani @ 2013-08-30 14:33 UTC
  To: Yasuaki Ishimatsu
  Cc: fenghua.yu, bp, gregkh, x86, linux-kernel, rjw, linux-acpi, mingo,
	srivatsa.bhat, nfont, tglx, hpa, linuxppc-dev
In-Reply-To: <522006F6.2060206@jp.fujitsu.com>

On Fri, 2013-08-30 at 11:44 +0900, Yasuaki Ishimatsu wrote:
> (2013/08/30 9:22), Toshi Kani wrote:
> > lock_device_hotplug() was recently introduced to serialize CPU & Memory
> > online/offline and hotplug operations, along with sysfs online interface
> > restructure (commit 4f3549d7).  With this new locking scheme,
> > cpu_hotplug_driver_lock() is redundant and is no longer necessary.
> > 
> > This patchset makes sure that lock_device_hotplug() covers all CPU online/
> > offline interfaces, and then removes cpu_hotplug_driver_lock().
> > 
> > v2:
> >   - Rebased to the pm tree, bleeding-edge.
> >   - Changed patch 2/4 to use lock_device_hotplug_sysfs().
> > 
> > ---
> > Toshi Kani (4):
> >    hotplug, x86: Fix online state in cpu0 debug interface
> >    hotplug, x86: Add hotplug lock to missing places
> >    hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
> >    hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
> > 
> > ---
> The patch-set looks good to me.
> 
> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks Yasuaki!
-Toshi

* Re: [PATCH v2 0/4] Unify CPU hotplug lock interface
From: Rafael J. Wysocki @ 2013-08-30 12:14 UTC
  To: Yasuaki Ishimatsu
  Cc: fenghua.yu, bp, Toshi Kani, gregkh, x86, linux-kernel, linux-acpi,
	mingo, srivatsa.bhat, nfont, tglx, hpa, linuxppc-dev
In-Reply-To: <522006F6.2060206@jp.fujitsu.com>

On Friday, August 30, 2013 11:44:06 AM Yasuaki Ishimatsu wrote:
> (2013/08/30 9:22), Toshi Kani wrote:
> > lock_device_hotplug() was recently introduced to serialize CPU & Memory
> > online/offline and hotplug operations, along with sysfs online interface
> > restructure (commit 4f3549d7).  With this new locking scheme,
> > cpu_hotplug_driver_lock() is redundant and is no longer necessary.
> > 
> > This patchset makes sure that lock_device_hotplug() covers all CPU online/
> > offline interfaces, and then removes cpu_hotplug_driver_lock().
> > 
> > v2:
> >   - Rebased to the pm tree, bleeding-edge.
> >   - Changed patch 2/4 to use lock_device_hotplug_sysfs().
> > 
> > ---
> > Toshi Kani (4):
> >    hotplug, x86: Fix online state in cpu0 debug interface
> >    hotplug, x86: Add hotplug lock to missing places
> >    hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
> >    hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
> > 
> > ---
> The patch-set looks good to me.
> 
> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks!  I'll tentatively queue up this series for 3.12 (for the second
half of the merge window).

Thanks,
Rafael

* Re: [PATCH V2 0/6] perf: New conditional branch filter
From: Stephane Eranian @ 2013-08-30 11:48 UTC
  To: Anshuman Khandual
  Cc: Sukadev Bhattiprolu, LKML, Arnaldo Carvalho de Melo,
	Linux PPC dev, ellerman, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

2013/8/30 Anshuman Khandual <khandual@linux.vnet.ibm.com>
>
> This patchset is a re-spin of the original branch stack sampling
> patchset which introduced the new PERF_SAMPLE_BRANCH_COND filter. This patchset
> also enables SW based branch filtering support for PPC64 platforms which have
> branch stack sampling support. With this new enablement, the branch filter support
> for PPC64 platforms has been extended to include all the combinations discussed
> below with a sample test application program.
>
>
I am trying to understand which HW has support for capturing the
branches: PPC7 or PPC8. Then it seems you're saying that only PPC8 has
the filtering support. On PPC7 you use the SW filter. Did I get this
right?

I will look at the patch set.

>
> (1) perf record -e branch-misses:u -b ./cprog
> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  .....................  ....................  .....................
> #
>      4.42%    cprog  cprog                 [k] sw_4_2             cprog                 [k] lr_addr
>      4.41%    cprog  cprog                 [k] symbol2            cprog                 [k] hw_1_2
>      4.41%    cprog  cprog                 [k] ctr_addr           cprog                 [k] sw_4_1
>      4.41%    cprog  cprog                 [k] lr_addr            cprog                 [k] sw_4_2
>      4.41%    cprog  cprog                 [k] sw_4_2             cprog                 [k] callme
>      4.41%    cprog  cprog                 [k] symbol1            cprog                 [k] hw_1_1
>      4.41%    cprog  cprog                 [k] success_3_1_3      cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_4_1             cprog                 [k] ctr_addr
>      2.43%    cprog  cprog                 [k] hw_1_2             cprog                 [k] symbol2
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_2
>      2.43%    cprog  cprog                 [k] address1           cprog                 [k] back1
>      2.43%    cprog  cprog                 [k] back1              cprog                 [k] callme
>      2.43%    cprog  cprog                 [k] hw_2_1             cprog                 [k] address1
>      2.43%    cprog  cprog                 [k] sw_3_1_1           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1_2           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1_3           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_1
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_2
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_3
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_4_2
>      2.43%    cprog  cprog                 [k] hw_1_1             cprog                 [k] symbol1
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_1
>      2.42%    cprog  cprog                 [k] sw_3_1             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] success_3_1_1      cprog                 [k] sw_3_1
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_1
>      1.99%    cprog  cprog                 [k] address2           cprog                 [k] back2
>      1.99%    cprog  cprog                 [k] hw_2_2             cprog                 [k] address2
>      1.99%    cprog  cprog                 [k] back2              cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] callme             cprog                 [k] main
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_3
>      1.99%    cprog  cprog                 [k] hw_1_1             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] sw_3_2             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_2
>      1.99%    cprog  cprog                 [k] success_3_1_2      cprog                 [k] sw_3_1
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_2
>      1.99%    cprog  cprog                 [k] hw_1_2             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] sw_4_1             cprog                 [k] callme
>      0.02%    cprog  [unknown]             [k] 0xf7ba2328         [unknown]             [k] 0xf7ba2320
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow  libc-2.11.2.so        [k] _IO_file_overflow
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn    libc-2.11.2.so        [k] _IO_file_overflow
>      0.00%    cprog  cprog                 [k] callme             cprog                 [k] hw_2_2
>
> PMU filters
> -----------
> (2) perf record -e branch-misses:u -j any_call ./cprog
>
> # Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object           Target Symbol
> # ........  .......  ....................  .......................  ....................  ......................
> #
>      7.82%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_2
>      6.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_2
>      6.88%    cprog  cprog                 [k] hw_1_1               cprog                 [k] symbol1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_1
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_3
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_2
>      5.88%    cprog  cprog                 [k] hw_1_2               cprog                 [k] symbol2
>      5.88%    cprog  cprog                 [k] sw_4_2               cprog                 [k] lr_addr
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_2
>      4.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_3
>      4.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_2
>      4.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_2
>      3.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_1
>      3.94%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_1
>      2.94%    cprog  cprog                 [k] main                 cprog                 [k] callme
>      2.94%    cprog  cprog                 [k] sw_4_1               cprog                 [k] ctr_addr
>      2.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_1
>      0.01%    cprog  [unknown]             [k] 0xf79076c4           [unknown]             [k] 0xf78f22c0
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] _IO_setb
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] mmap
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn      libc-2.11.2.so        [k] _IO_default_xsputn
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow    libc-2.11.2.so        [k] _IO_do_write
>      0.00%    cprog  ld-2.11.2.so          [k] malloc               [unknown]             [k] 0xf790b380
>
>
> (3) perf record -e branch-misses:u -j cond ./cprog
> # Overhead  Command  Source Shared Object       Source Symbol  Target Shared Object            Target Symbol
> # ........  .......  ....................  ..................  ....................  .......................
> #
>     24.85%    cprog  [unknown]             [k] 00000000        cprog                 [k] callme
>     15.71%    cprog  cprog                 [k] sw_3_1          cprog                 [k] sw_3_1
>      7.14%    cprog  cprog                 [k] sw_4_2          cprog                 [k] lr_addr
>      6.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_2
>      4.57%    cprog  cprog                 [k] hw_2_2          cprog                 [k] callme
>      4.57%    cprog  cprog                 [k] sw_3_1_1        cprog                 [k] sw_3_1
>      4.57%    cprog  cprog                 [k] sw_4_1          cprog                 [k] ctr_addr
>      4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_1
>      4.57%    cprog  cprog                 [k] main            cprog                 [k] hw_1_1
>      4.57%    cprog  cprog                 [k] hw_1_2          cprog                 [k] hw_1_2
>      4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] main
>      4.57%    cprog  cprog                 [k] hw_2_1          cprog                 [k] callme
>      4.57%    cprog  cprog                 [k] sw_3_1_3        cprog                 [k] sw_3_1
>      4.57%    cprog  cprog                 [k] sw_3_1_2        cprog                 [k] sw_3_1
>      0.01%    cprog  [unknown]             [k] 0xf7aa25dc      [unknown]             [k] 0xf7aa27e4
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_doallocbuf  libc-2.11.2.so        [k] _IO_file_doallocate
>      0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_doallocate
>      0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_stat
>
> SW filters
> ----------
> (4) perf record -e branch-misses:u -j any_ret ./cprog
> # Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  .................  ....................  ..............
> #
>      7.91%    cprog  cprog                 [k] symbol1        cprog                 [k] hw_1_1
>      7.91%    cprog  cprog                 [k] success_3_1_3  cprog                 [k] sw_3_1
>      7.91%    cprog  cprog                 [k] ctr_addr       cprog                 [k] sw_4_1
>      7.91%    cprog  cprog                 [k] lr_addr        cprog                 [k] sw_4_2
>      7.91%    cprog  cprog                 [k] symbol2        cprog                 [k] hw_1_2
>      7.90%    cprog  cprog                 [k] sw_4_2         cprog                 [k] callme
>      4.34%    cprog  cprog                 [k] success_3_1_2  cprog                 [k] sw_3_1
>      4.33%    cprog  cprog                 [k] sw_4_1         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] hw_1_2         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] success_3_1_1  cprog                 [k] sw_3_1
>      4.33%    cprog  cprog                 [k] sw_3_2         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] back2          cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] callme         cprog                 [k] main
>      4.33%    cprog  cprog                 [k] hw_1_1         cprog                 [k] callme
>      3.58%    cprog  cprog                 [k] sw_3_1         cprog                 [k] callme
>      3.58%    cprog  cprog                 [k] sw_3_1_1       cprog                 [k] sw_3_1
>      3.58%    cprog  cprog                 [k] sw_3_1_2       cprog                 [k] sw_3_1
>      3.58%    cprog  cprog                 [k] back1          cprog                 [k] callme
>      3.57%    cprog  cprog                 [k] sw_3_1_3       cprog                 [k] sw_3_1
>      0.00%    cprog  [unknown]             [k] 0xf7abacf4     [unknown]             [k] 0xf7abae40
>
>
> (5) perf record -e branch-misses:u -j ind_call ./cprog
> # Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
> # ........  .......  ....................  .............  ....................  .............
> #
>     63.56%    cprog  cprog                 [k] sw_4_2     cprog                 [k] lr_addr
>     36.44%    cprog  cprog                 [k] sw_4_1     cprog                 [k] ctr_addr
>
>
> Mixed filters
> -------------
> (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
> Error:
> The perf.data file has no samples!
>
> NOTE: As expected. The HW filter keeps only the branches which are calls, and the SW filter
> then tries to find return branches in that set. The two filters are mutually exclusive, so
> no samples are found in the final profile.
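
The two-stage behaviour in the NOTE (hardware filter first, software filter
on whatever survives) can be illustrated with a small toy model in C. This
is a sketch under stated assumptions and not code from the patchset: the
struct branch record, the predicate names, and count_after_filters() are
all hypothetical, and the real PMU and perf code classify branches very
differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified branch record for illustration only. */
struct branch {
    bool is_call;
    bool is_return;
    bool is_indirect;
};

typedef bool (*branch_pred)(const struct branch *);

/* Toy predicates named after the perf -j filter names. */
static bool any_call(const struct branch *b) { return b->is_call; }
static bool any_ret(const struct branch *b)  { return b->is_return; }
static bool ind_call(const struct branch *b) { return b->is_call && b->is_indirect; }

/* Count the records that pass the hardware predicate and then the
 * software predicate. A record dropped by the hardware stage is never
 * seen by the software stage, so the result is the intersection. */
static size_t count_after_filters(const struct branch *recs, size_t n,
                                  branch_pred hw, branch_pred sw)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!hw(&recs[i]))
            continue;       /* filtered out in "hardware" */
        if (sw(&recs[i]))
            kept++;         /* survived the "software" stage too */
    }
    return kept;
}
```

In this model, passing any_call as the hardware stage and any_ret as the
software stage keeps nothing, since no record is both a call and a return.
Passing any_call and then ind_call keeps only the indirect calls.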
>
> (7) perf record -e branch-misses:u -j any_call,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  ..............  ....................  ..............
> #
>     66.69%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr
>     33.31%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr
>      0.00%    cprog  [unknown]             [k] 0x0fe7f264  [unknown]             [k] 0x0ff926d0
>
>
> (8) perf record -e branch-misses:u -j any_call,any_ret,ind_call ./cprog
> Error:
> The perf.data file has no samples!
>
> (9) perf record -e branch-misses:u -j cond,any_ret ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object            Target Symbol
> # ........  .......  ....................  ..............  ....................  .......................
> #
>     46.01%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.54%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>      8.18%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1
>      8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.07%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1
>      8.07%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1
>      8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7c1480c
>      0.00%    cprog  libc-2.11.2.so        [k] mmap        libc-2.11.2.so        [k] _IO_file_doallocate
>
> (10) perf record -e branch-misses:u -j cond,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  ..............  ....................  ..............
> #
>     48.11%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.52%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>     12.42%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr
>      8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.65%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr
>      8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7a4581c
>
>
> (11) perf record -e branch-misses:u -j cond,any_ret,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
> # ........  .......  ....................  ..............  ....................  .................
> #
>     45.91%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.26%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>      8.17%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1
>      8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      8.17%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1
>      8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.16%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7f87704
>      0.00%    cprog  [unknown]             [k] 00000000    libc-2.11.2.so        [k] _IO_file_sync
>
> Test application program
> ========================
> (1) Makefile:
> --------------------------------------------
> all: sample.o cprog of.cprog of.sample
>
> sample.o: sample.s
>         as -o sample.o sample.s
> cprog: cprog.c sample.o
>         gcc -o cprog cprog.c sample.o
> of.sample: sample.o
>         objdump -d sample.o > of.sample
> of.cprog: cprog
>         objdump -d cprog > of.cprog
> clean:
>         rm sample.o cprog of.sample of.cprog
> ---------------------------------------------
> (2) cprog.c
> ---------------------------------------------
> #include <stdio.h>
> #define LOOP_COUNT 100000
>
> extern void callme(void);
>
> int main(int argc, char *argv[])
> {
>         int i;
>         for(i = 0; i < LOOP_COUNT; i++)
>                 callme();
>
>         printf("end");
>         return 0;
> }
> ---------------------------------------------
> (3) sample.s
> ---------------------------------------------
> # r25, r26, r27 will be used as first level, second level
> # and third level stack for LR. Register r20, r21, r22, r23
> # r24 will be used for general programming purpose.
>
> .data
>
> msg:
>         .string "BHRB filter tests\n"
>         len = . - msg
> msg_1_1:
>         .string "Test: hw_1_1\n"
>         len_1_1 = 13
> msg_1_2:
>         .string "Test: hw_1_2\n"
>         len_1_2 = 13
> msg_2_1:
>         .string "Test: hw_2_1\n"
>         len_2_1 = 13
> msg_2_2:
>         .string "Test: hw_2_2\n"
>         len_2_2 = 13
> msg_3_1:
>         .string "Test: sw_3_1\n"
>         len_3_1 = 13
> msg_3_1_1:
>         .string "Test: sw_3_1_1\n"
>         len_3_1_1 = 15
> msg_3_1_2:
>         .string "Test: sw_3_1_2\n"
>         len_3_1_2 = 15
> msg_3_1_3:
>         .string "Test: sw_3_1_3\n"
>         len_3_1_3 = 15
> msg_3_2:
>         .string "Test: sw_3_2\n"
>         len_3_2 = 13
> msg_4_1:
>         .string "Test: sw_4_1\n"
>         len_4_1 = 13
> msg_4_2:
>         .string "Test: sw_4_2\n"
>         len_4_2 = 13
>
> hw_3_1_1_passed:
>         .string "\thw_3_1_1_passed\n\n"
>         len_hw_3_1_1_passed = 18
> hw_3_1_2_passed:
>         .string "\thw_3_1_2_passed\n\n"
>         len_hw_3_1_2_passed = 18
> hw_3_1_3_passed:
>         .string "\thw_3_1_3_passed\n\n"
>         len_hw_3_1_3_passed = 18
>
> hw_2_1_passed:
>         .string "\thw_2_1_passed\n\n"
>         len_hw_2_1_passed = 16
>
> hw_2_2_passed:
>         .string "\thw_2_2_passed\n\n"
>         len_hw_2_2_passed = 16
>
> hw_1_1_passed:
>         .string "\thw_1_1_passed\n\n"
>         len_hw_1_1_passed = 16
>
> hw_1_2_passed:
>         .string "\thw_1_2_passed\n\n"
>         len_hw_1_2_passed = 16
>
> hw_4_1_passed:
>         .string "\thw_4_1_passed\n\n"
>         len_hw_4_1_passed = 16
>
> hw_4_2_passed:
>         .string "\thw_4_2_passed\n\n"
>         len_hw_4_2_passed = 16
>
> msg_error:
>         .string "\tError\n"
>         len_error = 7
> .text
>         .global callme
>         .global hw_1_1
>         .global hw_1_2
>         .global hw_2_1
>         .global hw_2_2
>
> # HW filter test symbols
> symbol1:
>         # Print "hw_1_1_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_1_1_passed@ha
>         addi    4, 4, hw_1_1_passed@l
>         li      5, len_hw_1_1_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> hw_1_1:
>         # Save LR - second level
>         mflr 26
>
>         # Print "hw_1_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_1_1@ha
>         addi    4, 4, msg_1_1@l
>         li      5, len_1_1
>         sc
>
>         bl symbol1                      # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # Restore LR
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> symbol2:
>         # Print "Symbol2 taken"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_1_2_passed@ha
>         addi    4, 4, hw_1_2_passed@l
>         li      5, len_hw_1_2_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> hw_1_2:
>         # Save LR - second level
>         mflr 26
>
>         # Print "hw_1_2 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_1_2@ha
>         addi    4, 4, msg_1_2@l
>         li      5, len_1_2
>         sc
>
>         li 4,20
>         cmpi 0,4,20
>         bcl 12, 4*cr0+2, symbol2        # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # HW filter test
>
> address1:
>         # Print "hw_2_1_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_2_1_passed@ha
>         addi    4, 4, hw_2_1_passed@l
>         li      5, len_hw_2_1_passed
>         sc
>         b  back1                        # PERF_SAMPLE_BRANCH_ANY
>
> hw_2_1:
>         # Print "hw_2_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_2_1@ha
>         addi    4, 4, msg_2_1@l
>         li      5, len_2_1
>         sc
>
>         # Simple conditional branch (equal)
>         li      20, 12
>         cmpi    3, 20, 12
>         bc      12, 4*cr3+2, address1   # PERF_SAMPLE_BRANCH_COND
>
> back1:
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> address2:
>         # Print "hw_2_2_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_2_2_passed@ha
>         addi    4, 4, hw_2_2_passed@l
>         li      5, len_hw_2_2_passed
>         sc
>         b  back2                        # PERF_SAMPLE_BRANCH_ANY
>
> hw_2_2:
>         # Print "hw_2_2 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_2_2@ha
>         addi    4, 4, msg_2_2@l
>         li      5, len_2_2
>         sc
>
>         # Simple conditional branch (less than)
>         li      20, 12
>         cmpi    4, 20, 20
>         bc      12, 4*cr4+0, address2   # PERF_SAMPLE_BRANCH_COND
> back2:
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # SW filter test symbols
> sw_3_1_1:
>         # Print "Test: sw_3_1_1"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_1@ha
>         addi    4, 4, msg_3_1_1@l
>         li      5, len_3_1_1
>         sc
>
>         li      22,0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 10
>         bclr    12, 2                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Mark the error
>         li      22, 1
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_3_1_2:
>         # Print "Test: sw_3_1_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_2@ha
>         addi    4, 4, msg_3_1_2@l
>         li      5, len_3_1_2
>         sc
>
>         li      23, 0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 20
>         bclr    12, 0                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Mark the error
>         li      23, 1
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_3_1_3:
>         # Print "Test: sw_3_1_3"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_3@ha
>         addi    4, 4, msg_3_1_3@l
>         li      5, len_3_1_3
>         sc
>
>         li      24, 0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 5
>         bclr    12, 1                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Mark the error
>         li      24, 1
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> success_3_1_1:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_1_passed@ha
>         addi    4, 4, hw_3_1_1_passed@l
>         li      5, len_hw_3_1_1_passed
>         sc
>         blr
>
> success_3_1_2:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_2_passed@ha
>         addi    4, 4, hw_3_1_2_passed@l
>         li      5, len_hw_3_1_2_passed
>         sc
>         blr
>
> success_3_1_3:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_3_passed@ha
>         addi    4, 4, hw_3_1_3_passed@l
>         li      5, len_hw_3_1_3_passed
>         sc
>         blr
>
> sw_3_1:
>         # Save LR
>         mflr 26
>
>         # Print "Test: sw_3_1"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1@ha
>         addi    4, 4, msg_3_1@l
>         li      5, len_3_1
>         sc
>
>         # Equal comparison condition
>         bl sw_3_1_1                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 22, 0
>         bcl     12, 2, success_3_1_1    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         # LT comparison condition
>         bl sw_3_1_2                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 23, 0
>         bcl     12, 2, success_3_1_2    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         # GT comparison condition
>         bl sw_3_1_3                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 24, 0
>         bcl     12, 2, success_3_1_3    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> sw_3_2:
>         # Print "Test: sw_3_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_2@ha
>         addi    4, 4, msg_3_2@l
>         li      5, len_3_1
>         sc
>
>         # FIXME: Anything more here ?
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # Indirect call tests
>
> # CTR
> ctr_addr:
>         # Print "bcctr taken"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_4_1_passed@ha
>         addi    4, 4, hw_4_1_passed@l
>         li      5, len_hw_4_1_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> sw_4_1:
>         # Save LR
>         mflr    26
>
>         # Print "sw_4_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_4_1@ha
>         addi    4, 4, msg_4_1@l
>         li      5, len_4_1
>         sc
>
>         # Save address in CTR
>         lis     20, ctr_addr@ha
>         addi    20, 20, ctr_addr@l
>         mtctr   20
>
>
>         # Compare and jump to CTR
>         li      21, 10
>         cmpi    0, 21, 10
>         bcctrl  12, 4*cr0+2             # PERF_SAMPLE_BRANCH_IND_CALL
>
>         mtlr    26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> # LR
> lr_addr:
>         # Print "bclrl taken"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_4_2_passed@ha
>         addi    4, 4, hw_4_2_passed@l
>         li      5, len_hw_4_2_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_4_2:
>         # Save LR
>         mflr    26
>
>         # Print "Test: sw_4_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_4_2@ha
>         addi    4, 4, msg_4_2@l
>         li      5, len_4_2
>         sc
>
>         # Save address in LR
>         lis     20, lr_addr@ha
>         addi    20, 20, lr_addr@l
>         mtlr    20
>
>
>         # Compare and jump to LR
>         li      21, 10
>         cmpi    0, 21, 10
>         bclrl   12, 4*cr0+2             # PERF_SAMPLE_BRANCH_IND_CALL
>
>         # Restore LR
>         mtlr    26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> callme:
>         # Save LR
>         mflr    25
>
>         # Print "Branch filter Test"
>         li      0, 4
>         li      3, 1
>         lis     4, msg@ha
>         addi    4, 4, msg@l
>         li      5, len
>         sc
>
>         # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_1_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_1_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         # PERF_SAMPLE_BRANCH_COND
>         bl hw_2_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_2_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # PERF_SAMPLE_BRANCH_ANY_RET
>         bl sw_3_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl sw_3_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         # PERF_SAMPLE_BRANCH_IND_CALL
>         bl sw_4_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl sw_4_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # Restore LR
>         mtlr 25
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> --------------------------------------------------------------------
>
> Changes in V2
> --------------
> (1) Enabled PPC64 SW branch filtering support
> (2) Incorporated changes required for all previous comments
>
> Anshuman Khandual (6):
>   perf: New conditional branch filter criteria in branch stack sampling
>   powerpc, perf: Enable conditional branch filter for POWER8
>   perf, tool: Conditional branch filter 'cond' added to perf record
>   x86, perf: Add conditional branch filtering support
>   perf, documentation: Description for conditional branch filter
>   powerpc, perf: Enable SW filtering in branch stack sampling framework
>
>  arch/powerpc/include/asm/perf_event_server.h |   2 +-
>  arch/powerpc/perf/core-book3s.c              | 200 +++++++++++++++++++++++++--
>  arch/powerpc/perf/power8-pmu.c               |  25 ++--
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
>  include/uapi/linux/perf_event.h              |   3 +-
>  tools/perf/Documentation/perf-record.txt     |   3 +-
>  tools/perf/builtin-record.c                  |   1 +
>  7 files changed, 216 insertions(+), 23 deletions(-)
>
> --
> 1.7.11.7
>

^ permalink raw reply

* [PATCH v9 3/3] DMA: Freescale: update driver to support 8-channel DMA engine
From: hongbo.zhang @ 2013-08-30 11:26 UTC (permalink / raw)
  To: rob.herring, pawel.moll, mark.rutland, swarren, ian.campbell,
	vinod.koul, djbw
  Cc: Hongbo Zhang, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <1377861980-7027-1-git-send-email-hongbo.zhang@freescale.com>

From: Hongbo Zhang <hongbo.zhang@freescale.com>

This patch adds support for the 8-channel DMA engine, so the driver now works
with both the new 8-channel and the legacy 4-channel DMA engines.
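
As a rough illustration of the channel numbering this relies on, the sketch
below mirrors the chan->id expression from the fsldma.c hunk in this patch and
feeds it the channel offsets listed in the Elo3 device tree nodes of patch 2/3.
The helper names are illustrative and not part of the driver, and the sketch
assumes the enclosing bus base is 4 KiB aligned, so only the low 12 bits of the
address matter.

```c
#include <stdint.h>

/*
 * Same arithmetic as the chan->id assignment in fsl_dma_chan_probe();
 * res_start stands in for res.start (hypothetical standalone helper).
 */
static unsigned int elo_chan_id(uint64_t res_start)
{
	return (res_start & 0xfff) < 0x300 ?
	       ((res_start - 0x100) & 0xfff) >> 7 :
	       ((res_start - 0x200) & 0xfff) >> 7;
}

/*
 * Translate a channel "reg" offset through the controller's
 * ranges = <0x0 0x100100 0x500> entry, then derive the channel id.
 * Assumption: the bus base above this node is 4 KiB aligned.
 */
static unsigned int elo3_chan_id_from_offset(uint32_t chan_off)
{
	const uint64_t ranges_base = 0x100100;

	return elo_chan_id(ranges_base + chan_off);
}
```

Under these assumptions, channel offsets 0x0, 0x80, 0x100 and 0x180 map to ids
0 to 3 exactly as before, while 0x300, 0x380, 0x400 and 0x480 map to ids 4 to 7,
which is why the limit is raised to 8 channels per device.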

Signed-off-by: Hongbo Zhang <hongbo.zhang@freescale.com>
---
 drivers/dma/Kconfig  |    9 +++++----
 drivers/dma/fsldma.c |    9 ++++++---
 drivers/dma/fsldma.h |    2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6825957..3979c65 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,14 +89,15 @@ config AT_HDMAC
 	  Support the Atmel AHB DMA controller.
 
 config FSL_DMA
-	tristate "Freescale Elo and Elo Plus DMA support"
+	tristate "Freescale Elo series DMA support"
 	depends on FSL_SOC
 	select DMA_ENGINE
 	select ASYNC_TX_ENABLE_CHANNEL_SWITCH
 	---help---
-	  Enable support for the Freescale Elo and Elo Plus DMA controllers.
-	  The Elo is the DMA controller on some 82xx and 83xx parts, and the
-	  Elo Plus is the DMA controller on 85xx and 86xx parts.
+	  Enable support for the Freescale Elo series DMA controllers.
+	  The Elo is the DMA controller on some mpc82xx and mpc83xx parts, the
+	  EloPlus is on mpc85xx and mpc86xx and Pxxx parts, and the Elo3 is on
+	  some Txxx and Bxxx parts.
 
 config MPC512X_DMA
 	tristate "Freescale MPC512x built-in DMA engine support"
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 49e8fbd..16a9a48 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1261,7 +1261,9 @@ static int fsl_dma_chan_probe(struct fsldma_device *fdev,
 	WARN_ON(fdev->feature != chan->feature);
 
 	chan->dev = fdev->dev;
-	chan->id = ((res.start - 0x100) & 0xfff) >> 7;
+	chan->id = (res.start & 0xfff) < 0x300 ?
+		   ((res.start - 0x100) & 0xfff) >> 7 :
+		   ((res.start - 0x200) & 0xfff) >> 7;
 	if (chan->id >= FSL_DMA_MAX_CHANS_PER_DEVICE) {
 		dev_err(fdev->dev, "too many channels for device\n");
 		err = -EINVAL;
@@ -1434,6 +1436,7 @@ static int fsldma_of_remove(struct platform_device *op)
 }
 
 static const struct of_device_id fsldma_of_ids[] = {
+	{ .compatible = "fsl,elo3-dma", },
 	{ .compatible = "fsl,eloplus-dma", },
 	{ .compatible = "fsl,elo-dma", },
 	{}
@@ -1455,7 +1458,7 @@ static struct platform_driver fsldma_of_driver = {
 
 static __init int fsldma_init(void)
 {
-	pr_info("Freescale Elo / Elo Plus DMA driver\n");
+	pr_info("Freescale Elo series DMA driver\n");
 	return platform_driver_register(&fsldma_of_driver);
 }
 
@@ -1467,5 +1470,5 @@ static void __exit fsldma_exit(void)
 subsys_initcall(fsldma_init);
 module_exit(fsldma_exit);
 
-MODULE_DESCRIPTION("Freescale Elo / Elo Plus DMA driver");
+MODULE_DESCRIPTION("Freescale Elo series DMA driver");
 MODULE_LICENSE("GPL");
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index f5c3879..1ffc244 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -112,7 +112,7 @@ struct fsldma_chan_regs {
 };
 
 struct fsldma_chan;
-#define FSL_DMA_MAX_CHANS_PER_DEVICE 4
+#define FSL_DMA_MAX_CHANS_PER_DEVICE 8
 
 struct fsldma_device {
 	void __iomem *regs;	/* DGSR register base */
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v9 2/3] DMA: Freescale: Add new 8-channel DMA engine device tree nodes
From: hongbo.zhang @ 2013-08-30 11:26 UTC (permalink / raw)
  To: rob.herring, pawel.moll, mark.rutland, swarren, ian.campbell,
	vinod.koul, djbw
  Cc: Hongbo Zhang, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <1377861980-7027-1-git-send-email-hongbo.zhang@freescale.com>

From: Hongbo Zhang <hongbo.zhang@freescale.com>

Freescale QorIQ T4 and B4 introduce new 8-channel DMA engines. This patch adds
the device tree nodes for them.

Signed-off-by: Hongbo Zhang <hongbo.zhang@freescale.com>
---
 .../devicetree/bindings/powerpc/fsl/dma.txt        |   67 ++++++++++++++++
 arch/powerpc/boot/dts/fsl/b4si-post.dtsi           |    4 +-
 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi          |   82 ++++++++++++++++++++
 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi          |   82 ++++++++++++++++++++
 arch/powerpc/boot/dts/fsl/t4240si-post.dtsi        |    4 +-
 5 files changed, 235 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
index ddf17af..332ac77 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
@@ -126,6 +126,73 @@ Example:
 		};
 	};
 
+** Freescale Elo3 DMA Controller
+   This is EloPlus controller with 8 channels, used in Freescale Txxx and Bxxx
+   series chips, such as t1040, t4240, b4860.
+
+Required properties:
+
+- compatible        : must include "fsl,elo3-dma"
+- reg               : <registers specifier for DMA general status reg>
+- ranges            : describes the mapping between the address space of the
+                      DMA channels and the address space of the DMA controller
+
+- DMA channel nodes:
+        - compatible        : must include "fsl,eloplus-dma-channel"
+        - reg               : <registers specifier for channel>
+        - interrupts        : <interrupt specifier for DMA channel IRQ>
+        - interrupt-parent  : optional, if needed for interrupt mapping
+
+Example:
+dma@100300 {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	compatible = "fsl,elo3-dma";
+	reg = <0x100300 0x4>,
+	      <0x100600 0x4>;
+	ranges = <0x0 0x100100 0x500>;
+	dma-channel@0 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x0 0x80>;
+		interrupts = <28 2 0 0>;
+	};
+	dma-channel@80 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x80 0x80>;
+		interrupts = <29 2 0 0>;
+	};
+	dma-channel@100 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x100 0x80>;
+		interrupts = <30 2 0 0>;
+	};
+	dma-channel@180 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x180 0x80>;
+		interrupts = <31 2 0 0>;
+	};
+	dma-channel@300 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x300 0x80>;
+		interrupts = <76 2 0 0>;
+	};
+	dma-channel@380 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x380 0x80>;
+		interrupts = <77 2 0 0>;
+	};
+	dma-channel@400 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x400 0x80>;
+		interrupts = <78 2 0 0>;
+	};
+	dma-channel@480 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x480 0x80>;
+		interrupts = <79 2 0 0>;
+	};
+};
+
 Note on DMA channel compatible properties: The compatible property must say
 "fsl,elo-dma-channel" or "fsl,eloplus-dma-channel" to be used by the Elo DMA
 driver (fsldma).  Any DMA channel used by fsldma cannot be used by another
diff --git a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
index 7399154..ea53ea1 100644
--- a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
@@ -223,13 +223,13 @@
 		reg = <0xe2000 0x1000>;
 	};
 
-/include/ "qoriq-dma-0.dtsi"
+/include/ "elo3-dma-0.dtsi"
 	dma@100300 {
 		fsl,iommu-parent = <&pamu0>;
 		fsl,liodn-reg = <&guts 0x580>; /* DMA1LIODNR */
 	};
 
-/include/ "qoriq-dma-1.dtsi"
+/include/ "elo3-dma-1.dtsi"
 	dma@101300 {
 		fsl,iommu-parent = <&pamu0>;
 		fsl,liodn-reg = <&guts 0x584>; /* DMA2LIODNR */
diff --git a/arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi b/arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
new file mode 100644
index 0000000..3c210e0
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
@@ -0,0 +1,82 @@
+/*
+ * QorIQ Elo3 DMA device tree stub [ controller @ offset 0x100000 ]
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+dma0: dma@100300 {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	compatible = "fsl,elo3-dma";
+	reg = <0x100300 0x4>,
+	      <0x100600 0x4>;
+	ranges = <0x0 0x100100 0x500>;
+	dma-channel@0 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x0 0x80>;
+		interrupts = <28 2 0 0>;
+	};
+	dma-channel@80 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x80 0x80>;
+		interrupts = <29 2 0 0>;
+	};
+	dma-channel@100 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x100 0x80>;
+		interrupts = <30 2 0 0>;
+	};
+	dma-channel@180 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x180 0x80>;
+		interrupts = <31 2 0 0>;
+	};
+	dma-channel@300 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x300 0x80>;
+		interrupts = <76 2 0 0>;
+	};
+	dma-channel@380 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x380 0x80>;
+		interrupts = <77 2 0 0>;
+	};
+	dma-channel@400 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x400 0x80>;
+		interrupts = <78 2 0 0>;
+	};
+	dma-channel@480 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x480 0x80>;
+		interrupts = <79 2 0 0>;
+	};
+};
diff --git a/arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi b/arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi
new file mode 100644
index 0000000..cccf3bb
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi
@@ -0,0 +1,82 @@
+/*
+ * QorIQ Elo3 DMA device tree stub [ controller @ offset 0x101000 ]
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+dma1: dma@101300 {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	compatible = "fsl,elo3-dma";
+	reg = <0x101300 0x4>,
+	      <0x101600 0x4>;
+	ranges = <0x0 0x101100 0x500>;
+	dma-channel@0 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x0 0x80>;
+		interrupts = <32 2 0 0>;
+	};
+	dma-channel@80 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x80 0x80>;
+		interrupts = <33 2 0 0>;
+	};
+	dma-channel@100 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x100 0x80>;
+		interrupts = <34 2 0 0>;
+	};
+	dma-channel@180 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x180 0x80>;
+		interrupts = <35 2 0 0>;
+	};
+	dma-channel@300 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x300 0x80>;
+		interrupts = <80 2 0 0>;
+	};
+	dma-channel@380 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x380 0x80>;
+		interrupts = <81 2 0 0>;
+	};
+	dma-channel@400 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x400 0x80>;
+		interrupts = <82 2 0 0>;
+	};
+	dma-channel@480 {
+		compatible = "fsl,eloplus-dma-channel";
+		reg = <0x480 0x80>;
+		interrupts = <83 2 0 0>;
+	};
+};
diff --git a/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi b/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
index bd611a9..ec95c60 100644
--- a/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
@@ -387,8 +387,8 @@
 		reg	   = <0xea000 0x4000>;
 	};
 
-/include/ "qoriq-dma-0.dtsi"
-/include/ "qoriq-dma-1.dtsi"
+/include/ "elo3-dma-0.dtsi"
+/include/ "elo3-dma-1.dtsi"
 
 /include/ "qoriq-espi-0.dtsi"
 	spi@110000 {
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v9 1/3] DMA: Freescale: revise device tree binding document
From: hongbo.zhang @ 2013-08-30 11:26 UTC (permalink / raw)
  To: rob.herring, pawel.moll, mark.rutland, swarren, ian.campbell,
	vinod.koul, djbw
  Cc: Hongbo Zhang, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <1377861980-7027-1-git-send-email-hongbo.zhang@freescale.com>

From: Hongbo Zhang <hongbo.zhang@freescale.com>

This patch updates the description of each type of DMA controller and its
channels in preparation for adding a new DMA controller binding. It also
fixes some indentation defects in the text alignment.

Signed-off-by: Hongbo Zhang <hongbo.zhang@freescale.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
 .../devicetree/bindings/powerpc/fsl/dma.txt        |   62 +++++++++-----------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
index 2a4b4bc..ddf17af 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
@@ -1,33 +1,29 @@
-* Freescale 83xx DMA Controller
+* Freescale DMA Controllers
 
-Freescale PowerPC 83xx have on chip general purpose DMA controllers.
+** Freescale Elo DMA Controller
+   This is a little-endian DMA controller, used in Freescale mpc83xx series
+   chips such as mpc8315, mpc8349, mpc8379 etc.
 
 Required properties:
 
-- compatible        : compatible list, contains 2 entries, first is
-		 "fsl,CHIP-dma", where CHIP is the processor
-		 (mpc8349, mpc8360, etc.) and the second is
-		 "fsl,elo-dma"
-- reg               : <registers mapping for DMA general status reg>
-- ranges		: Should be defined as specified in 1) to describe the
-		  DMA controller channels.
+- compatible        : must include "fsl,elo-dma"
+- reg               : <registers specifier for DMA general status reg>
+- ranges            : describes the mapping between the address space of the
+                      DMA channels and the address space of the DMA controller
 - cell-index        : controller index.  0 for controller @ 0x8100
-- interrupts        : <interrupt mapping for DMA IRQ>
+- interrupts        : <interrupt specifier for DMA IRQ>
 - interrupt-parent  : optional, if needed for interrupt mapping
 
-
 - DMA channel nodes:
-        - compatible        : compatible list, contains 2 entries, first is
-			 "fsl,CHIP-dma-channel", where CHIP is the processor
-			 (mpc8349, mpc8350, etc.) and the second is
-			 "fsl,elo-dma-channel". However, see note below.
-        - reg               : <registers mapping for channel>
+        - compatible        : must include "fsl,elo-dma-channel"
+                              However, see note below.
+        - reg               : <registers specifier for channel>
         - cell-index        : dma channel index starts at 0.
 
 Optional properties:
-        - interrupts        : <interrupt mapping for DMA channel IRQ>
-			  (on 83xx this is expected to be identical to
-			   the interrupts property of the parent node)
+        - interrupts        : <interrupt specifier for DMA channel IRQ>
+                              (on 83xx this is expected to be identical to
+                              the interrupts property of the parent node)
         - interrupt-parent  : optional, if needed for interrupt mapping
 
 Example:
@@ -70,30 +66,26 @@ Example:
 		};
 	};
 
-* Freescale 85xx/86xx DMA Controller
-
-Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers.
+** Freescale EloPlus DMA Controller
+   This is a DMA controller with extended addresses and chaining, mainly used
+   in Freescale mpc85xx/86xx, Pxxx and BSC series chips, such as mpc8540,
+   mpc8641, p4080, bsc9131 etc.
 
 Required properties:
 
-- compatible        : compatible list, contains 2 entries, first is
-		 "fsl,CHIP-dma", where CHIP is the processor
-		 (mpc8540, mpc8540, etc.) and the second is
-		 "fsl,eloplus-dma"
-- reg               : <registers mapping for DMA general status reg>
+- compatible        : must include "fsl,eloplus-dma"
+- reg               : <registers specifier for DMA general status reg>
 - cell-index        : controller index.  0 for controller @ 0x21000,
                                          1 for controller @ 0xc000
-- ranges		: Should be defined as specified in 1) to describe the
-		  DMA controller channels.
+- ranges            : describes the mapping between the address space of the
+                      DMA channels and the address space of the DMA controller
 
 - DMA channel nodes:
-        - compatible        : compatible list, contains 2 entries, first is
-			 "fsl,CHIP-dma-channel", where CHIP is the processor
-			 (mpc8540, mpc8560, etc.) and the second is
-			 "fsl,eloplus-dma-channel". However, see note below.
+        - compatible        : must include "fsl,eloplus-dma-channel"
+                              However, see note below.
         - cell-index        : dma channel index starts at 0.
-        - reg               : <registers mapping for channel>
-        - interrupts        : <interrupt mapping for DMA channel IRQ>
+        - reg               : <registers specifier for channel>
+        - interrupts        : <interrupt specifier for DMA channel IRQ>
         - interrupt-parent  : optional, if needed for interrupt mapping
 
 Example:
-- 
1.7.9.5


* [PATCH v9 0/3] DMA: Freescale: Add support for 8-channel DMA engine
From: hongbo.zhang @ 2013-08-30 11:26 UTC (permalink / raw)
  To: rob.herring, pawel.moll, mark.rutland, swarren, ian.campbell,
	vinod.koul, djbw
  Cc: Hongbo Zhang, devicetree, linuxppc-dev, linux-kernel

From: Hongbo Zhang <hongbo.zhang@freescale.com>

Hi DMA and DT maintainers, please have a look at these V9 patches.

Freescale QorIQ T4 and B4 introduce new 8-channel DMA engines; this patch set
adds support for them.

V8->V9 changes:
- add "Acked-by: Mark Rutland <mark.rutland@arm.com>" into patch [1/3]
- split reg entry <0x100300 0x4 0x100600 0x4> into two separate ones
  <0x100300 0x4>, <0x100600 0x4> in patch [2/3]
- also use "QorIQ Elo3 DMA" to refer to the previous "QorIQ DMA" in [2/3]

V7->V8 changes:
- change the word "mapping" to "specifier" for reg and interrupts description

V6->V7 changes:
- only remove unnecessary "CHIP-dma" explanations in [1/3]

V5->V6 changes:
- minor updates of descriptions in binding document and Kconfig
- remove [4/4], that should be another patch in future

V4->V5 changes:
- update description in the dt binding document, to make it more reasonable
- add new patch [4/4] to eliminate a compile warning that has existed
  for a long time

V3->V4 changes:
- introduce new patch [1/3] to revise the legacy dma binding document
- and then add new paragraph to describe new dt node binding in [2/3]
- rebase to latest kernel v3.11-rc1

V2->V3 changes:
- edit Documentation/devicetree/bindings/powerpc/fsl/dma.txt
- edit text strings in Kconfig and the driver files, using "elo series" to
  refer to all the current "elo*"

V1->V2 changes:
- removed the code handling the register dgsr1, since it isn't used currently
- renamed the DMA DT compatible to "fsl,elo3-dma"
- renamed the new dts files to "elo3-dma-<n>.dtsi"

Hongbo Zhang (3):
  DMA: Freescale: revise device tree binding document
  DMA: Freescale: Add new 8-channel DMA engine device tree nodes
  DMA: Freescale: update driver to support 8-channel DMA engine

 .../devicetree/bindings/powerpc/fsl/dma.txt        |  129 ++++++++++++++------
 arch/powerpc/boot/dts/fsl/b4si-post.dtsi           |    4 +-
 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi          |   82 +++++++++++++
 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi          |   82 +++++++++++++
 arch/powerpc/boot/dts/fsl/t4240si-post.dtsi        |    4 +-
 drivers/dma/Kconfig                                |    9 +-
 drivers/dma/fsldma.c                               |    9 +-
 drivers/dma/fsldma.h                               |    2 +-
 8 files changed, 274 insertions(+), 47 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi

-- 
1.7.9.5


* Re: [PATCH v9 00/13] KVM: PPC: IOMMU in-kernel handling of VFIO
From: Alexey Kardashevskiy @ 2013-08-30 10:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Gleb Natapov
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, linux-mm,
	Alex Williamson, Paul Mackerras, Paolo Bonzini, linuxppc-dev,
	David Gibson
In-Reply-To: <1377679070-3515-1-git-send-email-aik@ozlabs.ru>

On 08/28/2013 06:37 PM, Alexey Kardashevskiy wrote:
> This accelerates VFIO DMA operations on POWER by moving them
> into kernel.
> 
> This depends on VFIO external API patch which is on its way to upstream.
> 
> Changes:
> v9:
> * replaced the "link logical bus number to IOMMU group" ioctl to KVM
> with a KVM device doing the same thing, i.e. the actual changes are in
> these 3 patches:
>   KVM: PPC: reserve a capability and KVM device type for realmode VFIO
>   KVM: PPC: remove warning from kvmppc_core_destroy_vm
>   KVM: PPC: Add support for IOMMU in-kernel handling
> 
> * moved some VFIO external API bits to a separate patch to reduce the size
> of the "KVM: PPC: Add support for IOMMU in-kernel handling" patch
> 
> * fixed code style problems reported by checkpatch.pl.
> 
> v8:
> * fixed comments about capabilities numbers
> 
> v7:
> * rebased on v3.11-rc3.
> * VFIO external user API will go through VFIO tree so it is
> excluded from this series.
> * As nobody ever reacted on "hashtable: add hash_for_each_possible_rcu_notrace()",
> Ben suggested to push it via his tree so I included it to the series.
> * realmode_(get|put)_page is reworked.
> 
> More details in the individual patch comments.
> 
> Alexey Kardashevskiy (13):
>   KVM: PPC: POWERNV: move iommu_add_device earlier
>   hashtable: add hash_for_each_possible_rcu_notrace()
>   KVM: PPC: reserve a capability number for multitce support
>   KVM: PPC: reserve a capability and KVM device type for realmode VFIO


Hi Gleb!

Could you please review and pick (if they are ok) the two "capability"
patches from above?

It would be cool if you also looked at "KVM: PPC: Add support for IOMMU
in-kernel handling", the part about KVM device for SPAPR TCE IOMMU table.

Thanks!



>   powerpc: Prepare to support kernel handling of IOMMU map/unmap
>   powerpc: add real mode support for dma operations on powernv
>   KVM: PPC: enable IOMMU_API for KVM_BOOK3S_64 permanently
>   KVM: PPC: Add support for multiple-TCE hcalls
>   powerpc/iommu: rework to support realmode
>   KVM: PPC: remove warning from kvmppc_core_destroy_vm
>   KVM: PPC: add trampolines for VFIO external API
>   KVM: PPC: Add support for IOMMU in-kernel handling
>   KVM: PPC: Add hugepage support for IOMMU in-kernel handling
> 
>  Documentation/virtual/kvm/api.txt                  |  26 +
>  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 ++
>  arch/powerpc/include/asm/iommu.h                   |  18 +-
>  arch/powerpc/include/asm/kvm_host.h                |  38 ++
>  arch/powerpc/include/asm/kvm_ppc.h                 |  16 +-
>  arch/powerpc/include/asm/machdep.h                 |  12 +
>  arch/powerpc/include/asm/pgtable-ppc64.h           |   2 +
>  arch/powerpc/include/uapi/asm/kvm.h                |   8 +
>  arch/powerpc/kernel/iommu.c                        | 243 +++++----
>  arch/powerpc/kvm/Kconfig                           |   1 +
>  arch/powerpc/kvm/book3s_64_vio.c                   | 597 ++++++++++++++++++++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c                | 408 +++++++++++++-
>  arch/powerpc/kvm/book3s_hv.c                       |  42 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S            |   8 +-
>  arch/powerpc/kvm/book3s_pr_papr.c                  |  35 ++
>  arch/powerpc/kvm/powerpc.c                         |   4 +
>  arch/powerpc/mm/init_64.c                          |  50 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c          |  57 +-
>  arch/powerpc/platforms/powernv/pci-p5ioc2.c        |   2 +-
>  arch/powerpc/platforms/powernv/pci.c               |  75 ++-
>  arch/powerpc/platforms/powernv/pci.h               |   3 +-
>  arch/powerpc/platforms/pseries/iommu.c             |   8 +-
>  include/linux/hashtable.h                          |  15 +
>  include/linux/kvm_host.h                           |   1 +
>  include/linux/mm.h                                 |  14 +
>  include/linux/page-flags.h                         |   4 +-
>  include/uapi/linux/kvm.h                           |   3 +
>  virt/kvm/kvm_main.c                                |   5 +
>  28 files changed, 1564 insertions(+), 168 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> 


-- 
Alexey


* [PATCH V2 2/2] powerpc/85xx: Add TWR-P1025 board support
From: Xie Xiaobo @ 2013-08-30 10:01 UTC (permalink / raw)
  To: linuxppc-dev, galak, scottwood; +Cc: Michael Johnston, Xie Xiaobo
In-Reply-To: <1377856896-29244-1-git-send-email-X.Xie@freescale.com>

TWR-P1025 Overview
 -----------------
 512Mbyte DDR3 (on board DDR)
 64MB NOR Flash
 eTSEC1: Connected to RGMII PHY AR8035
 eTSEC3: Connected to RGMII PHY AR8035
 Two USB2.0 Type A
 One microSD Card slot
 One mini-PCIe slot
 One mini-USB TypeB dual UART
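
The 64MB NOR flash listed above is split into six partitions by the
p1025twr.dtsi added in this patch. A minimal userspace sketch, assuming
only the reg values from that file, checks that the partitions are
contiguous and add up to the full device. The struct and helper names
(nor_part, nor_parts_end) are hypothetical and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* One NOR partition: byte offset and size, copied from the reg
 * properties of the nor@0,0 node in this patch's p1025twr.dtsi. */
struct nor_part {
	uint32_t offset;
	uint32_t size;
};

static const struct nor_part twr_p1025_nor[] = {
	{ 0x00000000, 0x00040000 },	/* Vitesse 7385 firmware, 256KB */
	{ 0x00040000, 0x00040000 },	/* DTB image, 256KB */
	{ 0x00080000, 0x00380000 },	/* kernel image, 3.5MB */
	{ 0x00400000, 0x03ac0000 },	/* JFFS2 root fs, 58.75MB */
	{ 0x03ec0000, 0x00040000 },	/* QE microcode, 256KB */
	{ 0x03f00000, 0x00100000 },	/* U-Boot image and environment, 1MB */
};

#define NOR_PART_COUNT (sizeof(twr_p1025_nor) / sizeof(twr_p1025_nor[0]))

/* Hypothetical helper: return the end offset if the partitions are
 * contiguous starting at offset 0, or 0 if there is a gap or overlap. */
static uint32_t nor_parts_end(const struct nor_part *p, size_t n)
{
	uint32_t next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i].offset != next)
			return 0;
		next += p[i].size;
	}
	return next;
}
```

Calling nor_parts_end(twr_p1025_nor, NOR_PART_COUNT) yields 0x04000000,
which is the 64MB size given in the nor@0,0 reg property.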

Signed-off-by: Michael Johnston <michael.johnston@freescale.com>
Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
---
Patch V2: QE-related init code was factored out to a common file

 arch/powerpc/boot/dts/p1025twr.dtsi     | 244 ++++++++++++++++++++++++++++++++
 arch/powerpc/boot/dts/p1025twr_32b.dts  | 135 ++++++++++++++++++
 arch/powerpc/platforms/85xx/Kconfig     |   6 +
 arch/powerpc/platforms/85xx/Makefile    |   1 +
 arch/powerpc/platforms/85xx/twr_p102x.c | 133 +++++++++++++++++
 5 files changed, 519 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/p1025twr.dtsi
 create mode 100644 arch/powerpc/boot/dts/p1025twr_32b.dts
 create mode 100644 arch/powerpc/platforms/85xx/twr_p102x.c

diff --git a/arch/powerpc/boot/dts/p1025twr.dtsi b/arch/powerpc/boot/dts/p1025twr.dtsi
new file mode 100644
index 0000000..07df721
--- /dev/null
+++ b/arch/powerpc/boot/dts/p1025twr.dtsi
@@ -0,0 +1,244 @@
+/*
+ * P1025 TWR Device Tree Source stub (no addresses or top-level ranges)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/{
+       aliases {
+		ethernet3 = &enet3;
+		ethernet4 = &enet4;
+       };
+};
+
+&lbc {
+	nor@0,0 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "cfi-flash";
+		reg = <0x0 0x0 0x4000000>;
+		bank-width = <2>;
+		device-width = <1>;
+
+		partition@0 {
+			/* This location must not be altered  */
+			/* 256KB for Vitesse 7385 Switch firmware */
+			reg = <0x0 0x00040000>;
+			label = "NOR Vitesse-7385 Firmware";
+			read-only;
+		};
+
+		partition@40000 {
+			/* 256KB for DTB Image */
+			reg = <0x00040000 0x00040000>;
+			label = "NOR DTB Image";
+		};
+
+		partition@80000 {
+			/* 3.5 MB for Linux Kernel Image */
+			reg = <0x00080000 0x00380000>;
+			label = "NOR Linux Kernel Image";
+		};
+
+		partition@400000 {
+			/* 58.75MB for JFFS2 based Root file System */
+			reg = <0x00400000 0x03ac0000>;
+			label = "NOR Root File System";
+		};
+
+		partition@ec0000 {
+			/* This location must not be altered  */
+			/* 256KB for QE ucode firmware*/
+			reg = <0x03ec0000 0x00040000>;
+			label = "NOR QE microcode firmware";
+			read-only;
+		};
+
+		partition@f00000 {
+			/* This location must not be altered  */
+			/* 512KB for u-boot Bootloader Image */
+			/* 512KB for u-boot Environment Variables */
+			reg = <0x03f00000 0x00100000>;
+			label = "NOR U-Boot Image";
+			read-only;
+		};
+	};
+
+	/* CS2 for Display */
+	ssd1289@2,0 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "ssd1289";
+		reg = <0x2 0x0000 0x0002
+		       0x2 0x0002 0x0002>;
+	};
+
+};
+
+&soc {
+	usb@22000 {
+		phy_type = "ulpi";
+	};
+
+	mdio@24000 {
+		phy0: ethernet-phy@2 {
+			interrupt-parent = <&mpic>;
+			interrupts = <1 1>;
+			reg = <0x2>;
+		};
+
+		phy1: ethernet-phy@1 {
+			interrupt-parent = <&mpic>;
+			interrupts = <2 1>;
+			reg = <0x1>;
+		};
+
+		tbi0: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	mdio@25000 {
+		tbi1: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	mdio@26000 {
+		tbi2: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	enet0: ethernet@b0000 {
+		phy-handle = <&phy0>;
+		phy-connection-type = "rgmii-id";
+
+	};
+
+	enet1: ethernet@b1000 {
+		status = "disabled";
+	};
+
+	enet2: ethernet@b2000 {
+		phy-handle = <&phy1>;
+		phy-connection-type = "rgmii-id";
+	};
+
+	par_io@e0100 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		reg = <0xe0100 0x60>;
+		ranges = <0x0 0xe0100 0x60>;
+		device_type = "par_io";
+		num-ports = <3>;
+		pio1: ucc_pin@01 {
+			pio-map = <
+		/* port  pin  dir  open_drain  assignment  has_irq */
+				0x1  0x13 0x1  0x0  0x1  0x0    /* QE_MUX_MDC */
+				0x1  0x14 0x3  0x0  0x1  0x0    /* QE_MUX_MDIO */
+				0x0  0x17 0x2  0x0  0x2  0x0    /* CLK12 */
+				0x0  0x18 0x2  0x0  0x1  0x0    /* CLK9 */
+				0x0  0x7  0x1  0x0  0x2  0x0    /* ENET1_TXD0_SER1_TXD0 */
+				0x0  0x9  0x1  0x0  0x2  0x0    /* ENET1_TXD1_SER1_TXD1 */
+				0x0  0xb  0x1  0x0  0x2  0x0    /* ENET1_TXD2_SER1_TXD2 */
+				0x0  0xc  0x1  0x0  0x2  0x0    /* ENET1_TXD3_SER1_TXD3 */
+				0x0  0x6  0x2  0x0  0x2  0x0    /* ENET1_RXD0_SER1_RXD0 */
+				0x0  0xa  0x2  0x0  0x2  0x0    /* ENET1_RXD1_SER1_RXD1 */
+				0x0  0xe  0x2  0x0  0x2  0x0    /* ENET1_RXD2_SER1_RXD2 */
+				0x0  0xf  0x2  0x0  0x2  0x0    /* ENET1_RXD3_SER1_RXD3 */
+				0x0  0x5  0x1  0x0  0x2  0x0    /* ENET1_TX_EN_SER1_RTS_B */
+				0x0  0xd  0x1  0x0  0x2  0x0    /* ENET1_TX_ER */
+				0x0  0x4  0x2  0x0  0x2  0x0    /* ENET1_RX_DV_SER1_CTS_B */
+				0x0  0x8  0x2  0x0  0x2  0x0    /* ENET1_RX_ER_SER1_CD_B */
+				0x0  0x11 0x2  0x0  0x2  0x0    /* ENET1_CRS */
+				0x0  0x10 0x2  0x0  0x2  0x0>;    /* ENET1_COL */
+		};
+
+		pio2: ucc_pin@02 {
+			pio-map = <
+		/* port  pin  dir  open_drain  assignment  has_irq */
+				0x1  0x13 0x1  0x0  0x1  0x0    /* QE_MUX_MDC */
+				0x1  0x14 0x3  0x0  0x1  0x0    /* QE_MUX_MDIO */
+				0x1  0xb  0x2  0x0  0x1  0x0    /* CLK13 */
+				0x1  0x7  0x1  0x0  0x2  0x0    /* ENET5_TXD0_SER5_TXD0 */
+				0x1  0xa  0x1  0x0  0x2  0x0    /* ENET5_TXD1_SER5_TXD1 */
+				0x1  0x6  0x2  0x0  0x2  0x0    /* ENET5_RXD0_SER5_RXD0 */
+				0x1  0x9  0x2  0x0  0x2  0x0    /* ENET5_RXD1_SER5_RXD1 */
+				0x1  0x5  0x1  0x0  0x2  0x0    /* ENET5_TX_EN_SER5_RTS_B */
+				0x1  0x4  0x2  0x0  0x2  0x0    /* ENET5_RX_DV_SER5_CTS_B */
+				0x1  0x8  0x2  0x0  0x2  0x0>;    /* ENET5_RX_ER_SER5_CD_B */
+		};
+
+		pio3: ucc_pin@03 {
+			pio-map = <
+		/* port  pin  dir  open_drain  assignment  has_irq */
+				0x0  0x16 0x2  0x0  0x2  0x0    /* SER7_CD_B*/
+				0x0  0x12 0x2  0x0  0x2  0x0    /* SER7_CTS_B*/
+				0x0  0x13 0x1  0x0  0x2  0x0    /* SER7_RTS_B*/
+				0x0  0x14 0x2  0x0  0x2  0x0    /* SER7_RXD0*/
+				0x0  0x15 0x1  0x0  0x2  0x0>;    /* SER7_TXD0*/
+		};
+
+		pio4: ucc_pin@04 {
+			pio-map = <
+		/* port  pin  dir  open_drain  assignment  has_irq */
+				0x1  0x0  0x2  0x0  0x2  0x0    /* SER3_CD_B*/
+				0x0  0x1c 0x2  0x0  0x2  0x0    /* SER3_CTS_B*/
+				0x0  0x1d 0x1  0x0  0x2  0x0    /* SER3_RTS_B*/
+				0x0  0x1e 0x2  0x0  0x2  0x0    /* SER3_RXD0*/
+				0x0  0x1f 0x1  0x0  0x2  0x0>;    /* SER3_TXD0*/
+		};
+	};
+};
+
+&qe {
+	serial2: ucc@2600 {
+		device_type = "serial";
+		compatible = "ucc_uart";
+		port-number = <0>;
+		rx-clock-name = "brg6";
+		tx-clock-name = "brg6";
+		pio-handle = <&pio3>;
+	};
+
+	serial3: ucc@2200 {
+		device_type = "serial";
+		compatible = "ucc_uart";
+		port-number = <1>;
+		rx-clock-name = "brg2";
+		tx-clock-name = "brg2";
+		pio-handle = <&pio4>;
+	};
+};
diff --git a/arch/powerpc/boot/dts/p1025twr_32b.dts b/arch/powerpc/boot/dts/p1025twr_32b.dts
new file mode 100644
index 0000000..a3a5266
--- /dev/null
+++ b/arch/powerpc/boot/dts/p1025twr_32b.dts
@@ -0,0 +1,135 @@
+/*
+ * P1025 TWR Device Tree Source (32-bit address map)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/p1021si-pre.dtsi"
+/ {
+	model = "fsl,P1025";
+	compatible = "fsl,TWR-P1025";
+
+	memory {
+		device_type = "memory";
+	};
+
+	lbc: localbus@ffe05000 {
+		reg = <0 0xffe05000 0 0x1000>;
+
+		/* NOR Flash and SSD1289 */
+		ranges = <0x0 0x0 0x0 0xec000000 0x04000000
+			  0x2 0x0 0x0 0xe0000000 0x00020000>;
+	};
+
+	soc: soc@ffe00000 {
+		ranges = <0x0 0x0 0xffe00000 0x100000>;
+	};
+
+	pci0: pcie@ffe09000 {
+		ranges = <0x2000000 0x0 0xe0000000 0 0xa0000000 0x0 0x20000000
+			  0x1000000 0x0 0x00000000 0 0xffc10000 0x0 0x10000>;
+		reg = <0 0xffe09000 0 0x1000>;
+		pcie@0 {
+			ranges = <0x2000000 0x0 0xe0000000
+				  0x2000000 0x0 0xe0000000
+				  0x0 0x20000000
+
+				  0x1000000 0x0 0x0
+				  0x1000000 0x0 0x0
+				  0x0 0x100000>;
+		};
+	};
+
+	pci1: pcie@ffe0a000 {
+		reg = <0 0xffe0a000 0 0x1000>;
+		ranges = <0x2000000 0x0 0xe0000000 0 0x80000000 0x0 0x20000000
+			  0x1000000 0x0 0x00000000 0 0xffc00000 0x0 0x10000>;
+		pcie@0 {
+			ranges = <0x2000000 0x0 0xe0000000
+				  0x2000000 0x0 0xe0000000
+				  0x0 0x20000000
+
+				  0x1000000 0x0 0x0
+				  0x1000000 0x0 0x0
+				  0x0 0x100000>;
+		};
+	};
+
+	qe: qe@ffe80000 {
+		ranges = <0x0 0x0 0xffe80000 0x40000>;
+		reg = <0 0xffe80000 0 0x480>;
+		brg-frequency = <0>;
+		bus-frequency = <0>;
+		status = "disabled"; /* no firmware loaded */
+
+		enet3: ucc@2000 {
+			device_type = "network";
+			compatible = "ucc_geth";
+			rx-clock-name = "clk12";
+			tx-clock-name = "clk9";
+			pio-handle = <&pio1>;
+			phy-handle = <&qe_phy0>;
+			phy-connection-type = "mii";
+		};
+
+		mdio@2120 {
+			qe_phy0: ethernet-phy@18 {
+				interrupt-parent = <&mpic>;
+				interrupts = <4 1 0 0>;
+				reg = <0x18>;
+				device_type = "ethernet-phy";
+			};
+			qe_phy1: ethernet-phy@19 {
+				interrupt-parent = <&mpic>;
+				interrupts = <5 1 0 0>;
+				reg = <0x19>;
+				device_type = "ethernet-phy";
+			};
+			tbi-phy@11 {
+				reg = <0x11>;
+				device_type = "tbi-phy";
+			};
+		};
+
+		enet4: ucc@2400 {
+			device_type = "network";
+			compatible = "ucc_geth";
+			rx-clock-name = "none";
+			tx-clock-name = "clk13";
+			pio-handle = <&pio2>;
+			phy-handle = <&qe_phy1>;
+			phy-connection-type = "rmii";
+		};
+	};
+};
+
+/include/ "p1025twr.dtsi"
+/include/ "fsl/p1021si-post.dtsi"
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 8f02b05..fe36689 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -117,6 +117,12 @@ config P1023_RDS
 	help
 	  This option enables support for the P1023 RDS board
 
+config TWR_P102x
+	bool "Freescale TWR-P102x"
+	select DEFAULT_UIMAGE
+	help
+	  This option enables support for the TWR-P1025 board.
+
 config SOCRATES
 	bool "Socrates"
 	select DEFAULT_UIMAGE
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 2eab37e..b8d9f66 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
 obj-$(CONFIG_P1022_DS)    += p1022_ds.o
 obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
 obj-$(CONFIG_P1023_RDS)   += p1023_rds.o
+obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
 obj-$(CONFIG_P2041_RDB)   += p2041_rdb.o corenet_ds.o
 obj-$(CONFIG_P3041_DS)    += p3041_ds.o corenet_ds.o
 obj-$(CONFIG_P4080_DS)    += p4080_ds.o corenet_ds.o
diff --git a/arch/powerpc/platforms/85xx/twr_p102x.c b/arch/powerpc/platforms/85xx/twr_p102x.c
new file mode 100644
index 0000000..c136a61
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/twr_p102x.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright 2010-2011, 2013 Freescale Semiconductor, Inc.
+ *
+ * Author: Michael Johnston <michael.johnston@freescale.com>
+ *
+ * Description:
+ * TWR-P102x Board Setup
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/of_platform.h>
+
+#include <asm/pci-bridge.h>
+#include <asm/udbg.h>
+#include <asm/mpic.h>
+#include <asm/qe.h>
+#include <asm/qe_ic.h>
+#include <asm/fsl_guts.h>
+
+#include <sysdev/fsl_soc.h>
+#include <sysdev/fsl_pci.h>
+#include "smp.h"
+
+#include "mpc85xx.h"
+
+static void __init twr_p1025_pic_init(void)
+{
+	struct mpic *mpic;
+
+	mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
+			MPIC_SINGLE_DEST_CPU,
+			0, 256, " OpenPIC  ");
+
+	BUG_ON(mpic == NULL);
+	mpic_init(mpic);
+
+#ifdef CONFIG_QUICC_ENGINE
+	mpc85xx_qe_pic_init();
+#endif
+}
+
+/* ************************************************************************
+ *
+ * Setup the architecture
+ *
+ */
+static void __init twr_p1025_setup_arch(void)
+{
+#ifdef CONFIG_QUICC_ENGINE
+	struct device_node *np;
+#endif
+
+	if (ppc_md.progress)
+		ppc_md.progress("twr_p1025_setup_arch()", 0);
+
+	mpc85xx_smp_init();
+
+	fsl_pci_assign_primary();
+
+#ifdef CONFIG_QUICC_ENGINE
+	mpc85xx_qe_init();
+
+#if defined(CONFIG_UCC_GETH) || defined(CONFIG_SERIAL_QE)
+	if (machine_is(twr_p1025)) {
+		struct ccsr_guts __iomem *guts;
+
+		np = of_find_node_by_name(NULL, "global-utilities");
+		if (np) {
+			guts = of_iomap(np, 0);
+			if (!guts)
+				pr_err("twr_p1025: could not map "
+					"global utilities register\n");
+			else {
+			/* P1025 has pins muxed for QE and other functions. To
+			 * enable QE UEC mode, we need to set bit QE0 for UCC1
+			 * in Eth mode, QE0 and QE3 for UCC5 in Eth mode, QE9
+			 * and QE12 for QE MII management signals in PMUXCR
+			 * register.
+			 */
+
+			printk(KERN_INFO "P1025 pinmux configured for QE\n");
+
+			/* Set QE mux bits in PMUXCR */
+			setbits32(&guts->pmuxcr, MPC85xx_PMUXCR_QE(0) |
+					MPC85xx_PMUXCR_QE(3) |
+					MPC85xx_PMUXCR_QE(9) |
+					MPC85xx_PMUXCR_QE(12));
+			iounmap(guts);
+
+			/* Drive PB29 to CPLD low - CPLD will then change
+			 * muxing from LBC to QE */
+			par_io_config_pin(1, 29, 1, 0, 0, 0);
+			par_io_data_set(1, 29, 0);
+			}
+			of_node_put(np);
+		}
+	}
+#endif
+#endif	/* CONFIG_QUICC_ENGINE */
+
+	printk(KERN_INFO "TWR-P1025 board from Freescale Semiconductor\n");
+}
+
+machine_arch_initcall(twr_p1025, mpc85xx_common_publish_devices);
+
+static int __init twr_p1025_probe(void)
+{
+	unsigned long root = of_get_flat_dt_root();
+
+	return of_flat_dt_is_compatible(root, "fsl,TWR-P1025");
+}
+
+define_machine(twr_p1025) {
+	.name			= "TWR-P1025",
+	.probe			= twr_p1025_probe,
+	.setup_arch		= twr_p1025_setup_arch,
+	.init_IRQ		= twr_p1025_pic_init,
+#ifdef CONFIG_PCI
+	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+#endif
+	.get_irq		= mpic_get_irq,
+	.restart		= fsl_rstcr_restart,
+	.calibrate_decr		= generic_calibrate_decr,
+	.progress		= udbg_progress,
+};
-- 
1.8.0


* [PATCH V2 1/2] powerpc/85xx: Add QE common init functions
From: Xie Xiaobo @ 2013-08-30 10:01 UTC (permalink / raw)
  To: linuxppc-dev, galak, scottwood; +Cc: Xie Xiaobo

Define two QE init functions in a common file, to avoid duplicating
the same code in board files.
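
The header change in this patch follows the usual pattern of real
prototypes under a config option and empty inline stubs otherwise. A
minimal userspace sketch of that pattern, assuming DEMO_QUICC_ENGINE as a
hypothetical stand-in for CONFIG_QUICC_ENGINE and demo_* names that do not
exist in the kernel:

```c
/* Counts how many real QE init routines ran (demo only). */
static int demo_qe_init_calls;

#ifdef DEMO_QUICC_ENGINE
/* Option enabled: the common file provides the real implementations. */
static void demo_qe_pic_init(void) { demo_qe_init_calls++; }
static void demo_qe_init(void) { demo_qe_init_calls++; }
#else
/* Option disabled: empty inline stubs, so callers still compile. */
static inline void demo_qe_pic_init(void) {}
static inline void demo_qe_init(void) {}
#endif

/* Hypothetical board code: calls the common helpers unconditionally
 * and carries no QE setup code of its own. */
static void demo_board_init(void)
{
	demo_qe_pic_init();
	demo_qe_init();
}
```

Built without -DDEMO_QUICC_ENGINE, demo_board_init() compiles and does
nothing. Built with it, both helpers run. Either way the board file
carries no copy of the QE setup code.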

Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
---
 arch/powerpc/platforms/85xx/common.c  | 47 +++++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/85xx/mpc85xx.h |  8 ++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/common.c b/arch/powerpc/platforms/85xx/common.c
index d0861a0..fb3f5e6 100644
--- a/arch/powerpc/platforms/85xx/common.c
+++ b/arch/powerpc/platforms/85xx/common.c
@@ -7,6 +7,8 @@
  */
 #include <linux/of_platform.h>
 
+#include <asm/qe.h>
+#include <asm/qe_ic.h>
 #include <sysdev/cpm2_pic.h>
 
 #include "mpc85xx.h"
@@ -80,3 +82,48 @@ void __init mpc85xx_cpm2_pic_init(void)
 	irq_set_chained_handler(irq, cpm2_cascade);
 }
 #endif
+
+#ifdef CONFIG_QUICC_ENGINE
+void __init mpc85xx_qe_pic_init(void)
+{
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
+	if (np) {
+		qe_ic_init(np, 0, qe_ic_cascade_low_mpic,
+				qe_ic_cascade_high_mpic);
+		of_node_put(np);
+	} else
+		pr_err("%s: Could not find qe-ic node\n", __func__);
+}
+
+void __init mpc85xx_qe_init(void)
+{
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, "fsl,qe");
+	if (!np) {
+		np = of_find_node_by_name(NULL, "qe");
+		if (!np) {
+			pr_err("%s: Could not find Quicc Engine node\n",
+					__func__);
+			return;
+		}
+	}
+
+	qe_reset();
+	of_node_put(np);
+
+	np = of_find_node_by_name(NULL, "par_io");
+	if (np) {
+		struct device_node *ucc;
+
+		par_io_init(np);
+		of_node_put(np);
+
+		for_each_node_by_name(ucc, "ucc")
+			par_io_of_config(ucc);
+
+	}
+}
+#endif
diff --git a/arch/powerpc/platforms/85xx/mpc85xx.h b/arch/powerpc/platforms/85xx/mpc85xx.h
index 2aa7c5d..1d39095 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx.h
+++ b/arch/powerpc/platforms/85xx/mpc85xx.h
@@ -8,4 +8,12 @@ extern void mpc85xx_cpm2_pic_init(void);
 static inline void __init mpc85xx_cpm2_pic_init(void) {}
 #endif /* CONFIG_CPM2 */
 
+#ifdef CONFIG_QUICC_ENGINE
+extern void mpc85xx_qe_pic_init(void);
+extern void mpc85xx_qe_init(void);
+#else
+static inline void __init mpc85xx_qe_pic_init(void) {}
+static inline void __init mpc85xx_qe_init(void) {}
+#endif
+
 #endif
-- 
1.8.0

^ permalink raw reply related

* [PATCH 1/2] ASoC: fsl: Add wrapping for dev_dbg() in fsl_spdif.c
From: Nicolin Chen @ 2013-08-30  9:38 UTC (permalink / raw)
  To: broonie; +Cc: alsa-devel, linuxppc-dev, timur

Add the missing trailing '\n' to the dev_dbg() messages in fsl_spdif.c

Signed-off-by: Nicolin Chen <b42378@freescale.com>
---
 sound/soc/fsl/fsl_spdif.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
index e93dc0d..98741e9 100644
--- a/sound/soc/fsl/fsl_spdif.c
+++ b/sound/soc/fsl/fsl_spdif.c
@@ -1071,9 +1071,9 @@ static int fsl_spdif_probe_txclk(struct fsl_spdif_priv *spdif_priv,
 			break;
 	}
 
-	dev_dbg(&pdev->dev, "use rxtx%d as tx clock source for %dHz sample rate",
+	dev_dbg(&pdev->dev, "use rxtx%d as tx clock source for %dHz sample rate\n",
 			spdif_priv->txclk_src[index], rate[index]);
-	dev_dbg(&pdev->dev, "use divisor %d for %dHz sample rate",
+	dev_dbg(&pdev->dev, "use divisor %d for %dHz sample rate\n",
 			spdif_priv->txclk_div[index], rate[index]);
 
 	return 0;
-- 
1.7.1

^ permalink raw reply related

* [PATCH V2 6/6] powerpc, perf: Enable SW filtering in branch stack sampling framework
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

This patch enables SW based post processing of BHRB captured branches
so that more user defined branch filter criteria can be met in the perf
branch stack sampling framework. This change increases the number of
filters and their valid combinations on powerpc64 platforms with BHRB
support. A summary of the code changes is given below.

(1) struct cpu_hw_events

	Introduced two new variables and modified one to track various filters.

	a) bhrb_hw_filter	Tracks PMU based HW branch filter flags.
				Computed from PMU dependent call back.
	b) bhrb_sw_filter	Tracks SW based instruction filter flags
				Computed from PPC64 generic SW filter.
	c) filter_mask		Tracks overall filter flags for PPC64

(2) Creating HW event with BHRB request

	The kernel determines the supported HW filters through the PMU
	callback ppmu->bhrb_filter_map(), which only invalidates unsupported
	HW filter combinations. In future we could process one element of a
	combination in HW and one in SW. Meanwhile cpuhw->filter_mask tracks
	the overall supported branch filter requests on the PMU.

	The kernel also processes the user request against the SW filters
	available for PPC64. It then checks filter_mask to verify that all
	the user requested branch filters are handled either in HW or in
	SW.

(3) BHRB SW filter processing

	During BHRB data capture inside the PMU interrupt context, each
	captured "perf_branch_entry.from" is checked for compliance with
	the applicable SW branch filters. If the entry does not conform to
	the filter requirements, it is discarded from the final perf branch
	stack buffer.

(4) Instruction classification for proposed SW filters

	The following categories of instructions are classified under the
	proposed SW filters.

	(a) PERF_SAMPLE_BRANCH_ANY_RETURN

		(i) [Un]conditional branch to LR without setting the LR
			(1) blr
			(2) bclr
			(3) btlr
			(4) bflr
			(5) bdnzlr
			(6) bdnztlr
			(7) bdnzflr
			(8) bdzlr
			(9) bdztlr
			(10) bdzflr
			(11) bltlr
			(12) blelr
			(13) beqlr
			(14) bgelr
			(15) bgtlr
			(16) bnllr
			(17) bnelr
			(18) bnglr
			(19) bsolr
			(20) bnslr
			(21) biclr
			(22) bnilr
			(23) bunlr
			(24) bnulr

	(b) PERF_SAMPLE_BRANCH_IND_CALL

		(i) [Un]conditional branch to CTR with setting the link
			(1) bctrl
			(2) bcctrl
			(3) btctrl
			(4) bfctrl
			(5) bltctrl
			(6) blectrl
			(7) beqctrl
			(8) bgectrl
			(9) bgtctrl
			(10) bnlctrl
			(11) bnectrl
			(12) bngctrl
			(13) bsoctrl
			(14) bnsctrl
			(15) bicctrl
			(16) bnictrl
			(17) bunctrl
			(18) bnuctrl

		(ii) [Un]conditional branch to LR setting the link
			(1) bclrl
			(2) blrl
			(3) btlrl
			(4) bflrl
			(5) bdnzlrl
			(6) bdnztlrl
			(7) bdnzflrl
			(8) bdzlrl
			(9) bdztlrl
			(10) bdzflrl
			(11) bltlrl
			(12) blelrl
			(13) beqlrl
			(14) bgelrl
			(15) bgtlrl
			(16) bnllrl
			(17) bnelrl
			(18) bnglrl
			(19) bsolrl
			(20) bnslrl
			(21) biclrl
			(22) bnilrl
			(23) bunlrl
			(24) bnulrl

		(iii) [Un]conditional branch to TAR setting the link
			(1) btarl
			(2) bctarl

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |   2 +-
 arch/powerpc/perf/core-book3s.c              | 200 +++++++++++++++++++++++++--
 arch/powerpc/perf/power8-pmu.c               |  19 ++-
 3 files changed, 198 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 8b24926..5fc798b 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -34,7 +34,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64             (*bhrb_filter_map)(u64 branch_sample_type);
+	u64             (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask);
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index eeae308..81c4a1d 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,10 @@
 #define BHRB_PREDICTION		0x0000000000000001
 #define BHRB_EA			0xFFFFFFFFFFFFFFFC
 
+#define for_each_branch_sample_type(x) \
+        for ((x) = PERF_SAMPLE_BRANCH_USER; \
+             (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 struct cpu_hw_events {
 	int n_events;
 	int n_percpu;
@@ -47,7 +51,9 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64				bhrb_filter;	/* BHRB HW branch filter */
+	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
+	u64				bhrb_sw_filter; /* BHRB SW branch filter */
+	u64				filter_mask;	/* Branch filter mask */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -400,6 +406,101 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+#define BRANCH_LINK   0x00000001
+#define BRANCH_LR     0x4C000020
+#define BRANCH_CTR    0x4C000420
+#define BRANCH_TAR    0x4C000460
+
+/* Check the instruction opcodes */
+static bool validate_instruction(unsigned int *addr, u64 bhrb_sw_filter)
+{
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+		/* Link is not set */
+		if (!(*addr & BRANCH_LINK)) {
+			/*
+			 * Conditional and unconditional
+			 * branch to LR.
+			 */
+			if ((*addr & BRANCH_LR) == BRANCH_LR)
+				return true;
+
+			/* Everything else */
+			return false;
+		}
+
+		/* Link is set */
+		return false;
+	}
+
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
+		/* Link is set */
+		if (*addr & BRANCH_LINK) {
+			/*
+			 * Conditional and unconditional
+			 * branch to CTR.
+			 */
+			if ((*addr & BRANCH_CTR) == BRANCH_CTR)
+				return true;
+			/*
+			 * Conditional and unconditional
+			 * branch to LR.
+			 */
+			if ((*addr & BRANCH_LR) == BRANCH_LR)
+				return true;
+			/*
+			 * Conditional and unconditional
+			 * branch to TAR.
+			 */
+			if ((*addr & BRANCH_TAR) == BRANCH_TAR)
+				return true;
+
+			/* Everything else */
+			return false;
+		}
+
+		/* Link is not set */
+		return false;
+	}
+
+	/* No software branch filter, control
+	 * should not have come here.
+	 */
+	return true;
+}
+
+/* Extract the instruction from the address */
+static bool check_instruction(u64 addr, u64 bhrb_sw_filter)
+{
+	unsigned int instr;
+	bool ret;
+
+	if (bhrb_sw_filter == 0)
+		return true;
+
+	if (is_kernel_addr(addr)) {
+		ret = validate_instruction((unsigned int *) addr, bhrb_sw_filter);
+	} else {
+		/*
+		 * Userspace address need to copied first
+		 * before analysis.
+		 */
+		pagefault_disable();
+		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
+
+		/*
+		 * If the instruction could not be accessible
+		 * from user space, we still OKAY the entry.
+		 */
+		if (ret) {
+			pagefault_enable();
+			return true;
+		}
+		pagefault_enable();
+		ret = validate_instruction(&instr, bhrb_sw_filter);
+	}
+	return ret;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
@@ -459,14 +560,28 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 					addr = 0;
 				}
 				cpuhw->bhrb_entries[u_index].from = addr;
+
+				/* Apply SW filter */
+				if (!check_instruction(cpuhw->
+						bhrb_entries[u_index].from,
+							cpuhw->bhrb_sw_filter))
+					u_index--;
 			} else {
 				/* Branches to immediate field 
 				   (ie I or B form) */
 				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to =
-					power_pmu_bhrb_to(addr);
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
+				if (check_instruction(cpuhw->
+						bhrb_entries[u_index].from,
+						cpuhw->bhrb_sw_filter)) {
+					cpuhw->bhrb_entries[u_index].
+						to = power_pmu_bhrb_to(addr);
+					cpuhw->bhrb_entries[u_index].
+						mispred = pred;
+					cpuhw->bhrb_entries[u_index].
+						predicted = ~pred;
+				} else {
+					u_index--;
+				}
 			}
 			u_index++;
 
@@ -1159,7 +1274,7 @@ static void power_pmu_enable(struct pmu *pmu)
 
  out:
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	local_irq_restore(flags);
 }
@@ -1191,6 +1306,26 @@ static int collect_events(struct perf_event *group, int max_count,
 	return n;
 }
 
+/* SW based branch filters */
+static u64 branch_filter_map(u64 branch_sample_type, u64 *filter_mask)
+{
+	u64 branch_sw_filter = 0;
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		WARN_ON(*filter_mask != PERF_SAMPLE_BRANCH_ANY);
+		return branch_sw_filter;
+	}
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+		branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
+		*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
+	}
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
+		branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
+		*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
+	}
+	return branch_sw_filter;
+}
+
 /*
  * Add a event to the PMU.
  * If all events are not already frozen, then we disable and
@@ -1254,8 +1389,11 @@ nocheck:
  out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
+
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
+			(event->attr.branch_sample_type, &cpuhw->filter_mask);
+	        cpuhw->bhrb_sw_filter = branch_filter_map
+			(event->attr.branch_sample_type, &cpuhw->filter_mask);
 	}
 
 	perf_pmu_enable(event->pmu);
@@ -1531,6 +1669,35 @@ static int hw_perf_cache_event(u64 config, u64 *eventp)
 	return 0;
 }
 
+/* Validate requested filters either in PMU or in SW */
+static int match_filters(u64 branch_sample_type, u64 filter_mask)
+{
+	u64 x;
+
+	if (filter_mask == PERF_SAMPLE_BRANCH_ANY)
+		return true;
+
+	for_each_branch_sample_type(x) {
+		if (!(branch_sample_type & x))
+			continue;
+		/*
+		 * Privilege filter requests have been already
+		 * taken care during base PMU configuration.
+		 */
+		if (x == PERF_SAMPLE_BRANCH_USER)
+			continue;
+		if (x == PERF_SAMPLE_BRANCH_KERNEL)
+			continue;
+		if (x == PERF_SAMPLE_BRANCH_HV)
+			continue;
+
+		/* Requested filter not available */
+		if (!(filter_mask & x))
+			return false;
+	}
+	return true;
+}
+
 static int power_pmu_event_init(struct perf_event *event)
 {
 	u64 ev;
@@ -1637,10 +1804,21 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
+		/* PMU supported branch filters */
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
+			(event->attr.branch_sample_type, &cpuhw->filter_mask);
+
+		/* ABI - PMU does not support filter combination */
+		if (cpuhw->bhrb_hw_filter == -1)
+			return -EOPNOTSUPP;
+
+		/* SW supported branch filters */
+		cpuhw->bhrb_sw_filter = branch_filter_map
+			(event->attr.branch_sample_type, &cpuhw->filter_mask);
 
-		if(cpuhw->bhrb_filter == -1)
+		/* ABI - Requested filters are not present */
+		if(!match_filters(event->attr.branch_sample_type,
+							cpuhw->filter_mask))
 			return -EOPNOTSUPP;
 	}
 
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 6e28587..e02027b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -558,9 +558,10 @@ static int power8_generic_events[] = {
 	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED_CMPL,
 };
 
-static u64 power8_bhrb_filter_map(u64 branch_sample_type)
+static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
 {
 	u64 pmu_bhrb_filter = 0;
+	*filter_mask = 0;
 
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
@@ -570,15 +571,10 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	 */
 
 	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*filter_mask = PERF_SAMPLE_BRANCH_ANY;
 		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
-
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
+	}
 
 	/* Invalid branch filter combination - HW does not support */
 	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
@@ -587,16 +583,17 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+		*filter_mask    |= PERF_SAMPLE_BRANCH_ANY_CALL;
 		return pmu_bhrb_filter;
 	}
 
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
+		*filter_mask    |= PERF_SAMPLE_BRANCH_COND;
 		return pmu_bhrb_filter;
 	}
 
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH V2 3/6] perf, tool: Conditional branch filter 'cond' added to perf record
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

Add perf record support for the new branch stack filter criterion
PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ecca62e..802d11d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -625,6 +625,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
 	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
 	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH V2 5/6] perf, documentation: Description for conditional branch filter
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

Add documentation for the conditional branch filter.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index e297b74..59ca8d0 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -163,12 +163,13 @@ following filters are defined:
         - any_call: any function call or system call
         - any_ret: any function return or system call return
         - ind_call: any indirect branch
+        - cond: conditional branches
         - u:  only when the branch target is at the user level
         - k: only when the branch target is in the kernel
         - hv: only when the target is at the hypervisor level
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH V2 4/6] x86, perf: Add conditional branch filtering support
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

This patch adds conditional branch filtering support,
enabling it for PERF_SAMPLE_BRANCH_COND in the perf branch
stack sampling framework by using the existing
software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d5be06a..9723773 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -371,6 +371,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -665,6 +668,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -676,6 +680,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH V2 2/6] powerpc, perf: Enable conditional branch filter for POWER8
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

Enable conditional branch filter support for POWER8
using the MMCRA register based filter, and reject
the BHRB branch filter combination that the HW cannot
support together with conditional branches.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 2ee4a70..6e28587 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -580,11 +580,21 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
 		return -1;
 
+	/* Invalid branch filter combination - HW does not support */
+	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
+			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
+		return -1;
+
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
 		return pmu_bhrb_filter;
 	}
 
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
+		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
+		return pmu_bhrb_filter;
+	}
+
 	/* Every thing else is unsupported */
 	return -1;
 }
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH V2 0/6] perf: New conditional branch filter
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling

	This patchset is a re-spin of the original branch stack sampling
patchset which introduced the new PERF_SAMPLE_BRANCH_COND filter. This patchset
also enables SW based branch filtering support for PPC64 platforms which have
branch stack sampling support. With this enablement, the branch filter support
for PPC64 platforms has been extended to include all the combinations discussed
below with a sample test application program.
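
As a rough illustration of how a request can be split between PMU and SW
filters, the sketch below checks that every requested branch type is provided
by one of the two. It is a simplified stand-in, not the kernel code: the SK_*
flag values and the sk_filters_covered() helper are hypothetical, and the real
PERF_SAMPLE_BRANCH_* constants come from the perf_event UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only (hypothetical). */
#define SK_BR_USER       (1u << 0)
#define SK_BR_KERNEL     (1u << 1)
#define SK_BR_HV         (1u << 2)
#define SK_BR_ANY        (1u << 3)
#define SK_BR_ANY_CALL   (1u << 4)
#define SK_BR_ANY_RETURN (1u << 5)
#define SK_BR_IND_CALL   (1u << 6)
#define SK_BR_COND       (1u << 7)
#define SK_BR_MAX        (1u << 8)

/* Privilege bits are assumed to be handled by the base PMU configuration. */
#define SK_BR_PRIV (SK_BR_USER | SK_BR_KERNEL | SK_BR_HV)

/*
 * Return true when every requested branch type (ignoring privilege
 * bits) is provided by either the HW filter mask or the SW filter mask.
 */
static bool sk_filters_covered(uint64_t requested, uint64_t hw_mask,
			       uint64_t sw_mask)
{
	uint64_t provided = hw_mask | sw_mask;
	uint64_t needed = requested & (SK_BR_MAX - 1) & ~(uint64_t)SK_BR_PRIV;

	return (needed & ~provided) == 0;
}
```

For example, a request for SK_BR_USER | SK_BR_ANY_RETURN is covered when the
SW mask provides SK_BR_ANY_RETURN, while a request for SK_BR_COND with both
masks empty is not.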


(1) perf record -e branch-misses:u -b ./cprog
# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
     4.42%    cprog  cprog                 [k] sw_4_2             cprog                 [k] lr_addr          
     4.41%    cprog  cprog                 [k] symbol2            cprog                 [k] hw_1_2           
     4.41%    cprog  cprog                 [k] ctr_addr           cprog                 [k] sw_4_1           
     4.41%    cprog  cprog                 [k] lr_addr            cprog                 [k] sw_4_2           
     4.41%    cprog  cprog                 [k] sw_4_2             cprog                 [k] callme           
     4.41%    cprog  cprog                 [k] symbol1            cprog                 [k] hw_1_1           
     4.41%    cprog  cprog                 [k] success_3_1_3      cprog                 [k] sw_3_1           
     2.43%    cprog  cprog                 [k] sw_4_1             cprog                 [k] ctr_addr         
     2.43%    cprog  cprog                 [k] hw_1_2             cprog                 [k] symbol2          
     2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_2           
     2.43%    cprog  cprog                 [k] address1           cprog                 [k] back1            
     2.43%    cprog  cprog                 [k] back1              cprog                 [k] callme           
     2.43%    cprog  cprog                 [k] hw_2_1             cprog                 [k] address1         
     2.43%    cprog  cprog                 [k] sw_3_1_1           cprog                 [k] sw_3_1           
     2.43%    cprog  cprog                 [k] sw_3_1_2           cprog                 [k] sw_3_1           
     2.43%    cprog  cprog                 [k] sw_3_1_3           cprog                 [k] sw_3_1           
     2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_1         
     2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_2         
     2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_3         
     2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_1           
     2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_4_2           
     2.43%    cprog  cprog                 [k] hw_1_1             cprog                 [k] symbol1          
     2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_1           
     2.42%    cprog  cprog                 [k] sw_3_1             cprog                 [k] callme           
     1.99%    cprog  cprog                 [k] success_3_1_1      cprog                 [k] sw_3_1           
     1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_1    
     1.99%    cprog  cprog                 [k] address2           cprog                 [k] back2            
     1.99%    cprog  cprog                 [k] hw_2_2             cprog                 [k] address2         
     1.99%    cprog  cprog                 [k] back2              cprog                 [k] callme           
     1.99%    cprog  cprog                 [k] callme             cprog                 [k] main             
     1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_3    
     1.99%    cprog  cprog                 [k] hw_1_1             cprog                 [k] callme           
     1.99%    cprog  cprog                 [k] sw_3_2             cprog                 [k] callme           
     1.99%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_2           
     1.99%    cprog  cprog                 [k] success_3_1_2      cprog                 [k] sw_3_1           
     1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_2    
     1.99%    cprog  cprog                 [k] hw_1_2             cprog                 [k] callme           
     1.99%    cprog  cprog                 [k] sw_4_1             cprog                 [k] callme           
     0.02%    cprog  [unknown]             [k] 0xf7ba2328         [unknown]             [k] 0xf7ba2320       
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow  libc-2.11.2.so        [k] _IO_file_overflow
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn    libc-2.11.2.so        [k] _IO_file_overflow
     0.00%    cprog  cprog                 [k] callme             cprog                 [k] hw_2_2       

PMU filters
-----------
(2) perf record -e branch-misses:u -j any_call ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object           Target Symbol
# ........  .......  ....................  .......................  ....................  ......................
#
     7.82%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_2     
     6.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_2          
     6.88%    cprog  cprog                 [k] hw_1_1               cprog                 [k] symbol1           
     5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_1          
     5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_1            
     5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_1     
     5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_3          
     5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_2            
     5.88%    cprog  cprog                 [k] hw_1_2               cprog                 [k] symbol2           
     5.88%    cprog  cprog                 [k] sw_4_2               cprog                 [k] lr_addr           
     5.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_2            
     4.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_3     
     4.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_2            
     4.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_2            
     3.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_1            
     3.94%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_1            
     2.94%    cprog  cprog                 [k] main                 cprog                 [k] callme            
     2.94%    cprog  cprog                 [k] sw_4_1               cprog                 [k] ctr_addr          
     2.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_1            
     0.01%    cprog  [unknown]             [k] 0xf79076c4           [unknown]             [k] 0xf78f22c0        
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] _IO_setb          
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] mmap              
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn      libc-2.11.2.so        [k] _IO_default_xsputn
     0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow    libc-2.11.2.so        [k] _IO_do_write      
     0.00%    cprog  ld-2.11.2.so          [k] malloc               [unknown]             [k] 0xf790b380        


(3) perf record -e branch-misses:u -j cond ./cprog
# Overhead  Command  Source Shared Object       Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  ..................  ....................  .......................
#
    24.85%    cprog  [unknown]             [k] 00000000        cprog                 [k] callme             
    15.71%    cprog  cprog                 [k] sw_3_1          cprog                 [k] sw_3_1             
     7.14%    cprog  cprog                 [k] sw_4_2          cprog                 [k] lr_addr            
     6.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_2             
     4.57%    cprog  cprog                 [k] hw_2_2          cprog                 [k] callme             
     4.57%    cprog  cprog                 [k] sw_3_1_1        cprog                 [k] sw_3_1             
     4.57%    cprog  cprog                 [k] sw_4_1          cprog                 [k] ctr_addr           
     4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_1             
     4.57%    cprog  cprog                 [k] main            cprog                 [k] hw_1_1             
     4.57%    cprog  cprog                 [k] hw_1_2          cprog                 [k] hw_1_2             
     4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] main               
     4.57%    cprog  cprog                 [k] hw_2_1          cprog                 [k] callme             
     4.57%    cprog  cprog                 [k] sw_3_1_3        cprog                 [k] sw_3_1             
     4.57%    cprog  cprog                 [k] sw_3_1_2        cprog                 [k] sw_3_1             
     0.01%    cprog  [unknown]             [k] 0xf7aa25dc      [unknown]             [k] 0xf7aa27e4         
     0.00%    cprog  libc-2.11.2.so        [k] _IO_doallocbuf  libc-2.11.2.so        [k] _IO_file_doallocate
     0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_doallocate
     0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_stat   

SW filters
----------
(4) perf record -e branch-misses:u -j any_ret ./cprog
# Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object   Target Symbol
# ........  .......  ....................  .................  ....................  ..............
#
     7.91%    cprog  cprog                 [k] symbol1        cprog                 [k] hw_1_1    
     7.91%    cprog  cprog                 [k] success_3_1_3  cprog                 [k] sw_3_1    
     7.91%    cprog  cprog                 [k] ctr_addr       cprog                 [k] sw_4_1    
     7.91%    cprog  cprog                 [k] lr_addr        cprog                 [k] sw_4_2    
     7.91%    cprog  cprog                 [k] symbol2        cprog                 [k] hw_1_2    
     7.90%    cprog  cprog                 [k] sw_4_2         cprog                 [k] callme    
     4.34%    cprog  cprog                 [k] success_3_1_2  cprog                 [k] sw_3_1    
     4.33%    cprog  cprog                 [k] sw_4_1         cprog                 [k] callme    
     4.33%    cprog  cprog                 [k] hw_1_2         cprog                 [k] callme    
     4.33%    cprog  cprog                 [k] success_3_1_1  cprog                 [k] sw_3_1    
     4.33%    cprog  cprog                 [k] sw_3_2         cprog                 [k] callme    
     4.33%    cprog  cprog                 [k] back2          cprog                 [k] callme    
     4.33%    cprog  cprog                 [k] callme         cprog                 [k] main      
     4.33%    cprog  cprog                 [k] hw_1_1         cprog                 [k] callme    
     3.58%    cprog  cprog                 [k] sw_3_1         cprog                 [k] callme    
     3.58%    cprog  cprog                 [k] sw_3_1_1       cprog                 [k] sw_3_1    
     3.58%    cprog  cprog                 [k] sw_3_1_2       cprog                 [k] sw_3_1    
     3.58%    cprog  cprog                 [k] back1          cprog                 [k] callme    
     3.57%    cprog  cprog                 [k] sw_3_1_3       cprog                 [k] sw_3_1    
     0.00%    cprog  [unknown]             [k] 0xf7abacf4     [unknown]             [k] 0xf7abae40


(5) perf record -e branch-misses:u -j ind_call ./cprog
# Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
# ........  .......  ....................  .............  ....................  .............
#
    63.56%    cprog  cprog                 [k] sw_4_2     cprog                 [k] lr_addr  
    36.44%    cprog  cprog                 [k] sw_4_1     cprog                 [k] ctr_addr 


Mixed filters
-------------
(6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
Error:
The perf.data file has no samples!

NOTE: This is expected. The HW filter keeps only the branches that are calls, and the SW filter
then looks for return branches within that set. The two filters are mutually exclusive, so no
samples remain in the final profile.
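
A minimal sketch of the two-stage behaviour described in the NOTE above. It assumes
each recorded branch carries a bitmask of its properties and that a branch is kept
only if it passes the HW filter and then the SW filter. The names BR_CALL, BR_RET,
BR_IND_CALL, BR_COND, sample_trace and count_survivors are hypothetical and are not
taken from the patch series.

```c
#include <stddef.h>

/* Hypothetical property bits for one recorded branch. */
enum {
	BR_CALL     = 1u << 0,	/* any call */
	BR_RET      = 1u << 1,	/* any return */
	BR_IND_CALL = 1u << 2,	/* indirect call */
	BR_COND     = 1u << 3,	/* conditional branch */
};

/* Illustrative trace: direct call, plain return, indirect call,
 * conditional return. */
static const unsigned sample_trace[] = {
	BR_CALL,
	BR_RET,
	BR_CALL | BR_IND_CALL,
	BR_RET | BR_COND,
};

/* Count the branches that pass the HW filter and then the SW filter. */
static size_t count_survivors(const unsigned *branches, size_t n,
			      unsigned hw_filter, unsigned sw_filter)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(branches[i] & hw_filter))
			continue;	/* dropped by the HW stage */
		if (!(branches[i] & sw_filter))
			continue;	/* dropped by the SW stage */
		kept++;
	}
	return kept;
}
```

Under this model, a HW filter of BR_CALL combined with a SW filter of BR_RET keeps
nothing from sample_trace, as in case (6), while BR_CALL combined with BR_IND_CALL
keeps only the indirect call, as in case (7).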

(7) perf record -e branch-misses:u -j any_call,ind_call ./cprog
# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
# ........  .......  ....................  ..............  ....................  ..............
#
    66.69%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr   
    33.31%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr  
     0.00%    cprog  [unknown]             [k] 0x0fe7f264  [unknown]             [k] 0x0ff926d0


(8) perf record -e branch-misses:u -j any_call,any_ret,ind_call ./cprog
Error:
The perf.data file has no samples!

(9) perf record -e branch-misses:u -j cond,any_ret ./cprog
# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  ..............  ....................  .......................
#
    46.01%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme             
    13.54%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2             
     8.18%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1             
     8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] main               
     8.07%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1             
     8.07%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1             
     8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1             
     0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7c1480c         
     0.00%    cprog  libc-2.11.2.so        [k] mmap        libc-2.11.2.so        [k] _IO_file_doallocate

(10) perf record -e branch-misses:u -j cond,ind_call ./cprog
# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
# ........  .......  ....................  ..............  ....................  ..............
#
    48.11%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme    
    13.52%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2    
    12.42%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr   
     8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] main      
     8.65%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr  
     8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1    
     0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7a4581c


(11) perf record -e branch-misses:u -j cond,any_ret,ind_call ./cprog
# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
# ........  .......  ....................  ..............  ....................  .................
#
    45.91%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme       
    13.26%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2       
     8.17%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1       
     8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1       
     8.17%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1       
     8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] main         
     8.16%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1       
     0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7f87704   
     0.00%    cprog  [unknown]             [k] 00000000    libc-2.11.2.so        [k] _IO_file_sync

Test application program
========================
(1) Makefile:
--------------------------------------------
all: sample.o cprog of.cprog of.sample

sample.o: sample.s
        as -o sample.o sample.s
cprog: cprog.c sample.o
        gcc -o cprog cprog.c sample.o
of.sample: sample.o
        objdump -d sample.o > of.sample
of.cprog: cprog
        objdump -d cprog > of.cprog
clean:
        rm sample.o cprog of.sample of.cprog
---------------------------------------------
(2) cprog.c
---------------------------------------------
#include <stdio.h>
#define LOOP_COUNT 100000

extern void callme(void);

int main(int argc, char *argv[])
{
        int i;
        for(i = 0; i < LOOP_COUNT; i++)
                callme();

        printf("end");
        return 0;
}
---------------------------------------------
(3) sample.s
---------------------------------------------
# r25, r26 and r27 will be used as the first-, second- and
# third-level save slots for LR. Registers r20, r21, r22, r23
# and r24 will be used for general purposes.

.data

msg:
	.string "BHRB filter tests\n"
	len = . - msg
msg_1_1:
	.string "Test: hw_1_1\n"
	len_1_1 = 13
msg_1_2:
	.string "Test: hw_1_2\n"
	len_1_2 = 13
msg_2_1:
	.string "Test: hw_2_1\n"
	len_2_1 = 13
msg_2_2:
	.string "Test: hw_2_2\n"
	len_2_2 = 13
msg_3_1:
	.string "Test: sw_3_1\n"
	len_3_1 = 13
msg_3_1_1:
	.string "Test: sw_3_1_1\n"
	len_3_1_1 = 15
msg_3_1_2:
	.string "Test: sw_3_1_2\n"
	len_3_1_2 = 15
msg_3_1_3:
        .string "Test: sw_3_1_3\n"
        len_3_1_3 = 15
msg_3_2:
	.string "Test: sw_3_2\n"
	len_3_2 = 13
msg_4_1:
	.string "Test: sw_4_1\n"
	len_4_1 = 13
msg_4_2:
	.string "Test: sw_4_2\n"
	len_4_2 = 13

hw_3_1_1_passed:
	.string "\thw_3_1_1_passed\n\n"
	len_hw_3_1_1_passed = 18
hw_3_1_2_passed:
	.string "\thw_3_1_2_passed\n\n"
	len_hw_3_1_2_passed = 18
hw_3_1_3_passed:
	.string "\thw_3_1_3_passed\n\n"
	len_hw_3_1_3_passed = 18

hw_2_1_passed:
	.string "\thw_2_1_passed\n\n"
	len_hw_2_1_passed = 16

hw_2_2_passed:
	.string "\thw_2_2_passed\n\n"
	len_hw_2_2_passed = 16

hw_1_1_passed:
	.string "\thw_1_1_passed\n\n"
	len_hw_1_1_passed = 16

hw_1_2_passed:
	.string "\thw_1_2_passed\n\n"
	len_hw_1_2_passed = 16

hw_4_1_passed:
	.string "\thw_4_1_passed\n\n"
	len_hw_4_1_passed = 16

hw_4_2_passed:
	.string "\thw_4_2_passed\n\n"
	len_hw_4_2_passed = 16

msg_error:
	.string "\tError\n"
	len_error = 7
.text
	.global callme
	.global hw_1_1
	.global hw_1_2
	.global hw_2_1
	.global hw_2_2

# HW filter test symbols
symbol1:
	# Print "hw_1_1_passed"
	li      0, 4
	li      3, 1
	lis     4, hw_1_1_passed@ha
	addi    4, 4, hw_1_1_passed@l
	li      5, len_hw_1_1_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

hw_1_1:
        # Save LR - second level
        mflr 26

	# Print "hw_1_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_1_1@ha
	addi    4, 4, msg_1_1@l
	li      5, len_1_1
	sc

	bl symbol1			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

symbol2:
        # Print "hw_1_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_1_2_passed@ha
        addi    4, 4, hw_1_2_passed@l
        li      5, len_hw_1_2_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
hw_1_2:
	# Save LR - second level
	mflr 26

        # Print "hw_1_2 called"
        li      0, 4
        li      3, 1
        lis     4, msg_1_2@ha
        addi    4, 4, msg_1_2@l
        li      5, len_1_2
        sc

	li 4,20
	cmpi 0,4,20
	bcl 12, 4*cr0+2, symbol2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# HW filter test

address1: 
	# Print "hw_2_1_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_1_passed@ha
        addi    4, 4, hw_2_1_passed@l
        li      5, len_hw_2_1_passed
        sc
	b  back1			# PERF_SAMPLE_BRANCH_ANY

hw_2_1:
	# Print "hw_2_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_1@ha
	addi    4, 4, msg_2_1@l
	li      5, len_2_1
	sc
	
	# Simple conditional branch (equal)
	li	20, 12
	cmpi	3, 20, 12
	bc	12, 4*cr3+2, address1	# PERF_SAMPLE_BRANCH_COND

back1:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

address2:
        # Print "hw_2_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_2_passed@ha
        addi    4, 4, hw_2_2_passed@l
        li      5, len_hw_2_2_passed
        sc
        b  back2			# PERF_SAMPLE_BRANCH_ANY

hw_2_2:
        # Print "hw_2_2 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_2@ha
	addi    4, 4, msg_2_2@l
	li      5, len_2_2
	sc

	# Simple conditional branch (less than)
	li	20, 12
	cmpi	4, 20, 20
	bc	12, 4*cr4+0, address2	# PERF_SAMPLE_BRANCH_COND
back2:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# SW filter test symbols
sw_3_1_1:
	# Print "Test: sw_3_1_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_1@ha
        addi    4, 4, msg_3_1_1@l
        li      5, len_3_1_1
        sc

	li	22,0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 10
	bclr	12, 2			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc
	
	# Mark the error
	li 	22, 1
	
	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_2:
        # Print "Test: sw_3_1_2"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_2@ha
        addi    4, 4, msg_3_1_2@l
        li      5, len_3_1_2
        sc

	li	23, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 20
	bclr	12, 0			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
        
	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Mark the error
	li 	23, 1

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_3:
	# Print "Test: sw_3_1_3"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_3@ha
        addi    4, 4, msg_3_1_3@l
        li      5, len_3_1_3
        sc

	li	24, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 5
	bclr	12, 1			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
	
	# Mark the error
	li 	24, 1

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

success_3_1_1:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_1_passed@ha
        addi    4, 4, hw_3_1_1_passed@l
        li      5, len_hw_3_1_1_passed
        sc
	blr

success_3_1_2:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_2_passed@ha
        addi    4, 4, hw_3_1_2_passed@l
        li      5, len_hw_3_1_2_passed
        sc
	blr

success_3_1_3:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_3_passed@ha
        addi    4, 4, hw_3_1_3_passed@l
        li      5, len_hw_3_1_3_passed
        sc
	blr

sw_3_1:
	# Save LR
	mflr 26

        # Print "Test: sw_3_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1@ha
        addi    4, 4, msg_3_1@l
        li      5, len_3_1
        sc

	# Equal comparison condition
	bl sw_3_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 22, 0
	bcl	12, 2, success_3_1_1	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# LT comparison condition
	bl sw_3_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 23, 0
	bcl	12, 2, success_3_1_2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# GT comparison condition
	bl sw_3_1_3			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 24, 0
	bcl	12, 2, success_3_1_3	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_3_2:
	# Print "Test: sw_3_2"
	li      0, 4
	li      3, 1
	lis     4, msg_3_2@ha
	addi    4, 4, msg_3_2@l
	li      5, len_3_1
	sc

	# FIXME: Anything more here ?
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# Indirect call tests

# CTR
ctr_addr:
        # Print "bcctr taken"
        li      0, 4
        li      3, 1
        lis     4, hw_4_1_passed@ha
        addi    4, 4, hw_4_1_passed@l
        li      5, len_hw_4_1_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_4_1:
	# Save LR
	mflr	26

	# Print "sw_4_1 called"
        li      0, 4
        li      3, 1
        lis     4, msg_4_1@ha
        addi    4, 4, msg_4_1@l
        li      5, len_4_1
        sc

	# Save address in CTR
	lis 	20, ctr_addr@ha
	addi	20, 20, ctr_addr@l
	mtctr   20


	# Compare and jump to CTR
	li 	21, 10
	cmpi	0, 21, 10
	bcctrl  12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	mtlr	26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
# LR
lr_addr:
	# Print "bclrl taken"
	li      0, 4
	li      3, 1
	lis     4, hw_4_2_passed@ha
	addi    4, 4, hw_4_2_passed@l
	li      5, len_hw_4_2_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_4_2:
	# Save LR
	mflr	26

        # Print "Test: sw_4_2"
        li      0, 4
        li      3, 1
        lis     4, msg_4_2@ha
        addi    4, 4, msg_4_2@l
        li      5, len_4_2
        sc

	# Save address in LR
	lis 	20, lr_addr@ha
	addi	20, 20, lr_addr@l
	mtlr	20


	# Compare and jump to LR
	li 	21, 10
	cmpi	0, 21, 10
	bclrl   12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	# Restore LR
	mtlr	26	
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

callme:
	# Save LR
	mflr	25

	# Print "Branch filter Test"
	li	0, 4
	li	3, 1
	lis 	4, msg@ha
	addi	4, 4, msg@l
	li	5, len
	sc

	# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_COND
	bl hw_2_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_2_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# PERF_SAMPLE_BRANCH_ANY_RET
	bl sw_3_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_3_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_IND_CALL
	bl sw_4_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_4_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 25
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
--------------------------------------------------------------------
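
The PERF_SAMPLE_BRANCH_* annotations in the assembly listing above can be
approximated by decoding the instruction word itself. The following is a minimal
sketch, not the kernel's SW filter: it assumes the standard PowerPC encodings
(primary opcode 18 for b/bl, 16 for bc/bcl, 19 with extended opcode 16 for bclr
and 528 for bcctr, LK in the lowest bit, BO in the five bits starting at bit 21).
The BR_* flags and classify_branch() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical classification bits, loosely mirroring the perf filters. */
enum {
	BR_CALL     = 1u << 0,	/* branch sets LR (LK = 1) */
	BR_RET      = 1u << 1,	/* bclr without LK */
	BR_IND_CALL = 1u << 2,	/* bclrl or bcctrl */
	BR_COND     = 1u << 3,	/* BO is not "branch always" */
};

/* Classify one 32-bit PowerPC instruction word; returns 0 for anything
 * that is not a call, return or conditional branch. */
static unsigned classify_branch(uint32_t insn)
{
	uint32_t opcode = insn >> 26;		/* primary opcode */
	uint32_t xo = (insn >> 1) & 0x3ff;	/* extended opcode (XL-form) */
	uint32_t bo = (insn >> 21) & 0x1f;	/* branch options */
	unsigned lk = insn & 1;			/* link bit */
	unsigned flags = 0;

	if (opcode == 18)			/* b, bl */
		return lk ? BR_CALL : 0;

	if (opcode == 16) {			/* bc, bcl */
		if (lk)
			flags |= BR_CALL;
	} else if (opcode == 19 && (xo == 16 || xo == 528)) {
		if (lk)				/* bclrl, bcctrl */
			flags |= BR_CALL | BR_IND_CALL;
		else if (xo == 16)		/* bclr */
			flags |= BR_RET;
	} else {
		return 0;			/* not a branch */
	}

	/* BO of the form 1z1zz means "branch always". */
	if ((bo & 0x14) != 0x14)
		flags |= BR_COND;

	return flags;
}
```

For example, blr (0x4e800020) is classified as a plain return, while
"bclr 12, 2" (0x4d820020) is both a return and a conditional branch. Note that
this sketch also marks a conditional bcctrl as a call and a conditional branch,
which is broader than the single annotation used in the listing.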
                                           
Changes in V2
--------------
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Anshuman Khandual (6):
  perf: New conditional branch filter criteria in branch stack sampling
  powerpc, perf: Enable conditional branch filter for POWER8
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Enable SW filtering in branch stack sampling framework

 arch/powerpc/include/asm/perf_event_server.h |   2 +-
 arch/powerpc/perf/core-book3s.c              | 200 +++++++++++++++++++++++++--
 arch/powerpc/perf/power8-pmu.c               |  25 ++--
 arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
 include/uapi/linux/perf_event.h              |   3 +-
 tools/perf/Documentation/perf-record.txt     |   3 +-
 tools/perf/builtin-record.c                  |   1 +
 7 files changed, 216 insertions(+), 23 deletions(-)

-- 
1.7.11.7


* [PATCH V2 1/6] perf: New conditional branch filter criteria in branch stack sampling
From: Anshuman Khandual @ 2013-08-30  4:24 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: eranian, acme, ellerman, sukadev, michael.neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

The POWER8 PMU-based BHRB supports filtering for conditional branches.
This patch introduces a new branch filter, PERF_SAMPLE_BRANCH_COND, which
extends the existing perf ABI. Other architectures can provide
this functionality with either HW filtering support (if present) or
SW filtering of instructions.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0b1df41..5da52b6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -160,8 +160,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
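
A minimal sketch of why PERF_SAMPLE_BRANCH_MAX has to move together with the new
bit. It assumes that a requested branch mask is rejected when it contains any bit
at or above the MAX marker; the check below is modelled on that assumption and is
not copied from the kernel. All SKETCH_* names and check_branch_mask() are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the values touched by the hunk above. */
#define SKETCH_BRANCH_COND	(1U << 10)	/* new filter bit */
#define SKETCH_BRANCH_MAX_OLD	(1U << 10)	/* MAX before the patch */
#define SKETCH_BRANCH_MAX_NEW	(1U << 11)	/* MAX after the patch */

/* Return 0 if every bit in mask lies below max, -EINVAL otherwise. */
static int check_branch_mask(uint64_t mask, uint64_t max)
{
	if (mask & ~(max - 1))
		return -EINVAL;
	return 0;
}
```

With the old marker, check_branch_mask(SKETCH_BRANCH_COND, SKETCH_BRANCH_MAX_OLD)
fails with -EINVAL because bit 10 is not below the marker; with the new marker
the same mask is accepted.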


* Re: [PATCH v2 0/4] Unify CPU hotplug lock interface
From: Yasuaki Ishimatsu @ 2013-08-30  2:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: fenghua.yu, bp, gregkh, x86, linux-kernel, rjw, linux-acpi, mingo,
	srivatsa.bhat, nfont, tglx, hpa, linuxppc-dev
In-Reply-To: <1377822129-4143-1-git-send-email-toshi.kani@hp.com>

(2013/08/30 9:22), Toshi Kani wrote:
> lock_device_hotplug() was recently introduced to serialize CPU & Memory
> online/offline and hotplug operations, along with sysfs online interface
> restructure (commit 4f3549d7).  With this new locking scheme,
> cpu_hotplug_driver_lock() is redundant and is no longer necessary.
> 
> This patchset makes sure that lock_device_hotplug() covers all CPU online/
> offline interfaces, and then removes cpu_hotplug_driver_lock().
> 
> v2:
>   - Rebased to the pm tree, bleeding-edge.
>   - Changed patch 2/4 to use lock_device_hotplug_sysfs().
> 
> ---
> Toshi Kani (4):
>    hotplug, x86: Fix online state in cpu0 debug interface
>    hotplug, x86: Add hotplug lock to missing places
>    hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
>    hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
> 
> ---
The patch-set looks good to me.

Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu


>   arch/powerpc/kernel/smp.c              | 12 ----------
>   arch/powerpc/platforms/pseries/dlpar.c | 40 +++++++++++++---------------------
>   arch/x86/Kconfig                       |  4 ----
>   arch/x86/kernel/smpboot.c              | 21 ------------------
>   arch/x86/kernel/topology.c             | 11 ++++++----
>   drivers/base/cpu.c                     | 34 +++++++++++++++++++----------
>   include/linux/cpu.h                    | 13 -----------
>   7 files changed, 45 insertions(+), 90 deletions(-)
> 


* [PATCH v2 0/4] Unify CPU hotplug lock interface
From: Toshi Kani @ 2013-08-30  0:22 UTC (permalink / raw)
  To: rjw
  Cc: fenghua.yu, bp, Toshi Kani, gregkh, x86, linux-kernel, linux-acpi,
	isimatu.yasuaki, mingo, srivatsa.bhat, nfont, tglx, hpa,
	linuxppc-dev

lock_device_hotplug() was recently introduced to serialize CPU & Memory
online/offline and hotplug operations, along with sysfs online interface
restructure (commit 4f3549d7).  With this new locking scheme,
cpu_hotplug_driver_lock() is redundant and is no longer necessary.

This patchset makes sure that lock_device_hotplug() covers all CPU online/
offline interfaces, and then removes cpu_hotplug_driver_lock().

v2:
 - Rebased to the pm tree, bleeding-edge.
 - Changed patch 2/4 to use lock_device_hotplug_sysfs().

---
Toshi Kani (4):
  hotplug, x86: Fix online state in cpu0 debug interface
  hotplug, x86: Add hotplug lock to missing places
  hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
  hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()

---
 arch/powerpc/kernel/smp.c              | 12 ----------
 arch/powerpc/platforms/pseries/dlpar.c | 40 +++++++++++++---------------------
 arch/x86/Kconfig                       |  4 ----
 arch/x86/kernel/smpboot.c              | 21 ------------------
 arch/x86/kernel/topology.c             | 11 ++++++----
 drivers/base/cpu.c                     | 34 +++++++++++++++++++----------
 include/linux/cpu.h                    | 13 -----------
 7 files changed, 45 insertions(+), 90 deletions(-)


* [PATCH v2 4/4] hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
From: Toshi Kani @ 2013-08-30  0:22 UTC (permalink / raw)
  To: rjw
  Cc: fenghua.yu, bp, Toshi Kani, gregkh, x86, linux-kernel, linux-acpi,
	isimatu.yasuaki, mingo, srivatsa.bhat, nfont, tglx, hpa,
	linuxppc-dev
In-Reply-To: <1377822129-4143-1-git-send-email-toshi.kani@hp.com>

cpu_hotplug_driver_lock() serializes CPU online/offline operations
when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
necessary for the following reasons:

 - lock_device_hotplug() now protects CPU online/offline operations,
   including the probe & release interfaces enabled by
   ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
   redundant.
 - cpu_hotplug_driver_lock() only takes effect when ARCH_CPU_PROBE_RELEASE
   is defined, which is misleading, and that option is only enabled on powerpc.

This patch removes the cpu_hotplug_driver_lock() interface.  As
a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
probe & release interface as intended.  There is no functional change
in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
Performed build test only on powerpc.
---
 arch/powerpc/kernel/smp.c              |   12 ----------
 arch/powerpc/platforms/pseries/dlpar.c |   40 ++++++++++++--------------------
 arch/x86/kernel/topology.c             |    2 --
 drivers/base/cpu.c                     |   10 +-------
 include/linux/cpu.h                    |   13 ----------
 5 files changed, 16 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 38b0ba6..1667269 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -763,18 +763,6 @@ void __cpu_die(unsigned int cpu)
 		smp_ops->cpu_die(cpu);
 }
 
-static DEFINE_MUTEX(powerpc_cpu_hotplug_driver_mutex);
-
-void cpu_hotplug_driver_lock()
-{
-	mutex_lock(&powerpc_cpu_hotplug_driver_mutex);
-}
-
-void cpu_hotplug_driver_unlock()
-{
-	mutex_unlock(&powerpc_cpu_hotplug_driver_mutex);
-}
-
 void cpu_die(void)
 {
 	if (ppc_md.cpu_die)
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index a1a7b9a..e39325d 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -387,18 +387,13 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
 	char *cpu_name;
 	int rc;
 
-	cpu_hotplug_driver_lock();
 	rc = strict_strtoul(buf, 0, &drc_index);
-	if (rc) {
-		rc = -EINVAL;
-		goto out;
-	}
+	if (rc)
+		return -EINVAL;
 
 	dn = dlpar_configure_connector(drc_index);
-	if (!dn) {
-		rc = -EINVAL;
-		goto out;
-	}
+	if (!dn)
+		return -EINVAL;
 
 	/* configure-connector reports cpus as living in the base
 	 * directory of the device tree.  CPUs actually live in the
@@ -407,8 +402,7 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
 	cpu_name = kasprintf(GFP_KERNEL, "/cpus%s", dn->full_name);
 	if (!cpu_name) {
 		dlpar_free_cc_nodes(dn);
-		rc = -ENOMEM;
-		goto out;
+		return -ENOMEM;
 	}
 
 	kfree(dn->full_name);
@@ -417,22 +411,21 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
 	rc = dlpar_acquire_drc(drc_index);
 	if (rc) {
 		dlpar_free_cc_nodes(dn);
-		rc = -EINVAL;
-		goto out;
+		return -EINVAL;
 	}
 
 	rc = dlpar_attach_node(dn);
 	if (rc) {
 		dlpar_release_drc(drc_index);
 		dlpar_free_cc_nodes(dn);
-		goto out;
+		return rc;
 	}
 
 	rc = dlpar_online_cpu(dn);
-out:
-	cpu_hotplug_driver_unlock();
+	if (rc)
+		return rc;
 
-	return rc ? rc : count;
+	return count;
 }
 
 static int dlpar_offline_cpu(struct device_node *dn)
@@ -505,30 +498,27 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
 		return -EINVAL;
 	}
 
-	cpu_hotplug_driver_lock();
 	rc = dlpar_offline_cpu(dn);
 	if (rc) {
 		of_node_put(dn);
-		rc = -EINVAL;
-		goto out;
+		return -EINVAL;
 	}
 
 	rc = dlpar_release_drc(*drc_index);
 	if (rc) {
 		of_node_put(dn);
-		goto out;
+		return rc;
 	}
 
 	rc = dlpar_detach_node(dn);
 	if (rc) {
 		dlpar_acquire_drc(*drc_index);
-		goto out;
+		return rc;
 	}
 
 	of_node_put(dn);
-out:
-	cpu_hotplug_driver_unlock();
-	return rc ? rc : count;
+
+	return count;
 }
 
 static int __init pseries_dlpar_init(void)
diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index a3f35eb..649b010 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -66,7 +66,6 @@ int __ref _debug_hotplug_cpu(int cpu, int action)
 		return -EINVAL;
 
 	lock_device_hotplug();
-	cpu_hotplug_driver_lock();
 
 	switch (action) {
 	case 0:
@@ -91,7 +90,6 @@ int __ref _debug_hotplug_cpu(int cpu, int action)
 		ret = -EINVAL;
 	}
 
-	cpu_hotplug_driver_unlock();
 	unlock_device_hotplug();
 
 	return ret;
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index dc78e45..9745f03 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -46,8 +46,6 @@ static int __ref cpu_subsys_online(struct device *dev)
 	int from_nid, to_nid;
 	int ret;
 
-	cpu_hotplug_driver_lock();
-
 	from_nid = cpu_to_node(cpuid);
 	ret = cpu_up(cpuid);
 	/*
@@ -58,18 +56,12 @@ static int __ref cpu_subsys_online(struct device *dev)
 	if (from_nid != to_nid)
 		change_cpu_under_node(cpu, from_nid, to_nid);
 
-	cpu_hotplug_driver_unlock();
 	return ret;
 }
 
 static int cpu_subsys_offline(struct device *dev)
 {
-	int ret;
-
-	cpu_hotplug_driver_lock();
-	ret = cpu_down(dev->id);
-	cpu_hotplug_driver_unlock();
-	return ret;
+	return cpu_down(dev->id);
 }
 
 void unregister_cpu(struct cpu *cpu)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 801ff9e..3434ef7 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -185,19 +185,6 @@ extern void cpu_hotplug_enable(void);
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
 
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
-extern void cpu_hotplug_driver_lock(void);
-extern void cpu_hotplug_driver_unlock(void);
-#else
-static inline void cpu_hotplug_driver_lock(void)
-{
-}
-
-static inline void cpu_hotplug_driver_unlock(void)
-{
-}
-#endif
-
 #else		/* CONFIG_HOTPLUG_CPU */
 
 static inline void cpu_hotplug_begin(void) {}


