LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()
From: Gavin Shan @ 2014-04-30  0:28 UTC (permalink / raw)
  To: Wei Yang; +Cc: linuxppc-dev, Alexey Kardashevskiy, gwshan
In-Reply-To: <20140429064955.GA5066@richard>

On Tue, Apr 29, 2014 at 02:49:55PM +0800, Wei Yang wrote:
>On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>>On 04/23/2014 12:26 PM, Wei Yang wrote:

.../...

>Generally, when kernel enumerate on the pci device, following functions will
>be invoked.
>
>     pci_device_add
>        pcibios_setup_bus_device
>        ...
>        set_iommu_table_base_and_group   
>     device_add
>        ...
>        tce_iommu_bus_notifier           
>     pcibios_fixup_bus
>        pcibios_add_pci_devices
>        ...
>        pcibios_setup_bus_devices        
>
>From the call flow, we see for a normall pci device, the
>pcibios_setup_bus_device() will be invoked twice.
>
>At the bootup time, none of them succeed to setup the dma, since the PE is not
>assigned or the tbl is not set. The iommu tbl and group is setup in
>pnv_pci_ioda_setup_DMA().
>

Yes, we don't assign PE# for PCI devices until ppc_md.pcibios_fixup().
We gets IOMMU group and IOMMU group device registered in ppc_md.pcibios_fixup().

As Alexy already pointed out, "tce_iommu_bus_notifier" doesn't take effect
during system boot stage.

>This call flow maintains the same when EEH error happens on Bus PE, while the
>behavior is a little different. 
>
>     pci_device_add
>        pcibios_setup_bus_device
>        ...
>        set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>     device_add
>        ...
>        tce_iommu_bus_notifier           <- succeed
>     pcibios_fixp_bus
>        pcibios_add_pci_devices
>        ...
>        pcibios_setup_bus_devices        <- warning, re-attach
>
>While this call flow will change a little on a VF. For a VF,
>pcibios_fixp_bus() will not be invoked. Current behavior is this.
>
>     pci_device_add
>        pcibios_setup_bus_device
>        ...
>        set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>     device_add
>        ...
>        tce_iommu_bus_notifier           <- succeed
>

It seems that we have 2 problems here:

- For non-SRIOV case, pcibios_setup_device() is called for towice. That
  seems incorrect. We could simply remove pcibios_setup_bus_devices()
  from pcibios_fixup_bus().

- It's too early to register IOMMU group/device in pnv_pci_ioda_dma_dev_setup()
  because the sysfs entries of the PCI device aren't finalized yet. So we could
  remove all logic we have in pnv_pci_ioda_dma_dev_setup() and just purely rely
  on "tce_iommu_bus_notifier".

By the way, I never tried EEH on SRIOV PF/VFs. However, I never hit similar
issue in non-SRIOV cases.

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
From: Michael Neuling @ 2014-04-30  0:29 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: Linux PPC dev, linux-kernel, avagin, roland, oleg
In-Reply-To: <535F997B.3000500@linux.vnet.ibm.com>

Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:

> On 04/29/2014 01:52 PM, Michael Neuling wrote:
> > That's not what that patch does. It shouldn't make any user visible changes
> > to DSCR or PPR.
> 
> It may not when it runs uninterrupted but after the tracee process has
> stopped, thread.dscr reflects the default DSCR value as mentioned
> before. This can be proved by changing the "dscr_default" value in
> arch/powerpc/sysfs.c file.

The intention with DSCR is that if the user changes the DSCR, the kernel
should always save/restore it.  If you are seeing something else, then
that is a bug.  Anton has a test case for this here:

  http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c

If that is failing, then there is a bug that we need to fix.

The PPR is the same, except that the kernel can change it over a
syscall.

> > Over syscall PPR and DSCR may change.

Sorry, this should be only PPR.  DSCR shouldn't change over a syscall,
at least that's the intention.

> > Depending on your test case, that may
> > be your problem.
> 
> I would guess when the tracee process stops for ptrace analysis, tm_reclaim or
> tm_recheckpoint path might be crossed which is causing this dscr_default value
> to go into thread_struct.

That shouldn't happen.  If that's happening, it's a bug.

Mikey

^ permalink raw reply

* Re: [PATCH v2 00/21] FDT clean-ups and libfdt support
From: Stephen N Chivers @ 2014-04-30  0:33 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree, linux-kernel-owner, Chris Zankel, Aurelien Jacquiot,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Grant Likely,
	linuxppc-dev, James Hogan
In-Reply-To: <1398215901-25609-1-git-send-email-robherring2@gmail.com>

> From: Rob Herring <robherring2@gmail.com>
> To: Grant Likely <grant.likely@linaro.org>, linux-
> kernel@vger.kernel.org, devicetree@vger.kernel.org
> Cc: Rob Herring <robh@kernel.org>, Aurelien Jacquiot <a-
> jacquiot@ti.com>, Benjamin Herrenschmidt <benh@kernel.crashing.org>,
> Chris Zankel <chris@zankel.net>, "H. Peter Anvin" <hpa@zytor.com>, 
> Ingo Molnar <mingo@redhat.com>, James Hogan <james.hogan@imgtec.co>
> Date: 04/30/2014 09:45 AM
> Subject: [PATCH v2 00/21] FDT clean-ups and libfdt support
> Sent by: linux-kernel-owner@vger.kernel.org
> 
> From: Rob Herring <robh@kernel.org>
> 
> This is a series of clean-ups of architecture FDT code and converts the
> core FDT code over to using libfdt functions. This is in preparation
> to add FDT based address translation parsing functions for early
> console support. This series removes direct access to FDT data from all
> arches except powerpc.
> 
> The current MIPS lantiq and xlp DT code is buggy as built-in DTBs need
> to be copied out of init section. Patches 2 and 3 should be applied to
> 3.15.
> 
> Changes in v2 are relatively minor. There was a bug in the unflattening
> code where walking up the tree was not being handled correctly (thanks
> to Michal Simek). I re-worked things a bit to avoid globally adding
> libfdt include paths.
> 
> A branch is available here[1], and I plan to put into linux-next in a 
few
> days. Please test! I've compiled on arm, arm64, mips, microblaze, 
xtensa,
> and powerpc and booted on arm and arm64.
I have tested this for PowerPC using a snapshot of libfdt[1] collected 
from
the today (30/04/2014).

The computers used were 32 bit, Freescale and IBM/AMCC CPUs:

        MVME5100 - Motorola/Fresscale CPU - PPCBug firmware
        SAM440EP - IBM/AMCC - U-Boot firmware

Tested-by: Stephen Chivers <schivers@csc.com>

Stephen Chivers,
CSC Australia Pty. Ltd.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git libfdt

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()
From: Wei Yang @ 2014-04-30  1:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Wei Yang, Alexey Kardashevskiy, gwshan
In-Reply-To: <535FA4E9.9050403@ozlabs.ru>

On Tue, Apr 29, 2014 at 11:11:05PM +1000, Alexey Kardashevskiy wrote:
>On 04/29/2014 07:37 PM, Wei Yang wrote:
>> On Tue, Apr 29, 2014 at 05:55:48PM +1000, Alexey Kardashevskiy wrote:
>>> On 04/29/2014 04:49 PM, Wei Yang wrote:
>>>> On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 04/23/2014 12:26 PM, Wei Yang wrote:
>>>>>> During the EEH hotplug event, iommu_add_device() will be invoked three times
>>>>>> and two of them will trigger warning or error.
>>>>>>
>>>>>> The three times to invoke the iommu_add_device() are:
>>>>>>
>>>>>>     pci_device_add
>>>>>>        ...
>>>>>>        set_iommu_table_base_and_group   <- 1st time, fail
>>>>>>     device_add
>>>>>>        ...
>>>>>>        tce_iommu_bus_notifier           <- 2nd time, succees
>>>>>>     pcibios_add_pci_devices
>>>>>>        ...
>>>>>>        pcibios_setup_bus_devices        <- 3rd time, re-attach
>>>>>>
>>>>>> The first time fails, since the dev->kobj->sd is not initialized. The
>>>>>> dev->kobj->sd is initialized in device_add().
>>>>>> The third time's warning is triggered by the re-attach of the iommu_group.
>>>>>>
>>>>>> After applying this patch, the error
>>>>>
>>>>> Nack.
>>>>>
>>>>> The patch still seems incorrect and we actually need to remove
>>>>> tce_iommu_bus_notifier completely as pcibios_setup_bus_devices is called
>>>> >from another notifier anyway. Could you please test it?
>>>>>
>>>>>
>>>>
>>>> Hi, Alexey,
>>>>
>>>> Nice to see your comment. Let me show what I got fist.
>>>>
>>>> Generally, when kernel enumerate on the pci device, following functions will
>>>> be invoked.
>>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           
>>>>      pcibios_fixp_bus
>>>>         pcibios_add_pci_devices
>>>>         ...
>>>>         pcibios_setup_bus_devices        
>>>>
>>>> >From the call flow, we see for a normall pci device, the
>>>> pcibios_setup_bus_device() will be invoked twice.
>>>
>>>
>>> tce_iommu_bus_notifier should not be called for devices during boot-time
>>> PCI discovery as the discovery is done from subsys_initcall handler and
>>> tce_iommu_bus_notifier is registered as subsys_initcall_sync. Are you sure
>>> this is happening? You should see warnings as for PF's EEH but you do not
>>> say that.
>>>
>> 
>> Let me make it clearer. I list the call flow for general purpose to show the
>> sequence of the call flow.
>> 
>> The tce_iommu_bus_notifier is not invoked at the boot time. And none of them
>> do real thing at boot time.
>> 
>> I don't understand your last sentence. I see warning and error during EEH
>> hotplug. If necessary, I will add this in the commit log.
>> 
>>>
>>>> At the bootup time, none of them succeed to setup the dma, since the PE is not
>>>> assigned or the tbl is not set. The iommu tbl and group is setup in
>>>> pnv_pci_ioda_setup_DMA().
>>>>
>>>> This call flow maintains the same when EEH error happens on Bus PE, while the
>>>> behavior is a little different. 
>>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>>>
>>>
>>> Sorry, what is uninitialized? The only kobj here is the one in iommu_group
>>> and it must have been taken care of before adding a device.
>> 
>> pci_dev->dev->kobj->sd.
>> 
>> I have explained this in previous discussion. This kobj->sd is initialized in
>> device_add().
>> 
>>>
>>>
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           <- succeed
>>>>      pcibios_fixp_bus
>>>>         pcibios_add_pci_devices
>>>>         ...
>>>>         pcibios_setup_bus_devices        <- warning, re-attach
>>>
>>>
>>> This is why I am suggesting getting rid of tce_iommu_bus_notifier - we will
>>> avoid the warning.
>> 
>> Then how about the first time's error?
>
>
>What do you call "first time error" here?
>

Would you please take a look at my commit log?

Currently, the iommu_group will be added three times. The error happens at the
first time we try to attatch the iommu_group in pci_device_add(). And yes,
this happens at the EEH recovery on PF for a hotplug case.

The error is: 

        iommu_tce: 0003:05:00.0 has not been added, ret=-14

And the reason is:
  
        pci_dev->dev->kobj->sd is not initialized.

>
>>>> While this call flow will change a little on a VF. For a VF,
>>>> pcibios_fixp_bus() will not be invoked. Current behavior is this.
>>>
>>>
>>> You meant pcibios_fixup_bus? I would expect it to get called (from
>>> pci_rescan_bus() or something like that?) and this seems to be the problem
>>> here.
>> 
>> When EEH happens and hotplug the pci bus, we need to do the pci_scan_bridge()
>> which will do the pcibios_fixup_bus().
>
>Are you talking now about EEH on PF (physical function) or EEH on VF
>(virtual function)?
>
>Are you calling pci_scan_bridge()
>

Yes, it is pcibios_add_pci_devices() do the hotplug and invoke the
pci_scan_bridge().

>
>> So you suggest to remove it? This is the generic code.
>
>
>I suggest removing tce_iommu_bus_notifier only. It must go. It was my
>mistake at the first place.
>
>
>>> How are VFs added in terms of code flow? Is there any git tree to look at?
>>>
>> 
>> VF add code flow is a generic code in drivers/pci/iov.c, virtfn_add().
>
>virtfn_add -> pci_device_add -> pcibios_add_device -> pcibios_setup_device
>-> ppc_md.pci_dma_dev_setup -> pnv_pci_dma_dev_setup ->
>pnv_pci_ioda_dma_dev_setup ->
>set_iommu_table_base.
>
>
>What part of this is missing?
>

Currently, you will do set_iommu_table_base_and_group() in
pnv_pci_ioda_dma_dev_setup(). And this will fail and return an error.

The error reason is the kobj->sd is not initialized yet.
I hope it is clear this time.

>
>> 
>>>
>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           <- succeed
>>>>
>>>> And if an EEH error happens just on a VF, I believe the same call flow should
>>>> go for recovery. (This is not set down, still under discussion with Gavin.)
>>>>
>>>> My conclusion is:
>>>> 1. remove the tce_iommu_bus_notifier completely will make the VF not work.
>>>>    So I choose to revert the code and attach make the iommu group attachment
>>>>    just happens in one place.
>>>>
>>>> BTW, I know my patch is not a perfect one. For a PF, the tbl will still be set
>>>> twice. I am not sure why we need to invoke pcibios_setup_device() twice on a
>>>> PF, maybe some platform need to fixup some thing after the pci_bus is added.
>>>> So I don't remove one of them to solve the problem.
>>>>
>>>> If you have a better idea, I am glad to take it.
>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply

* Re: [PATCH v2 2/2] fsl/mpic_timer: make mpic_timer to support deep sleep feature
From: Scott Wood @ 2014-04-30  1:39 UTC (permalink / raw)
  To: Dongsheng Wang; +Cc: linuxppc-dev, chenhui.zhao, jason.jin
In-Reply-To: <1398319939-36348-1-git-send-email-dongsheng.wang@freescale.com>

On Thu, 2014-04-24 at 14:12 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> At T104x platfrom the timer clock will be changed from platform_clock
> to sys_ref_clock when system going to deep sleep.
> 
> So before system going to deep sleep, we need to change time to adapt
> to the new frequency that is sys_ref_clock. And after resume from deep
> sleep, restore the time that based on platform_clock.
> 
> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> ---
> *v2*
> Remove some unnecessary warning message.
> Remove "switch_freq_flag", it's unnecessary.
> 
> Modify the description of the patch.

The patch needs better code comments throughout to explain the flow of
what's happening.  Note that doesn't mean "more comments", but clearer
and more useful comments.

> diff --git a/arch/powerpc/sysdev/mpic_timer.c b/arch/powerpc/sysdev/mpic_timer.c
> index 9d9b062..71ad368 100644
> --- a/arch/powerpc/sysdev/mpic_timer.c
> +++ b/arch/powerpc/sysdev/mpic_timer.c
> @@ -18,6 +18,7 @@
>  #include <linux/mm.h>
>  #include <linux/interrupt.h>
>  #include <linux/slab.h>
> +#include <linux/suspend.h>
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_device.h>
> @@ -26,7 +27,9 @@
>  #include <sysdev/fsl_soc.h>
>  #include <asm/io.h>
>  
> +#include <asm/mpc85xx.h>
>  #include <asm/mpic_timer.h>
> +#include <asm/pm.h>
>  
>  #d	efine FSL_GLOBAL_TIMER		0x1
>  
> @@ -72,6 +75,8 @@ struct timer_group_priv {
>  	struct mpic_timer		timer[TIMERS_PER_GROUP];
>  	struct list_head		node;
>  	unsigned int			timerfreq;
> +	unsigned int			suspended_timerfreq;
> +	unsigned int			resume_timerfreq;
>  	unsigned int			idle;
>  	unsigned int			flags;
>  	spinlock_t			lock;
> @@ -423,6 +428,33 @@ struct mpic_timer *mpic_request_timer(irq_handler_t fn, void *dev,
>  }
>  EXPORT_SYMBOL(mpic_request_timer);
>  
> +static void timer_group_get_suspended_freq(struct timer_group_priv *priv)
> +{
> +	struct device_node *np;
> +
> +	np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-clockgen-2.0");
> +	if (!np)
> +		return;
> +
> +	of_property_read_u32(np, "clock-frequency", &priv->suspended_timerfreq);
> +	of_node_put(np);
> +
> +	if (!priv->suspended_timerfreq)
> +		pr_warn("Mpic timer will not be accurate during deep sleep.\n");
> +}
> +
> +static int need_to_switch_freq(void)

/*
 * Returns true if the timers operate on a special frequency
 * when the system is suspended.
 */
static bool has_suspend_freq(void)

> @@ -437,6 +469,13 @@ static int timer_group_get_freq(struct device_node *np,
>  					&priv->timerfreq);
>  			of_node_put(dn);
>  		}
> +
> +		/*
> +		 * For deep sleep, if system goes to deep sleep,
> +		 * timer freq will be changed.
> +		 */

"For deep sleep" is redundant here.

> +		if (need_to_switch_freq())
> +			timer_group_get_suspended_freq(priv);
>  	}
>  
>  	if (priv->timerfreq <= 0)
> @@ -445,6 +484,7 @@ static int timer_group_get_freq(struct device_node *np,
>  	if (priv->flags & FSL_GLOBAL_TIMER) {
>  		div = (1 << (MPIC_TIMER_TCR_CLKDIV >> 8)) * 8;
>  		priv->timerfreq /= div;
> +		priv->suspended_timerfreq /= div;
>  	}
>  
>  	return 0;
> @@ -564,14 +604,182 @@ out:
>  	kfree(priv);
>  }
>  
> +static void mpic_reset_time(struct mpic_timer *handle, struct timeval *bcr_time,
> +				struct timeval *ccr_time)
> +{
> +	struct timer_group_priv *priv = container_of(handle,
> +			struct timer_group_priv, timer[handle->num]);
> +
> +	u64 ccr_ticks = 0;
> +	u64 bcr_ticks = 0;
> +
> +	/* switch bcr time */
> +	convert_time_to_ticks(priv, bcr_time, &bcr_ticks);
> +
> +	/* switch ccr time */
> +	convert_time_to_ticks(priv, ccr_time, &ccr_ticks);

Redundant comments.

> +	if (handle->cascade_handle) {
> +		u32 tmp_ticks;
> +		u32 rem_ticks;
> +
> +		/* reset ccr ticks to bcr */
> +		tmp_ticks = div_u64_rem(ccr_ticks, MAX_TICKS_CASCADE,
> +					&rem_ticks)
> +		out_be32(&priv->regs[handle->num].gtbcr,
> +			tmp_ticks | TIMER_STOP);
> +		out_be32(&priv->regs[handle->num - 1].gtbcr, rem_ticks);

The comment should explain why you're doing this -- that the hardware
copies bcr to ccr whenever TIMER_STOP is cleared.

> +		/* start timer */
> +		clrbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP);
> +
> +		/* reset bcr */
> +		tmp_ticks = div_u64_rem(bcr_ticks, MAX_TICKS_CASCADE,
> +					&rem_ticks);
> +		out_be32(&priv->regs[handle->num].gtbcr,
> +			tmp_ticks & ~TIMER_STOP);
> +		out_be32(&priv->regs[handle->num - 1].gtbcr, rem_ticks);

Depending on the clock frequency it may be possible that bcr will be
copied again to ccr before the real bcr is written, if the old ccr was
very small.  This could result in two timer expirations where one was
expected.

It also looks like we start using the new timer counts before we
actually enter deep sleep, so the timeout could end up being shorter
than expected.

I'm not sure there's much we can do about either of these given the
hardware design, but it should at least be noted in a comment.

> +	} else {
> +		/* reset ccr ticks to bcr */
> +		out_be32(&priv->regs[handle->num].gtbcr,
> +			ccr_ticks | TIMER_STOP);
> +		/* start timer */
> +		clrbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP);
> +		/* reset bcr */
> +		out_be32(&priv->regs[handle->num].gtbcr,
> +			bcr_ticks & ~TIMER_STOP);
> +	}
> +}
> +
> +static void do_switch_time(struct mpic_timer *handle, unsigned int new_freq)

timer_set_freq()

> +{
> +	struct timer_group_priv *priv = container_of(handle,
> +			struct timer_group_priv, timer[handle->num]);
> +	struct timeval ccr_time;
> +	struct timeval bcr_time;
> +	unsigned int timerfreq;
> +	u32 test_stop;
> +	u64 ticks;
> +
> +	test_stop = in_be32(&priv->regs[handle->num].gtbcr);
> +	test_stop &= TIMER_STOP;
> +	if (test_stop)
> +		return;

	if (in_be32(&priv->regs[handle->num].gtbcr) & TIMER_STOP)
		return;

> +	/* stop timer, prepare reset time */
> +	setbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP);

"stop timer" is redundant.
What does "prepare reset time" mean here?

> +	/* get bcr time */

I can see that it reads "bcr".  "Get base time" would at least remind
the reader what bcr stands for.

> +	if (handle->cascade_handle) {
> +		u32 tmp_ticks;
> +
> +		tmp_ticks = in_be32(&priv->regs[handle->num].gtbcr);
> +		tmp_ticks &= ~TIMER_STOP;
> +		ticks = ((u64)tmp_ticks & UINT_MAX) * (u64)MAX_TICKS_CASCADE;

What does this "& UINT_MAX" accomplish?

> +		tmp_ticks = in_be32(&priv->regs[handle->num - 1].gtbcr);
> +		ticks += tmp_ticks;
> +	} else {
> +		ticks = in_be32(&priv->regs[handle->num].gtbcr);
> +		ticks &= ~TIMER_STOP;
> +	}
> +	convert_ticks_to_time(priv, ticks, &bcr_time);

It would be cleaner to calculate the suspend-freq base count when
configuring the timer, rather than try to work backwards from the
register values.

> +	/* get ccr time */
> +	mpic_get_remain_time(handle, &ccr_time);

> +	/* recalculate timer time */
> +	timerfreq = priv->timerfreq;
> +	priv->timerfreq = new_freq;
> +	mpic_reset_time(handle, &bcr_time, &ccr_time);
> +	priv->timerfreq = timerfreq;
> +}
> +
> +static void switch_group_timer(struct timer_group_priv *priv,
> +				unsigned int new_freq)

timer_group_set_freq()

> +{
> +	int i, num;
> +
> +	for (i = 0; i < TIMERS_PER_GROUP; i++) {
> +		num = TIMERS_PER_GROUP - 1 - i;
> +		/* cascade */
> +		if ((i + 1) < TIMERS_PER_GROUP &&
> +				priv->timer[num].cascade_handle) {

This comment is redundant -- "cascade" is already in "cascade_handle".

How about a comment explaining how you handle cascaded interrupts?

> +			do_switch_time(&priv->timer[num], new_freq);
> +			i++;
> +			continue;
> +		}

Are you sure it works to convert each timer of a cascade separately?

How is this different from letting the main loop do it, other than that
you don't test idle?  The do_switch_time() call is identical, the
increminting of i is identical...

I think you meant to call do_switch_time() on only one of the halves of
the cascade (I'm not sure which one) and skip the other, but that's not
what the code does.

> +		if (!test_bit(i, (unsigned long *)&priv->idle))
> +			do_switch_time(&priv->timer[num], new_freq);

That cast is broken.  If test_bit() wants a long, you need to provide an
actual long.  On 64-bit this code will be looking at "flags" rather than
"idle".

Is testing idle even necessary, or would TIMER_STOP do the job?

> +	}
> +}
> +
> +static int mpic_timer_suspend(void)
> +{
> +	struct timer_group_priv *priv;
> +	suspend_state_t pm_state;
> +
> +	pm_state = pm_suspend_state();
> +
> +	list_for_each_entry(priv, &timer_group_list, node) {
> +		/* timer not be used */
> +		if (priv->idle == 0xf)
> +			continue;

"The timer isn't used"

> +		switch (pm_state) {
> +		case PM_SUSPEND_STANDBY:
> +			break;

Unnecessary

> +		case PM_SUSPEND_MEM:
> +			if (!priv->suspended_timerfreq)
> +				continue;
> +
> +			/* will switch timers, a set of timer */
> +			switch_group_timer(priv, priv->suspended_timerfreq);

What does "a set of timer" mean here?  If you mean that it operates on a
timer group, please use the same terminology as elsewhere (plus it's
implied by the name of the function).

"will switch timers" is implied by the name of the function.  Comments
should explain things, not just restate them.

> +			/* Software: switch timerfreq to suspended freq */

What does "Software:" mean here?

> +			priv->resume_timerfreq = priv->timerfreq;
> +			priv->timerfreq = priv->suspended_timerfreq;

Why doesn't "switch_group_timer" handle this?

> +			break;
> +		default:
> +			break;

Unnecessary default

There's no need for a switch at all -- there's only one case in which
you're ever going to do anything here.

-Scott

^ permalink raw reply

* RE: [PATCH] powerpc/fsl-rio: Fix fsl_rio_setup error paths and use-after-unmap
From: Gang.Liu @ 2014-04-30  2:58 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1398791552.24575.56.camel@snotra.buserror.net>

DQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IFdvb2QgU2NvdHQtQjA3NDIx
DQo+IFNlbnQ6IFdlZG5lc2RheSwgQXByaWwgMzAsIDIwMTQgMToxMyBBTQ0KPiBUbzogTGl1IEdh
bmctQjM0MTgyDQo+IENjOiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZw0KPiBTdWJqZWN0
OiBSZTogW1BBVENIXSBwb3dlcnBjL2ZzbC1yaW86IEZpeCBmc2xfcmlvX3NldHVwIGVycm9yIHBh
dGhzIGFuZA0KPiB1c2UtYWZ0ZXItdW5tYXANCj4gDQo+IE9uIE1vbiwgMjAxNC0wNC0yOCBhdCAy
MzowNCAtMDUwMCwgTGl1IEdhbmctQjM0MTgyIHdyb3RlOg0KPiA+ID4gLS0tLS1PcmlnaW5hbCBN
ZXNzYWdlLS0tLS0NCj4gPiA+IEZyb206IFdvb2QgU2NvdHQtQjA3NDIxDQo+ID4gPiBTZW50OiBU
dWVzZGF5LCBBcHJpbCAyOSwgMjAxNCA5OjMyIEFNDQo+ID4gPiBUbzogbGludXhwcGMtZGV2QGxp
c3RzLm96bGFicy5vcmcNCj4gPiA+IENjOiBXb29kIFNjb3R0LUIwNzQyMTsgTGl1IEdhbmctQjM0
MTgyDQo+ID4gPiBTdWJqZWN0OiBbUEFUQ0hdIHBvd2VycGMvZnNsLXJpbzogRml4IGZzbF9yaW9f
c2V0dXAgZXJyb3IgcGF0aHMgYW5kDQo+ID4gPiB1c2UtIGFmdGVyLXVubWFwDQo+ID4gPg0KPiA+
ID4gU2V2ZXJhbCBvZiB0aGUgZXJyb3IgcGF0aHMgZnJvbSBmc2xfcmlvX3NldHVwIGFyZSBtaXNz
aW5nIGVycm9yDQo+IG1lc3NhZ2VzLg0KPiA+ID4NCj4gPiA+IFdvcnNlLCBmc2xfcmlvX3NldHVw
IGluaXRpYWxpemVzIHNldmVyYWwgZ2xvYmFsIHBvaW50ZXJzIGFuZCBkb2VzDQo+ID4gPiBub3Qg
TlVMTCB0aGVtIG91dCBhZnRlciBmcmVlaW5nL3VubWFwcGluZyBvbiBlcnJvci4gIFRoaXMgY2F1
c2VkDQo+ID4gPiBmc2xfcmlvX21jaGVja19leGNlcHRpb24oKSB0byBjcmFzaCB3aGVuIGFjY2Vz
c2luZyByaW9fcmVnc193aW4NCj4gPiA+IHdoaWNoIHdhcyBub24tTlVMTCBidXQgaGFkIGJlZW4g
dW5tYXBwZWQuDQo+ID4gPg0KPiA+ID4gU2lnbmVkLW9mZi1ieTogU2NvdHQgV29vZCA8c2NvdHR3
b29kQGZyZWVzY2FsZS5jb20+DQo+ID4gPiBDYzogTGl1IEdhbmcgPEdhbmcuTGl1QGZyZWVzY2Fs
ZS5jb20+DQo+ID4gPiAtLS0NCj4gPiA+IExpdSBHYW5nLCBhcmUgeW91IHN1cmUgYWxsIG9mIHRo
ZXNlIGVycm9yIGNvbmRpdGlvbnMgYXJlIGZhdGFsPyAgV2h5DQo+ID4gPiBkb2VzIHRoZSByaW8g
ZHJpdmVyIGZhaWwgaWYgcm11IGlzIG5vdCBwcmVzZW50IChlLmcuICBvbiB0NDI0MCk/DQo+ID4N
Cj4gPiBIaSBTY290dCwgSSB0aGluayB0aGUgZXJyb3JzIHlvdSBtb2RpZmllZCBpbiB0aGUgcGF0
Y2ggYXJlIHNlcmlvdXMgYW5kDQo+ID4gc2hvdWxkIGJlIGZpeGVkLiBUaGFua3MgdmVyeSBtdWNo
IQ0KPiA+IEFuZCBpbiBmYWN0LCB0aGUgcmlvIGRyaXZlciBjYW4gYmUgdXNlZCBqdXN0IGZvciB0
aGUgc3VibW9kdWxlIG9mIHRoZQ0KPiBTUklPOiBSTVUuDQo+ID4gSXQgc2hvdWxkIGJlIHVzZWQg
d2l0aCBhcmNoL3Bvd2VycGMvc3lzZGV2L2ZzbF9ybXUuYyBhbmQgdGhlcmUgc2hvdWxkDQo+ID4g
aGF2ZSB0aGUgUk1VIG1vZHVsZS4NCj4gPiBUaGUgZnNsX3Jpby5jIGltcGxlbWVudHMgc29tZSBi
YXNpYyBhbmQgbmVlZGVkIHdvcmtzIHRvIHN1cHBvcnQgdGhlIFJNVQ0KPiBydW5uaW5nIHdlbGwu
DQo+IA0KPiBJIGRvbid0IHF1aXRlIGZvbGxvdyAtLSBpcyBpdCBleHBlY3RlZCB0aGF0IHJpbyBj
YW4gd29yayB3aXRob3V0IHJtdSwgb3INCj4gbm90PyAgQXMgaXMsIGZzbF9yaW9fc2V0dXAoKSB3
aWxsIGVycm9yIG91dCBpZiBpdCBkb2Vzbid0IGZpbmQgYW4NCj4gZnNsLHNyaW8tcm11LWhhbmRs
ZSBwcm9wZXJ0eS4NCg0KZnNsX3Jpb19zZXR1cCgpIGRvZXNuJ3QgZXhwZWN0IHRoYXQgcmlvIGNh
biB3b3JrIHdpdGhvdXQgcm11LiBBbGwgdGhlIHJpbw0KZHJpdmVycyBqdXN0IGhhcyBvbmUgcHVy
cG9zZSwgaXQncyBybXUuIEJ1dCBybXUgaXMgYSBzdWJtb2R1bGUgb2YgdGhlIHJpbywNCnNvIHRo
ZSBkcml2ZXIgc2hvdWxkIHBhcnNlIHJpbyBpbiBkdGIgYW5kIGZpbmlzaCBzb21lIGluaXRpYWwg
d29ya3MgZmlyc3QsDQphbmQgdGhlbiB0byBzZXR1cCBybXUuDQpUaGF0J3Mgd2h5IHRoZSBkcml2
ZXJzIGNhbm5vdCBqdXN0IGhhdmUgYSBzdWNoIGFzIHJtdV9zZXR1cCgpIHRvIHBhcnNlDQpmc2ws
c3Jpby1ybXUtaGFuZGxlIHByb3BlcnR5Lg0KDQpUaGFua3MsDQpMaXUgR2FuZw0KIA0K

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()
From: Wei Yang @ 2014-04-30  3:28 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, Wei Yang, Alexey Kardashevskiy
In-Reply-To: <20140430002812.GA13183@shangw>

On Wed, Apr 30, 2014 at 10:28:12AM +1000, Gavin Shan wrote:
>
>It seems that we have 2 problems here:
>
>- For non-SRIOV case, pcibios_setup_device() is called for towice. That
>  seems incorrect. We could simply remove pcibios_setup_bus_devices()
>  from pcibios_fixup_bus().

I have thought about this solution before, but I guess this would have some
side effect on other platforms.

Well, just did a test by removing this line in pcibios_fixup_bus(), the result
is:
1. system up, thanks god.
2. no one invoke the pcibios_setup_device() at bootup time.

The reason for no one invoke the pcibios_setup_device() is: in
pcibios_add_device() it will check whether the bus is added before calling
pcibios_setup_device(). And at this time, the bus is not added.

Still wierd, why the system could be up. But one thing for sure is, no one
invoke the pcibios_setup_device() at bootup stage. So this solution may not
work.

>
>- It's too early to register IOMMU group/device in pnv_pci_ioda_dma_dev_setup()
>  because the sysfs entries of the PCI device aren't finalized yet. So we could
>  remove all logic we have in pnv_pci_ioda_dma_dev_setup() and just purely rely
>  on "tce_iommu_bus_notifier".

This would be another solution. 

One concern:

    If we want to do like this, we need to retrieve the pe information and
    get the tce32_table base in tce_iommu_bus_notifier. Hmm... looks a little
    not that nice.

>
>By the way, I never tried EEH on SRIOV PF/VFs. However, I never hit similar
>issue in non-SRIOV cases.

I have test this case on a PF with no VF enabled.

I just make the mlx4_pci_err_detected() return PCI_ERS_RESULT_NONE and do
nothing. So this will force the eeh to do a hotplug recovery.

You could have a try on your machine too.

>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me

^ permalink raw reply

* Re: [PATCH] powerpc 32: Provides VIRT_CPU_ACCOUNTING
From: Benjamin Herrenschmidt @ 2014-04-30  4:56 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <1395266717.12479.293.camel@snotra.buserror.net>

On Wed, 2014-03-19 at 17:05 -0500, Scott Wood wrote:
> On Wed, 2014-03-19 at 22:52 +0100, Christophe Leroy wrote:
> > This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
> > Unlike PPC64, PPC32 doesn't provide the PACA register. Therefore the
> > implementation is similar to the one done in the IA64 architecture.
> > It is based on additional information added to the Task Info structure.
> 
> PACA isn't a register -- just a convention for how Linux uses a GPR.
> Maybe it's time to use it on PPC32 as well?

PACA is actually a data structure and you really really don't want it
on ppc32 :-) Having a register point to current works, having a register
point to per-cpu data instead works too (ie, change what we do today),
but don't introduce a PACA *please* :-)

> 
> > Index: b/arch/powerpc/kernel/asm-offsets.c
> > ===================================================================
> > --- b/arch/powerpc/kernel/asm-offsets.c	(revision 5607)
> > +++ b/arch/powerpc/kernel/asm-offsets.c	(revision 5608)
> > @@ -167,6 +167,10 @@
> >  	DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
> >  	DEFINE(TI_TASK, offsetof(struct thread_info, task));
> >  	DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
> > +	DEFINE(TI_AC_STAMP, offsetof(struct thread_info, ac_stamp));
> > +	DEFINE(TI_AC_LEAVE, offsetof(struct thread_info, ac_leave));
> > +	DEFINE(TI_AC_STIME, offsetof(struct thread_info, ac_stime));
> > +	DEFINE(TI_AC_UTIME, offsetof(struct thread_info, ac_utime));
> 
> Doesn't this need to be protected by #ifdef
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE?
> 
> >  
> >  #ifdef CONFIG_PPC64
> >  	DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size));
> > Index: b/arch/powerpc/include/asm/thread_info.h
> > ===================================================================
> > --- b/arch/powerpc/include/asm/thread_info.h	(revision 5607)
> > +++ b/arch/powerpc/include/asm/thread_info.h	(revision 5608)
> > @@ -43,6 +43,12 @@
> >  	int		cpu;			/* cpu we're on */
> >  	int		preempt_count;		/* 0 => preemptable,
> >  						   <0 => BUG */
> > +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> > +	__u32 ac_stamp;
> > +	__u32 ac_leave;
> > +	__u32 ac_stime;
> > +	__u32 ac_utime;
> > +#endif
> 
> This isn't uapi; why not use "u32"?
> 
> Plus, it should be made clear that this is only used on 32-bit.
> 
> >  	struct restart_block restart_block;
> >  	unsigned long	local_flags;		/* private flags for thread */
> >  
> > @@ -58,6 +64,8 @@
> >  	.task =		&tsk,			\
> >  	.exec_domain =	&default_exec_domain,	\
> >  	.cpu =		0,			\
> > +	.ac_stime =	0,			\
> > +	.ac_utime =	0,			\
> 
> Also needs to be ifdeffed -- which isn't going to work in a macro, so
> maybe remove the ifdef from the variable declarations, or just let the
> fields be initialized to zero by default.  Or add PACA to 32-bit. :-)
> 
> -Scott
> 

^ permalink raw reply

* Re: [PATCH 0/6] Implement split core for POWER8
From: Michael Neuling @ 2014-04-30  5:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras, kvm
In-Reply-To: <1398303165-6576-1-git-send-email-mikey@neuling.org>

> This patch series implements split core mode on POWER8.  This enables up to 4
> subcores per core which can each independently run guests (per guest SPRs like
> SDR1, LPIDR etc are replicated per subcore).  Lots more documentation on this
> feature in the code and commit messages.
> 
> Most of this code is in the powernv platform but there's a couple of KVM
> specific patches too.
> 
> Alex: If you're happy with the KVM patches, please ACK them and benh can hold
> this series.

Alex,

Any chance we can get an ACK on these two KVM patches so benh can put
this series in his next branch?

Mikey

^ permalink raw reply

* Re: [PATCH 1/1] powerpc: crtsaveres.o needed only when -Os flag is enabled
From: Benjamin Herrenschmidt @ 2014-04-30  5:14 UTC (permalink / raw)
  To: Ram Pai; +Cc: linuxppc-dev, Brian W Hart
In-Reply-To: <20140429153806.GA25743@oc3347516403.ibm.com>

On Tue, 2014-04-29 at 10:38 -0500, Brian W Hart wrote:

> >  CHECKFLAGS	+= -m$(CONFIG_WORD_SIZE) -D__powerpc__ -D__powerpc$(CONFIG_WORD_SIZE)__
> >  
> > +ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> >  KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
> > +endif
> > +
> >  
> >  # No AltiVec or VSX instructions when building kernel
> >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> 
> I didn't try building a kernel or in-tree modules, but I confirmed
> that it allows building of out-of-tree modules when crtsavres.o is
> not present (e.g. as for a distro install where the kernel headers
> are provided by package, rather than being manually prepared from
> the sources).
> 
> Tested-by: Brian W Hart <hartb@linux.vnet.ibm.com>

I still don't like it. What guarantee do we have that gcc will never
call into this with other optimisation settings ? It might decide
one day that calling out for saving a large pile of registers
is still more efficient than unrolling the whole lot, including
for speed.

Besides that doesn't fix the root problem. We want to be able to
build the kernel with CONFIG_CC_OPTIMIZE_FOR_SIZE and still have
modules.

So a better solution needs to be found. I don't know what that
solution is (we might want to look at what other archs are doing
maybe ?), could be to include crtsaveres.S in the build of every
module (we really don't want to EXPORT_SYMBOL these guys), but
that would mean having it installed somewhere with the kernel
headers for out-of-tree modules...

If necessary, involve lkml, Rusty etc... but this patch is crap.

Ben.

^ permalink raw reply

* Re: [PATCH v2] powerpc 32: Provides VIRT_CPU_ACCOUNTING
From: Benjamin Herrenschmidt @ 2014-04-30  4:57 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: scottwood, linuxppc-dev, Paul Mackerras, linux-kernel, alistair
In-Reply-To: <20140407073147.6F87E1A4BDE@localhost.localdomain>

On Mon, 2014-04-07 at 09:31 +0200, Christophe Leroy wrote:
> This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
> Unlike PPC64, PPC32 doesn't use the PACA convention. Therefore the
> implementation is taken from the IA64 architecture.
> It is based on additional information added to the Task Info structure.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Scott, Can you review/ack this (or get somebody to) ?

It looks like a great idea but I really don't have the bandwidth to
review in detail and test right now.

(Adding Alister as well who maintains our 4xx 32-bit stuff nowadays).

Cheers,
Ben.

> Index: b/arch/powerpc/Kconfig
> ===================================================================
> --- a/arch/powerpc/Kconfig	(revision 5607)
> +++ b/arch/powerpc/Kconfig	(revision 5611)
> @@ -138,6 +138,7 @@
>  	select OLD_SIGSUSPEND
>  	select OLD_SIGACTION if PPC32
>  	select HAVE_DEBUG_STACKOVERFLOW
> +	select HAVE_VIRT_CPU_ACCOUNTING
>  
>  config EARLY_PRINTK
>  	bool
> Index: a/arch/powerpc/kernel/time.c
> ===================================================================
> --- a/arch/powerpc/kernel/time.c	(revision 5607)
> +++ b/arch/powerpc/kernel/time.c	(revision 5611)
> @@ -162,7 +162,9 @@
>  
>  cputime_t cputime_one_jiffy;
>  
> +#ifdef CONFIG_PPC_SPLPAR
>  void (*dtl_consumer)(struct dtl_entry *, u64);
> +#endif
>  
>  static void calc_cputime_factors(void)
>  {
> @@ -178,6 +180,7 @@
>  	__cputime_clockt_factor = res.result_low;
>  }
>  
> +#ifdef CONFIG_PPC64
>  /*
>   * Read the SPURR on systems that have it, otherwise the PURR,
>   * or if that doesn't exist return the timebase value passed in.
> @@ -190,6 +193,7 @@
>  		return mfspr(SPRN_PURR);
>  	return tb;
>  }
> +#endif
>  
>  #ifdef CONFIG_PPC_SPLPAR
>  
> @@ -291,6 +295,7 @@
>   * Account time for a transition between system, hard irq
>   * or soft irq state.
>   */
> +#ifdef CONFIG_PPC64
>  static u64 vtime_delta(struct task_struct *tsk,
>  			u64 *sys_scaled, u64 *stolen)
>  {
> @@ -377,7 +382,70 @@
>  	get_paca()->utime_sspurr = 0;
>  	account_user_time(tsk, utime, utimescaled);
>  }
> +#else
>  
> +void vtime_account_user(struct task_struct *tsk)
> +{
> +	cputime_t delta_utime;
> +	struct thread_info *ti = task_thread_info(tsk);
> +
> +	if (ti->ac_utime) {
> +		delta_utime = ti->ac_utime;
> +		account_user_time(tsk, delta_utime, delta_utime);
> +		ti->ac_utime = 0;
> +	}
> +}
> +
> +/*
> + * Called from the context switch with interrupts disabled, to charge all
> + * accumulated times to the current process, and to prepare accounting on
> + * the next process.
> + */
> +void arch_vtime_task_switch(struct task_struct *prev)
> +{
> +	struct thread_info *pi = task_thread_info(prev);
> +	struct thread_info *ni = task_thread_info(current);
> +
> +	ni->ac_stamp = pi->ac_stamp;
> +	ni->ac_stime = ni->ac_utime = 0;
> +}
> +
> +/*
> + * Account time for a transition between system, hard irq or soft irq state.
> + * Note that this function is called with interrupts enabled.
> + */
> +static cputime_t vtime_delta(struct task_struct *tsk)
> +{
> +	struct thread_info *ti = task_thread_info(tsk);
> +	__u32 delta_stime;
> +	__u32 now;
> +
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	now = mftbl();
> +
> +	delta_stime = ti->ac_stime + (now - ti->ac_stamp);
> +	ti->ac_stime = 0;
> +	ti->ac_stamp = now;
> +
> +	return (cputime_t)delta_stime;
> +}
> +
> +void vtime_account_system(struct task_struct *tsk)
> +{
> +	cputime_t delta = vtime_delta(tsk);
> +
> +	account_system_time(tsk, 0, delta, delta);
> +}
> +EXPORT_SYMBOL_GPL(vtime_account_system);
> +
> +void vtime_account_idle(struct task_struct *tsk)
> +{
> +	account_idle_time(vtime_delta(tsk));
> +}
> +
> +#endif
> +
>  #else /* ! CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  #define calc_cputime_factors()
>  #endif
> @@ -871,6 +939,8 @@
>  		       ppc_proc_freq / 1000000, ppc_proc_freq % 1000000);
>  	}
>  
> +	mttbl(0);
> +	mttbu(0);
>  	tb_ticks_per_jiffy = ppc_tb_freq / HZ;
>  	tb_ticks_per_sec = ppc_tb_freq;
>  	tb_ticks_per_usec = ppc_tb_freq / 1000000;
> Index: b/arch/powerpc/kernel/entry_32.S
> ===================================================================
> --- a/arch/powerpc/kernel/entry_32.S	(revision 5607)
> +++ b/arch/powerpc/kernel/entry_32.S	(revision 5611)
> @@ -177,6 +177,12 @@
>  	addi	r12,r12,-1
>  	stw	r12,4(r11)
>  #endif
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +	CURRENT_THREAD_INFO(r9, r1)
> +	tophys(r9, r9)
> +	ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
> +#endif
> +
>  	b	3f
>  
>  2:	/* if from kernel, check interrupted DOZE/NAP mode and
> @@ -406,6 +412,13 @@
>  	lwarx	r7,0,r1
>  END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
>  	stwcx.	r0,0,r1			/* to clear the reservation */
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +	andi.	r4,r8,MSR_PR
> +	beq	3f
> +	CURRENT_THREAD_INFO(r4, r1)
> +	ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
> +3:
> +#endif
>  	lwz	r4,_LINK(r1)
>  	lwz	r5,_CCR(r1)
>  	mtlr	r4
> @@ -841,6 +854,10 @@
>  	andis.	r10,r0,DBCR0_IDM@h
>  	bnel-	load_dbcr0
>  #endif
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +	CURRENT_THREAD_INFO(r9, r1)
> +	ACCOUNT_CPU_USER_EXIT(r9, r10, r11)
> +#endif
>  
>  	b	restore
>  
> Index: b/arch/powerpc/kernel/asm-offsets.c
> ===================================================================
> --- a/arch/powerpc/kernel/asm-offsets.c	(revision 5607)
> +++ b/arch/powerpc/kernel/asm-offsets.c	(revision 5611)
> @@ -167,6 +167,12 @@
>  	DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
>  	DEFINE(TI_TASK, offsetof(struct thread_info, task));
>  	DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
> +#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32)
> +	DEFINE(TI_AC_STAMP, offsetof(struct thread_info, ac_stamp));
> +	DEFINE(TI_AC_LEAVE, offsetof(struct thread_info, ac_leave));
> +	DEFINE(TI_AC_STIME, offsetof(struct thread_info, ac_stime));
> +	DEFINE(TI_AC_UTIME, offsetof(struct thread_info, ac_utime));
> +#endif
>  
>  #ifdef CONFIG_PPC64
>  	DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size));
> Index: b/arch/powerpc/include/asm/thread_info.h
> ===================================================================
> --- a/arch/powerpc/include/asm/thread_info.h	(revision 5607)
> +++ b/arch/powerpc/include/asm/thread_info.h	(revision 5611)
> @@ -43,6 +43,12 @@
>  	int		cpu;			/* cpu we're on */
>  	int		preempt_count;		/* 0 => preemptable,
>  						   <0 => BUG */
> +#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32)
> +	u32 ac_stamp;
> +	u32 ac_leave;
> +	u32 ac_stime;
> +	u32 ac_utime;
> +#endif
>  	struct restart_block restart_block;
>  	unsigned long	local_flags;		/* private flags for thread */
>  
> Index: b/arch/powerpc/include/asm/cputime.h
> ===================================================================
> --- a/arch/powerpc/include/asm/cputime.h	(revision 5607)
> +++ b/arch/powerpc/include/asm/cputime.h	(revision 5611)
> @@ -228,7 +228,11 @@
>  
>  #define cputime64_to_clock_t(ct)	cputime_to_clock_t((cputime_t)(ct))
>  
> +#ifdef CONFIG_PPC64
>  static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +#else
> +extern void arch_vtime_task_switch(struct task_struct *tsk);
> +#endif
>  
>  #endif /* __KERNEL__ */
>  #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> Index: b/arch/powerpc/include/asm/ppc_asm.h
> ===================================================================
> --- a/arch/powerpc/include/asm/ppc_asm.h	(revision 5607)
> +++ b/arch/powerpc/include/asm/ppc_asm.h	(revision 5611)
> @@ -25,10 +25,16 @@
>   */
>  
>  #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +#ifdef CONFIG_PPC64
>  #define ACCOUNT_CPU_USER_ENTRY(ra, rb)
>  #define ACCOUNT_CPU_USER_EXIT(ra, rb)
> +#else /* CONFIG_PPC64 */
> +#define ACCOUNT_CPU_USER_ENTRY(ti, ra, rb)
> +#define ACCOUNT_CPU_USER_EXIT(ti, ra, rb)
> +#endif /* CONFIG_PPC64 */
>  #define ACCOUNT_STOLEN_TIME
> -#else
> +#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> +#ifdef CONFIG_PPC64
>  #define ACCOUNT_CPU_USER_ENTRY(ra, rb)					\
>  	MFTB(ra);			/* get timebase */		\
>  	ld	rb,PACA_STARTTIME_USER(r13);				\
> @@ -68,7 +74,27 @@
>  #define ACCOUNT_STOLEN_TIME
>  
>  #endif /* CONFIG_PPC_SPLPAR */
> +#else /* CONFIG_PPC64 */
> +#define ACCOUNT_CPU_USER_ENTRY(ti, ra, rb)				\
> +	MFTB(ra);							\
> +	lwz rb, TI_AC_LEAVE(ti);					\
> +	stw ra, TI_AC_STAMP(ti);	/* AC_STAMP = NOW */		\
> +	subf rb, rb, ra;		/* R = NOW - AC_LEAVE */	\
> +	lwz ra, TI_AC_UTIME(ti);					\
> +	add ra, rb, ra;			/* AC_UTIME += R */		\
> +	stw ra, TI_AC_UTIME(ti);					\
>  
> +#define ACCOUNT_CPU_USER_EXIT(ti, ra, rb)				\
> +	MFTB(ra);							\
> +	lwz rb, TI_AC_STAMP(ti);					\
> +	stw ra, TI_AC_LEAVE(ti);					\
> +	subf rb, rb, ra;		/* R = NOW - AC_STAMP */	\
> +	lwz ra, TI_AC_STIME(ti);					\
> +	add ra, rb, ra;			/* AC_STIME += R */		\
> +	stw ra, TI_AC_STIME(ti);					\
> +
> +#endif /* CONFIG_PPC64 */
> +
>  #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  
>  /*
> Index: b/arch/powerpc/platforms/Kconfig.cputype
> ===================================================================
> --- a/arch/powerpc/platforms/Kconfig.cputype	(revision 5607)
> +++ b/arch/powerpc/platforms/Kconfig.cputype	(revision 5611)
> @@ -1,7 +1,6 @@
>  config PPC64
>  	bool "64-bit kernel"
>  	default n
> -	select HAVE_VIRT_CPU_ACCOUNTING
>  	help
>  	  This option selects whether a 32-bit or a 64-bit kernel
>  	  will be built.

^ permalink raw reply

* Re: [PATCH 1/2] powerpc: mm: use macro PGTABLE_EADDR_SIZE instead of digital
From: Aneesh Kumar K.V @ 2014-04-30  5:30 UTC (permalink / raw)
  To: Liu Ping Fan, linuxppc-dev; +Cc: Paul Mackerras
In-Reply-To: <1385000275-5988-1-git-send-email-pingfank@linux.vnet.ibm.com>

Liu Ping Fan <kernelfans@gmail.com> writes:

> In case of extending the eaddr in future, use this macro
> PGTABLE_EADDR_SIZE to ease the maintenance of the code.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> ---
>  arch/powerpc/mm/slb_low.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
> index 17aa6df..e0b3cf4 100644
> --- a/arch/powerpc/mm/slb_low.S
> +++ b/arch/powerpc/mm/slb_low.S
> @@ -35,7 +35,7 @@ _GLOBAL(slb_allocate_realmode)
>  	 * check for bad kernel/user address
>  	 * (ea & ~REGION_MASK) >= PGTABLE_RANGE
>  	 */
> -	rldicr. r9,r3,4,(63 - 46 - 4)
> +	rldicr. r9,r3,4,(63 - PGTABLE_EADDR_SIZE - 4)
>  	bne-	8f
>
>  	srdi	r9,r3,60		/* get region */
> -- 
> 1.8.1.4
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [PATCH RFC v12 0/7] MPC512x DMA slave s/g support, OF DMA lookup
From: Alexander Popov @ 2014-04-30  5:46 UTC (permalink / raw)
  To: Gerhard Sittig, Dan Williams, Vinod Koul, Lars-Peter Clausen,
	Arnd Bergmann, Anatolij Gustschin, Andy Shevchenko,
	Alexander Popov, linuxppc-dev, dmaengine
  Cc: devicetree
In-Reply-To: <1398261209-5578-1-git-send-email-a13xp0p0v88@gmail.com>

Hello

2014-04-23 17:53 GMT+04:00 Alexander Popov <a13xp0p0v88@gmail.com>:
> Changes in v12:
>  A new patch (part 2/7) is added to this series.
>  Part 6/7:
>  - change the description of 'compatible' property according part 2/7;
>  - improve the document according Gerhard's feedback;

Could I have a feedback? Is the binding document fine now?
Thanks!

> Alexander Popov (7):
>   dma: mpc512x: reorder mpc8308 specific instructions
>   dma: mpc512x: separate 'compatible' values for MPC512x and MPC8308
>   dma: mpc512x: add support for peripheral transfers
>   dma: mpc512x: fix freeing resources in mpc_dma_probe() and
>     mpc_dma_remove()
>   dma: of: add common xlate function for matching by channel id
>   dma: mpc512x: add device tree binding document
>   dma: mpc512x: register for device tree channel lookup
>
>  .../devicetree/bindings/dma/mpc512x-dma.txt        |  40 +++
>  arch/powerpc/boot/dts/mpc5121.dtsi                 |   1 +
>  arch/powerpc/boot/dts/mpc8308_p1m.dts              |   2 +-
>  arch/powerpc/boot/dts/mpc8308rdb.dts               |   2 +-
>  drivers/dma/mpc512x_dma.c                          | 346 ++++++++++++++++++---
>  drivers/dma/of-dma.c                               |  35 +++
>  include/linux/of_dma.h                             |   4 +
>  7 files changed, 390 insertions(+), 40 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/dma/mpc512x-dma.txt
>
> --
> 1.8.4.2
>

Best regards,
Alexander

^ permalink raw reply

* [PATCH] powerpc: fix skipping call to early_init_fdt_scan_reserved_mem
From: Rob Herring @ 2014-04-30  6:03 UTC (permalink / raw)
  To: linux-kernel, devicetree
  Cc: Rob Herring, Paul Mackerras, Grant Likely, linuxppc-dev,
	Marek Szyprowski

From: Rob Herring <robh@kernel.org>

The call to early_init_fdt_scan_reserved_mem will be skipped if
reserved-ranges is not found. Move the call earlier so that it is called
unconditionally.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Tested-by: Stephen Chivers <schivers@csc.com>
---
I found this issue in testing of my fdt clean-up series (thanks to 
Stephen). Since the reserved memory support is new, I don't think it is 
critical to fix this for 3.15. I plan to include this with my fdt series 
for 3.16 unless I hear otherwise.

Rob

 arch/powerpc/kernel/prom.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 668aa47..d657549 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -567,6 +567,8 @@ static void __init early_reserve_mem_dt(void)
 	unsigned long i, len, dt_root;
 	const __be32 *prop;
 
+	early_init_fdt_scan_reserved_mem();
+
 	dt_root = of_get_flat_dt_root();
 
 	prop = of_get_flat_dt_prop(dt_root, "reserved-ranges", &len);
@@ -589,8 +591,6 @@ static void __init early_reserve_mem_dt(void)
 			memblock_reserve(base, size);
 		}
 	}
-
-	early_init_fdt_scan_reserved_mem();
 }
 
 static void __init early_reserve_mem(void)
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v2] powerpc: Add cpu family documentation
From: Michael Ellerman @ 2014-04-30  6:45 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Stephen Rothwell
In-Reply-To: <1391553822.6733.189.camel@snotra.buserror.net>

On Tue, 2014-02-04 at 16:43 -0600, Scott Wood wrote:
> On Sat, 2014-02-01 at 15:35 +1100, Michael Ellerman wrote:
> > This patch adds some documentation on the different cpu families
> > supported by arch/powerpc.
> > 
> > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> > ---
> > v2: Reworked formatting to avoid wrapping.
> >     Fixed up Freescale details.
> > 
> > 
> >  Documentation/powerpc/cpu_families.txt | 227 +++++++++++++++++++++++++++++++++
> >  1 file changed, 227 insertions(+)
> >  create mode 100644 Documentation/powerpc/cpu_families.txt
> > 
> > diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
> > new file mode 100644
> > index 0000000..fa4f159
> > --- /dev/null
> > +++ b/Documentation/powerpc/cpu_families.txt
> > @@ -0,0 +1,227 @@
> > +CPU Families
> > +============
> > +
> > +This document tries to summarise some of the different cpu families that exist
> > +and are supported by arch/powerpc.
> > +
> > +
> > +Book3S (aka sPAPR)
> > +------------------
> > +
> > + - Hash MMU
> > + - Mix of 32 & 64 bit
> > +
> > +   +--------------+                              +----------------+
> > +   |  Old POWER   | ---------------------------> | RS64 (threads) |
> > +   +--------------+                              +----------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+                              +----------------+    +-------+
> > +   |     601      | ---------------------------> |      603       | -> |  740  |
> > +   +--------------+                              +----------------+    +-------+
> > +          |                                              |
> > +          |                                              |
> > +          v                                              v
> > +   +--------------+                              +----------------+    +-------+
> > +   |     604      |                              |    750 (G3)    | -> | 750CX |
> > +   +--------------+                              +----------------+    +-------+
> > +          |                                              |                 |
> > +          |                                              |                 |
> > +          v                                              v                 v
> > +   +--------------+                              +----------------+    +-------+
> > +   | 620 (64 bit) |                              |      7400      |    | 750CL |
> > +   +--------------+                              +----------------+    +-------+
> > +          |                                              |                 |
> > +          |                                              |                 |
> > +          v                                              v                 v
> > +   +--------------+                              +----------------+    +-------+
> > +   |  POWER3/630  |                              |      7410      |    | 750FX |
> > +   +--------------+                              +----------------+    +-------+
> > +          |                                              |
> > +          |                                              |
> > +          v                                              v
> > +   +--------------+                              +----------------+
> > +   |   POWER3+    |                              |      7450      |
> > +   +--------------+                              +----------------+
> > +          |                                              |
> > +          |                                              |
> > +          v                                              v
> > +   +--------------+                              +----------------+
> > +   |    POWER4    |                              |      7455      |
> > +   +--------------+                              +----------------+
> > +          |                                              |
> > +          |                                              |
> > +          v                                              v
> > +   +--------------+                  +-------+   +----------------+
> > +   |   POWER4+    | ---------------> |  970  |   |      7447      |
> > +   +--------------+                  +-------+   +----------------+
> > +          |                              |               |
> > +          |                              |               |
> > +          v                              v               v
> > +   +--------------+     +-------+    +-------+   +----------------+
> > +   |    POWER5    | --> | Cell  |    | 970FX |   |      7448      |
> > +   +--------------+     +-------+    +-------+   +----------------+
> > +          |                              |
> > +          |                              |
> > +          v                              v
> > +   +--------------+                  +-------+
> > +   |   POWER5+    |                  | 970MP |
> > +   +--------------+                  +-------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |   POWER5++   |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |    POWER6    |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |    POWER7    |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |   POWER7+    |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |    POWER8    |
> > +   +--------------+
> > +
> > +
> > +   +---------------+
> > +   | PA6T (64 bit) |
> > +   +---------------+
> 
> Missing e300 (603 derivative) and e600 (7448 derivative).

Happy to add them, where do they hang off?

> > +IBM BookE
> > +---------
> > +
> > + - Software loaded TLB.
> > + - All 32 bit
> > +
> > +   +--------------+
> > +   |     401      |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |     403      |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |     405      |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> 
> Are 40x considered booke?

You tell me.
 
> > +Motorola/Freescale 8xx
> > +----------------------
> > +
> > + - Software loaded with hardware assist.
> > + - All 32 bit
> > +
> > +   +--------------+
> > +   |     8xx      |
> > +   +--------------+
> > +          |
> > +          |
> > +          v
> > +   +--------------+
> > +   |     850      |
> > +   +--------------+
> 
> Is the core of MPC850 different from other MPC8xx?

Dunno, maybe someone who works at Freescale knows ;)

> > +Freescale BookE
> > +---------------
> > +
> > + - Software loaded TLB.
> > + - e6500 adds HW loaded indirect TLB entries.
> > + - Mix of 32 & 64 bit
> > +
> > +   +--------------+
> > +   |     e200     |
> > +   +--------------+
> > +
> > +
> > +   +--------------------------------+
> > +   |              e500              |
> > +   +--------------------------------+
> > +                   |
> > +                   |
> > +                   v
> > +   +--------------------------------+
> > +   |             e500v2             |
> > +   +--------------------------------+
> > +                   |
> > +                   |
> > +                   v
> > +   +--------------------------------+
> > +   |             e500mc             |
> > +   +--------------------------------+
> > +                   |
> > +                   |
> > +                   v
> > +   +--------------------------------+
> > +   |    e5500 (Book3e) (64 bit)     |
> > +   +--------------------------------+
> > +                   |
> > +                   |
> > +                   v
> > +   +--------------------------------+
> > +   | e6500 (HW TLB) (Multithreaded) |
> > +   +--------------------------------+
> 
> Why (Book3e) on e5500?  e500mc is also an ISA 2.06 book3e core.

OK. I'll move Book3e to e500mc. It's implied that it continues to apply to the
derivatives.

cheers

^ permalink raw reply

* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
From: Rusty Russell @ 2014-04-30  7:04 UTC (permalink / raw)
  To: Ingo Molnar, Madhavan Srinivasan
  Cc: linux-arch, riel, ak, dave.hansen, peterz, x86, linux-kernel,
	linux-mm, paulus, mgorman, Linus Torvalds, akpm, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140429070632.GB27951@gmail.com>

Ingo Molnar <mingo@kernel.org> writes:
> * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
>
>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>> v3.15-rc1 for different fault around order values.
>> 
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>> 
>> Linux build (make -j64)
>> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
>> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
>
> Probably too noisy.

A little, but 3 still looks like the winner.

>> Linux rebuild (make -j64)
>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
>
> Here it looks like a speedup. Optimal value: 5+.

No, lower time is better.  Baseline (no faultaround) wins.


etc.

It's not a huge surprise that a 64k page arch wants a smaller value than
a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
see downside.

Most extreme results:
Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.

> I'm starting to suspect that maybe workloads ought to be given a 
> choice in this matter, via madvise() or such.

I really don't think they'll be able to use it; it'll change far too
much with machine and kernel updates.  I think we should apply patch #1
(with fixes) to make it a variable, then set it to 0 for PPC.

Cheers,
Rusty.

^ permalink raw reply

* [PATCH] PPC: BOOK3S: Disable/Enable TM looking at the ibm, pa-features device tree entry
From: Aneesh Kumar K.V @ 2014-04-30  8:11 UTC (permalink / raw)
  To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V

Runtime disable transactional memory feature looking at pa-features
device tree entry. This provides a mechanism to disable TM on P8
systems.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/prom.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 668aa4791fd7..537bd7e7db0b 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -161,6 +161,11 @@ static struct ibm_pa_feature {
 	{CPU_FTR_NODSISRALIGN, 0, 0,	1, 1, 1},
 	{0, MMU_FTR_CI_LARGE_PAGE, 0,	1, 2, 0},
 	{CPU_FTR_REAL_LE, PPC_FEATURE_TRUE_LE, 5, 0, 0},
+	/*
+	 * We should use CPU_FTR_TM_COMP so that if we disable TM, it won't get
+	 * enabled via device tree
+	 */
+	{CPU_FTR_TM_COMP, 0, 0,		22, 0, 0},
 };
 
 static void __init scan_features(unsigned long node, unsigned char *ftrs,
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
From: Madhavan Srinivasan @ 2014-04-30  8:15 UTC (permalink / raw)
  To: Rusty Russell, Ingo Molnar
  Cc: linux-arch, riel, ak, dave.hansen, peterz, x86, linux-kernel,
	linux-mm, paulus, mgorman, Linus Torvalds, akpm, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <87d2fz47tg.fsf@rustcorp.com.au>

On Wednesday 30 April 2014 12:34 PM, Rusty Russell wrote:
> Ingo Molnar <mingo@kernel.org> writes:
>> * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
>>
>>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>>> v3.15-rc1 for different fault around order values.
>>>
>>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>>>
>>> Linux build (make -j64)
>>> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
>>> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>>>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>>>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
>>
>> Probably too noisy.
> 
> A little, but 3 still looks like the winner.
> 
>>> Linux rebuild (make -j64)
>>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
>>
>> Here it looks like a speedup. Optimal value: 5+.
> 
> No, lower time is better.  Baseline (no faultaround) wins.
> 
> 
> etc.
> 
> It's not a huge surprise that a 64k page arch wants a smaller value than
> a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
> see downside.
> 
> Most extreme results:
> Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
> Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
> Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.
> 
>> I'm starting to suspect that maybe workloads ought to be given a 
>> choice in this matter, via madvise() or such.
> 
> I really don't think they'll be able to use it; it'll change far too
> much with machine and kernel updates.  I think we should apply patch #1
> (with fixes) to make it a variable, then set it to 0 for PPC.
> 

Ok. Will do.

Thanks for review
With regards
Maddy


> Cheers,
> Rusty.
> 

^ permalink raw reply

* Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
From: Anshuman Khandual @ 2014-04-30  8:16 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Linux PPC dev, oleg, linux-kernel, roland, avagin
In-Reply-To: <20730.1398817745@ale.ozlabs.ibm.com>

On 04/30/2014 05:59 AM, Michael Neuling wrote:
> Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:
> 
>> On 04/29/2014 01:52 PM, Michael Neuling wrote:
>>> That's not what that patch does. It shouldn't make any user visible changes
>>> to DSCR or PPR.
>>
>> It may not when it runs uninterrupted but after the tracee process has
>> stopped, thread.dscr reflects the default DSCR value as mentioned
>> before. This can be proved by changing the "dscr_default" value in
>> arch/powerpc/sysfs.c file.
> 
> The intention with DSCR is that if the user changes the DSCR, the kernel
> should always save/restore it.  If you are seeing something else, then
> that is a bug.  Anton has a test case for this here:
> 
>   http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
> 
> If that is failing, then there is a bug that we need to fix.
> 

Anton's above DSCR test passed.

> The PPR is the same, except that the kernel can change it over a
> syscall.
> 
>>> Over syscall PPR and DSCR may change.
> 
> Sorry, this should be only PPR.  DSCR shouldn't change over a syscall,
> at least that's the intention.
> 
>>> Depending on your test case, that may
>>> be your problem.
>>
>> I would guess when the tracee process stops for ptrace analysis, tm_reclaim or
>> tm_recheckpoint path might be crossed which is causing this dscr_default value
>> to go into thread_struct.
> 
> That shouldn't happen.  If that's happening, it's a bug.

I would believe this is happening. Also after reverting the commit
e9bdc3d6143d1c4b8d8ce5231, thread.dscr reflects the same value as that
of thread.tm_dscr which is the check pointed DSCR register value just
before the transaction started. So even the NIP has moved passed the point
where the user changes DSCR inside the transaction, thread.dscr is unable
to capture that latest value. But thread.dscr must contain the latest user
changed value of DSCR which is definitely not happening here. So there is
a problem we need to fix. 

^ permalink raw reply

* Re: [PATCH 0/6] Implement split core for POWER8
From: Alexander Graf @ 2014-04-30  9:52 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras, kvm
In-Reply-To: <9524.1398834577@ale.ozlabs.ibm.com>


On 30.04.14 07:09, Michael Neuling wrote:
>> This patch series implements split core mode on POWER8.  This enables up to 4
>> subcores per core which can each independently run guests (per guest SPRs like
>> SDR1, LPIDR etc are replicated per subcore).  Lots more documentation on this
>> feature in the code and commit messages.
>>
>> Most of this code is in the powernv platform but there's a couple of KVM
>> specific patches too.
>>
>> Alex: If you're happy with the KVM patches, please ACK them and benh can hold
>> this series.
> Alex,
>
> Any chance we can get an ACK on these two KVM patches so benh can put
> this series in his next branch?

For patches 1/6 and 5/6:

   Acked-by: Alexander Graf <agraf@suse.de>

However, please make sure to get Paul's ack as well.


Alex

^ permalink raw reply

* GNU binutils & PowerPC VLE
From: Martin Hinner @ 2014-04-30  9:59 UTC (permalink / raw)
  To: linuxppc-dev

Hello,

  I know this topic does not exactly fit this mailing list, but I
think it's better to ask here than on binutils.

  I need to assemble and disassemble some PowerPC VLE (variable length
encoding)  e200 code. Binutils as of 2.23.2 support VLE. But when I
try to use objdump -D with  -m powerpc:vle or -M vle I get plain
32-bit powerpc ISA disassembly (instead of 16bit instructions). Tried
to change input file format, endianity, etc. What am I doing wrong?

Thank you,

Martin

^ permalink raw reply

* Re: [PATCH 00/13] Refactor pci_is_brdige() to simplify code
From: Bjorn Helgaas @ 2014-04-30 16:29 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Tony Luck, linux-ia64@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, sparclinux, Thomas Gleixner,
	linuxppc-dev, David S. Miller
In-Reply-To: <1398417515-8740-1-git-send-email-wangyijing@huawei.com>

On Fri, Apr 25, 2014 at 3:18 AM, Yijing Wang <wangyijing@huawei.com> wrote:
> This patchset rename the current pci_is_bridge() to pci_has_subordinate(),
> and introduce a new pci_is_bridge() which determine pci bridge by check
> dev->hdr_type. The new one is more accurate. PCIe Spec define the pci
> device is a bridge by the dev->hdr_type = 0x01 || 0x02.

This needs to be posted to the linux-pci list.  The fact that it
wasn't means it's not in patchwork, so it's not on my to-do list.

Currently we have one interface: pci_is_bridge().

After your series, we would have two interfaces: pci_is_bridge() and
pci_has_subordinate().  Presumably, both are used, and you should
explain how you decided which to use at each place.

I assume the difference is that the old pci_is_bridge() is true for a
bridge that has a subordinate bus.  The new pci_is_bridge() is true
for any bridge, even if there is no subordinate bus.  When do we even
have a bridge with no subordinate bus?  This is the sort of stuff you
need to explain so we know why we should apply these patches.

Bjorn

> Yijing Wang (13):
>   PCI: rename pci_is_bridge() to pci_has_subordinate()
>   PCI: Introduce new pci_is_bridge() helper function
>   PCI: Use new pci_is_bridge() to simplify code
>   x86/PCI: Use new pci_is_bridge() to simplify code
>   IA64/PCI: Use new pci_is_bridge() to simplify code
>   powerpc/PCI: Use new pci_is_bridge() to simplify code
>   sparc/PCI: Use new pci_is_bridge() to simplify code
>   PCI, rpaphp: Use new pci_is_bridge() to simplify code
>   PCI, shpchp: Use new pci_is_bridge() to simplify code
>   PCI, cpcihp: Use new pci_is_bridge() to simplify code
>   PCI, acpiphp: Use new pci_is_bridge() to simplify code
>   PCI, pcmcia: Use new pci_is_bridge() to simplify code
>   PCI, pciehp: Use new pci_is_bridge() to simplify code
>
>  arch/ia64/pci/fixup.c                  |    4 +---
>  arch/powerpc/kernel/pci-hotplug.c      |    3 +--
>  arch/powerpc/kernel/pci_of_scan.c      |    3 +--
>  arch/sparc/kernel/pci.c                |    3 +--
>  arch/x86/pci/fixup.c                   |    4 +---
>  drivers/pci/hotplug/acpiphp_glue.c     |    3 +--
>  drivers/pci/hotplug/cpci_hotplug_pci.c |    3 +--
>  drivers/pci/hotplug/pciehp_pci.c       |    3 +--
>  drivers/pci/hotplug/rpadlpar_core.c    |    3 +--
>  drivers/pci/hotplug/shpchp_pci.c       |    3 +--
>  drivers/pci/pci-acpi.c                 |    8 +-------
>  drivers/pci/pci-driver.c               |    8 ++++----
>  drivers/pci/pci.h                      |    2 +-
>  drivers/pci/probe.c                    |    3 +--
>  drivers/pci/setup-bus.c                |    4 +---
>  drivers/pcmcia/cardbus.c               |    3 +--
>  include/linux/pci.h                    |    6 ++++++
>  17 files changed, 25 insertions(+), 41 deletions(-)
>
>

^ permalink raw reply

* Re: [PATCH v2] powerpc: Add cpu family documentation
From: Tom Musta @ 2014-04-30 18:14 UTC (permalink / raw)
  To: Michael Ellerman, Scott Wood; +Cc: linuxppc-dev, Stephen Rothwell
In-Reply-To: <1398840308.5722.5.camel@concordia>

On 4/30/2014 1:45 AM, Michael Ellerman wrote:
>> > Are 40x considered booke?
> You tell me.
>  

The original 401, 403 and 405 cores predate the actual existence of what we now call Book E.
But they most certainly contained features that would eventually become Book E (different timers,
software managed TLB, etc.)  For the sake of this diagram, I would say "yes".

^ permalink raw reply

* Re: [PATCH] powerpc 32: Provides VIRT_CPU_ACCOUNTING
From: Scott Wood @ 2014-04-30 18:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <1398833787.31586.8.camel@pasglop>

On Wed, 2014-04-30 at 14:56 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-03-19 at 17:05 -0500, Scott Wood wrote:
> > On Wed, 2014-03-19 at 22:52 +0100, Christophe Leroy wrote:
> > > This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
> > > Unlike PPC64, PPC32 doesn't provide the PACA register. Therefore the
> > > implementation is similar to the one done in the IA64 architecture.
> > > It is based on additional information added to the Task Info structure.
> > 
> > PACA isn't a register -- just a convention for how Linux uses a GPR.
> > Maybe it's time to use it on PPC32 as well?
> 
> PACA is actually a data structure and you really really don't want it
> on ppc32 :-) Having a register point to current works, having a register
> point to per-cpu data instead works too (ie, change what we do today),
> but don't introduce a PACA *please* :-)

What is special about 64-bit that warrants doing things differently from
32-bit?

What is the difference between PACA and "per-cpu data", other than the
obscure name?
 
-Scott

^ permalink raw reply

* Re: [PATCH v2] powerpc: Add cpu family documentation
From: Scott Wood @ 2014-04-30 18:26 UTC (permalink / raw)
  To: Tom Musta; +Cc: Stephen Rothwell, linuxppc-dev
In-Reply-To: <53613D7B.8090001@gmail.com>

On Wed, 2014-04-30 at 13:14 -0500, Tom Musta wrote:
> On 4/30/2014 1:45 AM, Michael Ellerman wrote:
> >> > Are 40x considered booke?
> > You tell me.
> >  
> 
> The original 401, 403 and 405 cores predate the actual existence of what we now call Book E.
> But they most certainly contained features that would eventually become Book E (different timers,
> software managed TLB, etc.)  For the sake of this diagram, I would say "yes".

CONFIG_BOOKE doesn't get set on 40x builds...

-Scott

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox