LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Mike Qiu @ 2013-05-22  5:57 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20130521144548.GB21632@dhcp-26-207.brq.redhat.com>

On 2013/5/21 22:45, Alexander Gordeev wrote:
> On Tue, Jan 15, 2013 at 03:38:53PM +0800, Mike Qiu wrote:
>> The test results are shown by 'cat /proc/interrupts':
>>            CPU0       CPU1       CPU2       CPU3
>> 16:     240458     261601     226310     200425      XICS Level     IPI
>> 17:          0          0          0          0      XICS Level     RAS_EPOW
>> 18:         10          0          3          2      XICS Level     hvc_console
>> 19:     122182      28481      28527      28864      XICS Level     ibmvscsi
>> 20:        506    7388226        108        118      XICS Level     eth0
>> 21:          6          5          5          5      XICS Level     host1-0
>> 22:        817        814        816        813      XICS Level     host1-1
> Hi Mike,
>
> I am curious if pSeries firmware allows changing affinity masks independently
> for multiple MSIs? I.e. in your example, would it be possible to assign IRQ21
> and IRQ22 to different CPUs?
Yes, as Ben says, this is very different from other firmware :)

Thanks
Mike
>
> Thanks!
>
>> LOC:     398077     316725     231882     203049   Local timer interrupts
>> SPU:       1659        919        961        903   Spurious interrupts
>> CNT:          0          0          0          0   Performance monitoring interrupts
>> MCE:          0          0          0          0   Machine check exceptions


* RE: SATA hang on 8315E triggered by heavy flash write?
From: Xie Shaohui-B21989 @ 2013-05-22  6:15 UTC (permalink / raw)
  To: Anthony Foiani; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <gppwj3ayu.fsf@dworkin.scrye.com>

Hi, Anthony Foiani,

Please confirm which operation is key to reproducing the error:
1. Only update NOR for a long enough time (for example, tens of seconds) and see if the error happens.
2. Only read/write the SSD, without any NOR operation, and see if the error happens.
3. Start reading/writing the SSD and keep it running, then start reading NOR. If no error occurs for a long time, start writing NOR and see how long it takes for the error to happen.

Best Regards,
Shaohui Xie


> -----Original Message-----
> From: Anthony Foiani [mailto:tkil@scrye.com]
> Sent: Wednesday, May 22, 2013 12:17 PM
> To: Wood Scott-B07421
> Cc: linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989
> Subject: Re: SATA hang on 8315E triggered by heavy flash write?
>
>
> Scott --
>
> Scott Wood <scottwood@freescale.com> writes:
>
> > On 05/15/2013 03:12:21 AM, Anthony Foiani wrote:
> >> At this point, /dev/sda is pretty much unusable, and I have to do at
> >> least a reboot to recover.  (I don't recall if I had to do a power
> >> cycle at this point, though.)
>
> For whatever it's worth, a hard boot (full power cycle) is indeed
> necessary at this point.
>
> >> I suspect that it is related to errata eLBC-A001 (from MPC8315E Chip
> >> Errata, Rev. 3, 09/2011):
> >> ...
> >> But it seems that erratum is already fixed:
> >>
> >>   http://patchwork.ozlabs.org/patch/96339/
> >>   (git patch d08e44570e)
> >>
> >> Am I reading that correctly?
> >
> > Yes, that erratum has been worked around.
>
> Ok, thanks for the confirmation.
>
> >> (I'm already writing only one flash sector at a time, but it might be
> >> that even a single 0x10000-byte sector takes long enough to trigger
> >> the issue.)
> >
> > I don't think this erratum is relevant.  Unlike NAND, NOR flash does
> > not involve holding the localbus for extended periods of time.
>
> I wasn't sure about the mechanism of the erratum, and it seemed awfully
> close, so I thought I'd go fishing.  Guess I missed.  :(
>
> It is NOR writes, btw; I do both in my application, but the initial error
> always seems to occur during a NOR write.  (In this device, kernel +
> devtree go into NOR flash, ramdisk goes into NAND flash, and data goes to
> SSD... stop laughing.)
>
> Here's the most recent hang.  First, to compare the application log
> timestamps with the kernel log timestamps:
>
>   # mix of kernel and application log, note that kernel is about +12s.
>   +0.537506 main.0 [0]: rc: fork took 9.376ms
>   [   12.892323] PHY: mdio@e0024520:01 - Link is Up - 100/Full
>   +1.603034 main.0 [0]: schs: ctor: done
>
> The console output is:
>
>   # console log
>   [318334.294126] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
>   [318334.301515] ata2.00: PHY RDY changed
>   [318334.305301] ata2.00: failed command: WRITE DMA
>   [318334.309991] ata2.00: cmd ca/00:08:b0:00:18/00:00:00:00:00/e1 tag 0 dma 4096 out
>   [318334.310015]          res 50/00:00:08:61:25/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)
>   [318334.325689] ata2.00: status: { DRDY }
>   [318334.329717] ata2: hard resetting link
>   [318334.836038] ata2: Hardreset failed, not off-lined 0
>   [318334.848407] ata2: setting speed (in hard reset)
>   [318344.456050] ata2: No Signature Update
>   [318344.631916] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>   [318344.638354] ata2.00: link online but device misclassified
>   [318349.643897] ata2.00: qc timeout (cmd 0xec)
>   [318349.648268] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>   [318349.654562] ata2.00: revalidation failed (errno=-5)
>   [318349.659667] ata2: hard resetting link
>   [318350.163864] ata2: Hardreset failed, not off-lined 0
>   [318350.175869] ata2: setting speed (in hard reset)
>   [318359.771956] ata2: No Signature Update
>   [318359.947901] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>   [318359.954342] ata2.00: link online but device misclassified
>   [318369.959921] ata2.00: qc timeout (cmd 0xec)
>   [318369.964279] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>   [318369.970567] ata2.00: revalidation failed (errno=-5)
>   [318369.975658] ata2: hard resetting link
>   [318370.479933] ata2: Hardreset failed, not off-lined 0
>   [318370.491880] ata2: setting speed (in hard reset)
>   [318380.083892] ata2: No Signature Update
>
> And my application log:
>
>   # application log
>   +318320.957019 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x180000 from buf[0x80000]; attempt 1/3
>   +318322.498346 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x190000 from buf[0x90000]; attempt 1/3
>   +318323.849995 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1a0000 from buf[0xa0000]; attempt 1/3
>   +318325.262559 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1b0000 from buf[0xb0000]; attempt 1/3
>   +318326.703213 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1c0000 from buf[0xc0000]; attempt 1/3
>
> > I also don't see how it would interact with SATA, which is separate
> > from the localbus.
>
> No idea.  Is there some other shared resource that might be taxed by this
> type of load?
>
> I do get a few other errors, usually just once or twice per boot:
>
>   [ 4231.619368] NOHZ: local_softirq_pending 100
>   [ 4232.249935] NOHZ: local_softirq_pending 100
>   [ 4232.312241] NOHZ: local_softirq_pending 100
>   [ 4232.424523] NOHZ: local_softirq_pending 100
>   [ 4233.139146] NOHZ: local_softirq_pending 100
>   [ 4233.328540] NOHZ: local_softirq_pending 100
>   [ 4233.655909] NOHZ: local_softirq_pending 100
>   [ 4234.106578] NOHZ: local_softirq_pending 100
>   [ 4234.853966] NOHZ: local_softirq_pending 100
>   [ 4235.375208] NOHZ: local_softirq_pending 100
>   [11072.027818] hrtimer: interrupt took 126210 ns
>
> They seem harmless, though, and (as the timestamps indicate) the machine
> happily ran for 3-4 days after those issues.
>
> > Are you seeing any errors on the localbus, or just on SATA?
>
> I'm not seeing any errors in the console log -- but I'm not using the LBC
> for anything other than flash writes, SFAIK.  (Unless I2C is handled
> through the LBC, in which case, I have frequent (~50-100/s) small
> transactions all the time -- but the hangs always coincide with flash
> writes, and not with the I2C traffic that is going on all the
> time...)
>
> > Hopefully Shaohui (our SATA person) can answer these.  If you don't
> > get an answer, go ahead and open an official support request.
>
> I have a (lousy) workaround in hand: don't touch the disk during flash
> updates.  (The flash writes are software updates, which will hopefully be
> fairly rare once I'm done developing this thing.  Until then, though, I'm
> updating it multiple times a day, and have hit this quite a few times by
> now.)
>
> So there's no great hurry.  If Shaohui can find something in the next
> week or so, that'd be fantastic; otherwise, I'll open a request.
>
> Thanks again!
>
> Best regards,
> Anthony Foiani
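
The timestamp comparison in the quoted message can be worked through in a few lines of C. This is only a sketch: it assumes the roughly 12 s offset between the kernel and application clocks (read off the three sample lines in the quoted message) stays constant over the whole uptime, and both helper names are hypothetical, not part of any tool mentioned in the thread.

```c
#include <stddef.h>

/* Map a kernel log timestamp onto the application clock, given an
 * offset estimated from a pair of nearby log lines (an assumption). */
double sketch_kernel_to_app(double kernel_ts, double offset)
{
	return kernel_ts - offset;
}

/* Index of the last write whose start time is <= t, or -1 if none.
 * 'starts' must be sorted in ascending order. */
int sketch_write_in_progress(const double *starts, size_t n, double t)
{
	int idx = -1;

	for (size_t i = 0; i < n && starts[i] <= t; i++)
		idx = (int)i;
	return idx;
}
```

With the five write start times from the application log and the exception at kernel time 318334.294126, an offset of 12.0 places the exception at about 318322.29 on the application clock, after the 0x180000 write was logged and before the 0x190000 write. The offset is only known to within about a second, so the neighbouring write is just as plausible; either way the exception falls inside the run of NOR writes.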


* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Mike Qiu @ 2013-05-22  6:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: tglx, Alexander Gordeev, linuxppc-dev, linux-kernel
In-Reply-To: <1369181713.6387.79.camel@pasglop>

On 2013/5/22 8:15, Benjamin Herrenschmidt wrote:
> On Tue, 2013-05-21 at 16:45 +0200, Alexander Gordeev wrote:
>> On Tue, Jan 15, 2013 at 03:38:53PM +0800, Mike Qiu wrote:
>>> The test results are shown by 'cat /proc/interrupts':
>>>            CPU0       CPU1       CPU2       CPU3
>>> 16:     240458     261601     226310     200425      XICS Level     IPI
>>> 17:          0          0          0          0      XICS Level     RAS_EPOW
>>> 18:         10          0          3          2      XICS Level     hvc_console
>>> 19:     122182      28481      28527      28864      XICS Level     ibmvscsi
>>> 20:        506    7388226        108        118      XICS Level     eth0
>>> 21:          6          5          5          5      XICS Level     host1-0
>>> 22:        817        814        816        813      XICS Level     host1-1
>> Hi Mike,
>>
>> I am curious if pSeries firmware allows changing affinity masks independently
>> for multiple MSIs? I.e. in your example, would it be possible to assign IRQ21
>> and IRQ22 to different CPUs?
> Yes. Each interrupt has its own affinity, whether it's an MSI or not,
> the affinity is not driven by the address.
>
> Cheers,
> Ben.
Hi Ben,

Can this patch be accepted? If so, I will send out the 3.9 version.

Michael Ellerman said he wants to see performance data, but that
depends on the driver.

It works much like MSI, and the driver can use more than one MSI.
In other words, the driver has more interrupt resources available,
but whether the driver makes full use of them is outside this
patch's control.

I tested this patch with the ipr driver, to which others have added
multiple-MSI support, and it works.

Thanks
Mike
>> Thanks!
>>
>>> LOC:     398077     316725     231882     203049   Local timer interrupts
>>> SPU:       1659        919        961        903   Spurious interrupts
>>> CNT:          0          0          0          0   Performance monitoring interrupts
>>> MCE:          0          0          0          0   Machine check exceptions
>
>
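
Since each interrupt has its own affinity, IRQ21 and IRQ22 from the example could each be given a different single-CPU mask. The sketch below only formats such a mask as the hex string that the generic Linux `/proc/irq/<n>/smp_affinity` file accepts; `sketch_affinity_mask()` is a hypothetical helper, it is limited to CPUs 0 to 31 for simplicity, and nothing here is specific to pSeries or to the patch under discussion.

```c
#include <stdio.h>

/* Format the hex bitmask that selects exactly one CPU.
 * Returns the number of characters written, or -1 if the CPU number
 * is outside the 0..31 range this sketch supports. */
int sketch_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
	if (cpu >= 32)
		return -1;
	return snprintf(buf, len, "%x", 1u << cpu);
}
```

For instance, CPU 0 gives "1" and CPU 1 gives "2", so writing those two strings to the affinity files of two different interrupts would steer them to different CPUs.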


* [PATCH 4/5] x86, perf: Add conditional branch filtering support
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo
In-Reply-To: <1369203761-12649-1-git-send-email-khandual@linux.vnet.ibm.com>

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

This patch adds conditional branch filtering support,
enabling it for PERF_SAMPLE_BRANCH_COND in perf branch
stack sampling framework by utilizing an available
software filter X86_BR_JCC.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d978353..a0d6387 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -337,6 +337,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 
 	if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
 		mask |= X86_BR_IND_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -626,6 +630,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -637,6 +642,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]	= LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7
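
The selector tables in this patch are indexed by the filter bit value itself, so an entry such as `[PERF_SAMPLE_BRANCH_COND]` lands at index 128 of a sparse array. A minimal sketch of that lookup pattern follows. The `SKETCH_LBR_*` values are placeholders rather than the real LBR_SELECT encodings, and `sketch_hw_filter()` is a hypothetical stand-in for whatever code walks the table in the driver.

```c
#include <stdint.h>

/* Filter bits, copied from the perf_event.h hunk in patch 1/5. */
#define SKETCH_BRANCH_IND_CALL (1U << 6)
#define SKETCH_BRANCH_COND     (1U << 7)
#define SKETCH_BRANCH_MAX      (1U << 8)

/* Placeholder selector bits, NOT the real hardware encodings. */
#define SKETCH_LBR_JCC      0x01
#define SKETCH_LBR_IND_CALL 0x02

/* Sparse table indexed by the filter bit value, as in the hunk. */
static const int sketch_sel_map[SKETCH_BRANCH_MAX] = {
	[SKETCH_BRANCH_IND_CALL] = SKETCH_LBR_IND_CALL,
	[SKETCH_BRANCH_COND]     = SKETCH_LBR_JCC,
};

/* OR together the selector for every filter bit set in br_type. */
int sketch_hw_filter(uint64_t br_type)
{
	int sel = 0;

	for (uint64_t bit = 1; bit < SKETCH_BRANCH_MAX; bit <<= 1)
		if (br_type & bit)
			sel |= sketch_sel_map[bit];
	return sel;
}
```

Filter bits with no table entry contribute zero, which is why adding one designated initializer per table is enough to enable the new filter in this scheme.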


* [PATCH 2/5] powerpc, perf: Enable conditional branch filter for POWER8
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo
In-Reply-To: <1369203761-12649-1-git-send-email-khandual@linux.vnet.ibm.com>

Enables conditional branch filter support for POWER8
utilizing MMCRA register based filter and also invalidates
a BHRB branch filter combination involving conditional
branches.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 8ed323d..e60b38f 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -548,11 +548,21 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
 		return -1;
 
+	/* Invalid branch filter combination - HW does not support */
+	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
+			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
+		return -1;
+
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
 		return pmu_bhrb_filter;
 	}
 
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
+		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
+		return pmu_bhrb_filter;
+	}
+
 	/* Every thing else is unsupported */
 	return -1;
 }
-- 
1.7.11.7
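
The decision order in the hunk above can be sketched as a standalone function. This covers only the checks visible in the hunk, the `SKETCH_IFM*` constants are placeholders rather than the real MMCRA encodings, and `sketch_bhrb_filter_map()` is a hypothetical name. Note that the original returns -1 from a function declared `u64`, so "unsupported" is the all-ones value.

```c
#include <stdint.h>

/* Filter bits, copied from the perf_event.h hunk in patch 1/5. */
#define SKETCH_BRANCH_ANY_CALL (1U << 4)
#define SKETCH_BRANCH_IND_CALL (1U << 6)
#define SKETCH_BRANCH_COND     (1U << 7)

/* Placeholder register encodings, NOT the real MMCRA values. */
#define SKETCH_IFM1 0x1ULL
#define SKETCH_IFM3 0x3ULL

/* -1 converted to an unsigned 64-bit value, as in the original. */
#define SKETCH_UNSUPPORTED ((uint64_t)-1)

uint64_t sketch_bhrb_filter_map(uint64_t type)
{
	if (type & SKETCH_BRANCH_IND_CALL)
		return SKETCH_UNSUPPORTED;

	/* Calls and conditional branches cannot be requested together. */
	if ((type & SKETCH_BRANCH_ANY_CALL) && (type & SKETCH_BRANCH_COND))
		return SKETCH_UNSUPPORTED;

	if (type & SKETCH_BRANCH_ANY_CALL)
		return SKETCH_IFM1;

	if (type & SKETCH_BRANCH_COND)
		return SKETCH_IFM3;

	return SKETCH_UNSUPPORTED;
}
```

Putting the combination check before the two single-filter checks matters: without it, a request for both filters would silently be treated as a call-only request.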


* [PATCH 5/5] perf, documentation: Description for conditional branch filter
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo
In-Reply-To: <1369203761-12649-1-git-send-email-khandual@linux.vnet.ibm.com>

Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d4da111..8b5e1ed 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -169,12 +169,13 @@ following filters are defined:
         - any_call: any function call or system call
         - any_ret: any function return or system call return
         - ind_call: any indirect branch
+        - cond: conditional branches
         - u:  only when the branch target is at the user level
         - k: only when the branch target is in the kernel
         - hv: only when the target is at the hypervisor level
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7


* [PATCH 3/5] perf, tool: Conditional branch filter 'cond' added to perf record
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo
In-Reply-To: <1369203761-12649-1-git-send-email-khandual@linux.vnet.ibm.com>

Adding perf record support for new branch stack filter criteria
PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cdf58ec..833743a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -676,6 +676,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
 	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
 	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
 
-- 
1.7.11.7
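
The table extended by this patch maps a filter name given on the command line to a filter bit. A minimal sketch of that kind of name lookup is shown below, using the bit values from patch 1/5. The struct, the table and `sketch_lookup_branch_mode()` are hypothetical stand-ins, and the exact, case-sensitive `strcmp()` match is an assumption of the sketch rather than a statement about how perf compares names.

```c
#include <stddef.h>
#include <string.h>

/* Filter bits, copied from the perf_event.h hunk in patch 1/5. */
#define SKETCH_BRANCH_IND_CALL (1U << 6)
#define SKETCH_BRANCH_COND     (1U << 7)

struct sketch_branch_mode {
	const char *name;
	unsigned int mode;
};

static const struct sketch_branch_mode sketch_modes[] = {
	{ "ind_call", SKETCH_BRANCH_IND_CALL },
	{ "cond",     SKETCH_BRANCH_COND },
	{ NULL, 0 }, /* terminator, playing the role of BRANCH_END */
};

/* Return the bit for a filter name, or 0 if the name is unknown. */
unsigned int sketch_lookup_branch_mode(const char *name)
{
	const struct sketch_branch_mode *m;

	for (m = sketch_modes; m->name; m++)
		if (strcmp(m->name, name) == 0)
			return m->mode;
	return 0;
}
```

Because the lookup is by whole name, the spelling in the table is user-visible interface: "cond" is accepted, while a variant such as "cnd" is not.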


* [PATCH 1/5] perf: New conditional branch filter criteria in branch stack sampling
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo
In-Reply-To: <1369203761-12649-1-git-send-email-khandual@linux.vnet.ibm.com>

POWER8 PMU based BHRB supports filtering for conditional branches.
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
will extend the existing perf ABI. Other architectures can provide
this functionality with either HW filtering support (if present) or
with SW filtering of instructions.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fb104e5..cb0de86 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -157,8 +157,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ANY_CALL	= 1U << 4, /* any call branch */
 	PERF_SAMPLE_BRANCH_ANY_RETURN	= 1U << 5, /* any return branch */
 	PERF_SAMPLE_BRANCH_IND_CALL	= 1U << 6, /* indirect calls */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 7, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 7, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 8, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
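
Moving `PERF_SAMPLE_BRANCH_MAX` up by one bit is what makes the new flag acceptable to any range check built on it. A minimal sketch of such a check follows. The enum mirrors the values in the hunk above under a `SKETCH_` prefix, and `sketch_branch_type_valid()` is a hypothetical helper for illustration, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the hunk above; this is not the kernel header. */
enum sketch_branch_sample_type {
	SKETCH_BRANCH_ANY_CALL   = 1U << 4,
	SKETCH_BRANCH_ANY_RETURN = 1U << 5,
	SKETCH_BRANCH_IND_CALL   = 1U << 6,
	SKETCH_BRANCH_COND       = 1U << 7, /* new bit */
	SKETCH_BRANCH_MAX        = 1U << 8, /* first bit past the ABI */
};

/* A mask is acceptable only if it sets no bit at or above MAX. */
bool sketch_branch_type_valid(uint64_t mask)
{
	return (mask & ~((uint64_t)SKETCH_BRANCH_MAX - 1)) == 0;
}
```

With MAX left at `1U << 7`, the same check would reject a mask containing the new conditional-branch bit, which is why the two lines change together.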


* [PATCH 0/5] perf: Introducing new conditional branch filter
From: Anshuman Khandual @ 2013-05-22  6:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: mikey, ak, peterz, eranian, mingo

This patchset introduces conditional branch filter in perf branch stack
sampling framework incorporating review comments from Michael Neuling,
Peter Zijlstra and Stephane Eranian.

Anshuman Khandual (5):
  perf: New conditional branch filter criteria in branch stack sampling
  powerpc, perf: Enable conditional branch filter for POWER8
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter

 arch/powerpc/perf/power8-pmu.c             | 10 ++++++++++
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |  6 ++++++
 include/uapi/linux/perf_event.h            |  3 ++-
 tools/perf/Documentation/perf-record.txt   |  3 ++-
 tools/perf/builtin-record.c                |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

-- 
1.7.11.7


* Re: [PATCH 1/1] powerpc: Force 32 bit MSIs on systems lacking firmware support
From: Mike Qiu @ 2013-05-22  6:28 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <201305212154.r4LLs4Zu026123@d01av03.pok.ibm.com>

On 2013/5/22 5:54, Brian King wrote:
> Recent commit e61133dda480062d221f09e4fc18f66763f8ecd0 added support
> for a new firmware feature to force an adapter to use 32 bit MSIs.
> However, this firmware is not available for all systems. The hack below
> allows devices needing 32 bit MSIs to work on these systems as well.
> It is careful to only enable this on Gen2 slots, which should limit
> this to configurations where this hack is needed and tested to work.
>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>
>   arch/powerpc/platforms/pseries/msi.c |   31 +++++++++++++++++++++++++++----
>   1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff -puN arch/powerpc/platforms/pseries/msi.c~powerpc_32bit_msi_hack_on_papr arch/powerpc/platforms/pseries/msi.c
> --- linux/arch/powerpc/platforms/pseries/msi.c~powerpc_32bit_msi_hack_on_papr	2013-05-15 10:44:46.000000000 -0500
> +++ linux-bjking1/arch/powerpc/platforms/pseries/msi.c	2013-05-20 15:24:52.000000000 -0500
> @@ -397,10 +397,11 @@ static int check_msix_entries(struct pci
>   static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>   {
>   	struct pci_dn *pdn;
> -	int hwirq, virq, i, rc;
> +	int hwirq, virq, i, rc = -1;
>   	struct msi_desc *entry;
>   	struct msi_msg msg;
>   	int nvec = nvec_in;
> +	int use_32bit_msi_hack = 0;
>
>   	pdn = get_pdn(pdev);
>   	if (!pdn)
> @@ -428,15 +429,37 @@ static int rtas_setup_msi_irqs(struct pc
>   	 */
>   again:
>   	if (type == PCI_CAP_ID_MSI) {
> -		if (pdn->force_32bit_msi)
> +		if (pdn->force_32bit_msi) {
>   			rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
> -		else
> +			if (rc < 0) {
> +				/* We only want to run the 32 bit MSI hack below if
> +				 the max bus speed is Gen2 speed. */
> +				if (pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT)
> +					return rc;
> +
> +				use_32bit_msi_hack = 1;
> +			}
> +		}
> +
> +		if (rc < 0)
>   			rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
>
> -		if (rc < 0 && !pdn->force_32bit_msi) {
> +		if (rc < 0) {
>   			pr_debug("rtas_msi: trying the old firmware call.\n");
>   			rc = rtas_change_msi(pdn, RTAS_CHANGE_FN, nvec);
>   		}
> +
> +		if (use_32bit_msi_hack && rc > 0) {
> +			int pos;
> +			u32 addr_hi, addr_lo;
> +
> +			dev_info(&pdev->dev, "rtas_msi: No 32 bit MSI firmware support, forcing 32 bit MSI\n");
> +			pos = pci_find_capability(pdev, PCI_CAP_ID_MSI);
> +			pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI, &addr_hi);
> +			addr_lo = 0xffff0000 | ((addr_hi >> (48 - 32)) << 4);
> +			pci_write_config_dword(pdev, pos + PCI_MSI_ADDRESS_LO, addr_lo);
> +			pci_write_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI, 0);
I think it would be better to use the cached dev->msi_cap here.

Thanks
Mike
> +		}
>   	} else
>   		rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
>
> _
>
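
The address arithmetic in the quoted hack can be checked in isolation. The sketch below reproduces only the one expression from the patch; `sketch_msi32_addr_lo()` is a hypothetical wrapper, and the sample inputs are made up for the arithmetic, not taken from real hardware.

```c
#include <stdint.h>

/* Same expression as the quoted patch: take the top 16 bits of addr_hi
 * (bits 48 and up of the 64-bit MSI address), shift them left by 4,
 * and OR the result into a 32-bit address that starts at 0xffff0000. */
uint32_t sketch_msi32_addr_lo(uint32_t addr_hi)
{
	return 0xffff0000u | ((addr_hi >> (48 - 32)) << 4);
}
```

For example, an `addr_hi` of 0x00030000 yields 0xffff0030, and an `addr_hi` of 0 yields 0xffff0000.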


* Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
From: Anshuman Khandual @ 2013-05-22  6:43 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Michael Neuling, ak@linux.intel.com, Peter Zijlstra, LKML,
	Linux PPC dev, Ingo Molnar
In-Reply-To: <CABPqkBRpbh4BO7cMBysE+E+n5E_o_KesEhEZ3Wq49B3MeW786A@mail.gmail.com>

On 05/21/2013 07:25 PM, Stephane Eranian wrote:
> On Thu, May 16, 2013 at 12:15 PM, Michael Neuling <mikey@neuling.org> wrote:
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> On Wed, May 15, 2013 at 03:37:22PM +0200, Stephane Eranian wrote:
>>>> On Fri, May 3, 2013 at 2:11 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>>>>> We should always have proper privileges when requesting kernel data.
>>>>>
>>>>> Cc: Andi Kleen <ak@linux.intel.com>
>>>>> Cc: eranian@google.com
>>>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>>>> Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
>>>>> ---
>>>>>  arch/x86/kernel/cpu/perf_event_intel_lbr.c |    5 ++++-
>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>>>>> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>>>>> @@ -318,8 +318,11 @@ static void intel_pmu_setup_sw_lbr_filte
>>>>>         if (br_type & PERF_SAMPLE_BRANCH_USER)
>>>>>                 mask |= X86_BR_USER;
>>>>>
>>>>> -       if (br_type & PERF_SAMPLE_BRANCH_KERNEL)
>>>>> +       if (br_type & PERF_SAMPLE_BRANCH_KERNEL) {
>>>>> +               if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
>>>>> +                       return -EACCES;
>>>>>                 mask |= X86_BR_KERNEL;
>>>>> +       }
>>>>>
>>>> This will prevent regular users from capturing kernel -> kernel branches.
>>>> But it won't prevent users from getting kernel -> user branches. Thus
>>>> some kernel address will still be captured. I guess they could be eliminated
>>>> by the sw_filter.
>>>>
>>>> When using LBR priv level filtering, the filter applies to the branch target
>>>> only.
>>>
>>> How about something like the below? It also adds the branch flags
>>> Mikey wanted for PowerPC.
>>
>> Peter,
>>
>> BTW PowerPC also has the ability to filter on conditional branches.  Any
>> chance we could add something like the following to perf also?
>>
>> Mikey
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index fb104e5..891c769 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -157,8 +157,9 @@ enum perf_branch_sample_type {
>>         PERF_SAMPLE_BRANCH_ANY_CALL     = 1U << 4, /* any call branch */
>>         PERF_SAMPLE_BRANCH_ANY_RETURN   = 1U << 5, /* any return branch */
>>         PERF_SAMPLE_BRANCH_IND_CALL     = 1U << 6, /* indirect calls */
>> +       PERF_SAMPLE_BRANCH_CONDITIONAL  = 1U << 7, /* conditional branches */
>>
> I would use PERF_SAMPLE_BRANCH_COND here.
> 
>> -       PERF_SAMPLE_BRANCH_MAX          = 1U << 7, /* non-ABI */
>> +       PERF_SAMPLE_BRANCH_MAX          = 1U << 8, /* non-ABI */
>>  };
>>
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index cdf58ec..5b0b89d 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -676,6 +676,7 @@ static const struct branch_mode branch_modes[] = {
>>         BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
>>         BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
>>         BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
>> +       BRANCH_OPT("cnd", PERF_SAMPLE_BRANCH_CONDITIONAL),
> 
> use "cond"
> 
>>         BRANCH_END
>>  };
>>
> 
> And if you do this, you also need to update the x86
> perf_event_intel_lbr.c mapping
> tables to fill out the entries for PERF_SAMPLE_BRANCH_COND:
> 
>         [PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
> 
> And you also need to update intel_pmu_setup_sw_lbr_filter()
> to handle the conversion to x86 instructions:
> 
>        if (br_type & PERF_SAMPLE_BRANCH_COND)
>                 mask |= X86_BR_JCC;
> 
> 
> You also need to update the perf-record.txt documentation to list cond
> as a possible
> branch filter.

Hey Stephane,

I have incorporated all the review comments into the patch series
https://lkml.org/lkml/2013/5/22/51.

Regards
Anshuman


* Re: [PATCH 1/2] powerpc, perf: Ignore separate BHRB privilege state filter request
From: Michael Ellerman @ 2013-05-22  7:14 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, mikey, linux-kernel
In-Reply-To: <1369201667-9048-2-git-send-email-khandual@linux.vnet.ibm.com>

On Wed, 2013-05-22 at 11:17 +0530, Anshuman Khandual wrote:
> Completely ignore BHRB privilege state filter request as we are
> already configuring MMCRA register with privilege state filtering
> attribute for the accompanying PMU event. This would help achieve
> cleaner user space interaction for BHRB.

Your description from patch 0 should be here.


> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> index f7d1c4f..8ed323d 100644
> --- a/arch/powerpc/perf/power8-pmu.c
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -525,16 +525,17 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
>  	u64 pmu_bhrb_filter = 0;
>  	u64 br_privilege = branch_sample_type & ONLY_PLM;
>  
> -	/* BHRB and regular PMU events share the same prvillege state
> +	/* BHRB and regular PMU events share the same prvilege state

Please spell "privilege" correctly.

>  	 * filter configuration. BHRB is always recorded along with a
> -	 * regular PMU event. So privilege state filter criteria for BHRB
> -	 * and the companion PMU events has to be the same. As a default
> -	 * "perf record" tool sets all privillege bits ON when no filter
> -	 * criteria is provided in the command line. So as along as all
> -	 * privillege bits are ON or they are OFF, we are good to go.
> +	 * regular PMU event. So privilege state filter criteria for
> +	 * the BHRB and the companion PMU events has to be the same.
> +	 * Separate BHRB privillege state filter requests would be
> +	 * ignored.
>  	 */

This comment doesn't make sense to me with the updated code.

It still says "privilege state filter criteria for the BHRB and the
companion PMU events has to be the same".

But they don't, right?

What it should say is "we ignore the privilege bits in the branch sample
type because they are handled by the underlying PMC configuration" - or
something like that.

> -	if ((br_privilege != 7) && (br_privilege != 0))
> -		return -1;
> +
> +	if (br_privilege)
> +		pr_info("BHRB privilege state filter request %llx ignored\n",
> +								br_privilege);

Don't do that. Ignoring the br_privilege is either the right thing to do
in which case we do it and print nothing, or it doesn't make sense and
we reject it.

cheers
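
Ignoring the privilege bits without printing anything amounts to masking them off before the branch-type bits are examined. A minimal sketch follows, assuming the three privilege bits are the low three bits of the sample type, which is what the removed `!= 7` comparison suggests. `SKETCH_PLM_MASK` and `sketch_strip_plm()` are hypothetical names.

```c
#include <stdint.h>

/* Assumed layout: privilege-level bits are the low three bits. */
#define SKETCH_PLM_MASK 0x7ULL

/* Drop the privilege bits silently, keeping only the branch-type bits. */
uint64_t sketch_strip_plm(uint64_t branch_sample_type)
{
	return branch_sample_type & ~SKETCH_PLM_MASK;
}
```

The alternative named in the review, rejecting the request, would instead return an error whenever the masked-off bits are non-zero; the sketch shows only the "ignore" option.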


* Re: [PATCH 2/5] powerpc, perf: Enable conditional branch filter for POWER8
From: Peter Zijlstra @ 2013-05-22  7:53 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, mingo
In-Reply-To: <1369203761-12649-3-git-send-email-khandual@linux.vnet.ibm.com>

On Wed, May 22, 2013 at 11:52:38AM +0530, Anshuman Khandual wrote:
> Enables conditional branch filter support for POWER8
> utilizing MMCRA register based filter and also invalidates
> a BHRB branch filter combination involving conditional
> branches.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/power8-pmu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> index 8ed323d..e60b38f 100644
> --- a/arch/powerpc/perf/power8-pmu.c
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -548,11 +548,21 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
>  	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
>  		return -1;
>  
> +	/* Invalid branch filter combination - HW does not support */
> +	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
> +			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
> +		return -1;
> +
>  	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
>  		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
>  		return pmu_bhrb_filter;
>  	}
>  
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
> +		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
> +		return pmu_bhrb_filter;
> +	}
> +
>  	/* Every thing else is unsupported */
>  	return -1;
>  }

So I suppose you've seen what x86 does in this case? ;-) I'm not saying
you _have_ to do the software filter, but I would like the changelog to at
least mention the issue.

In fact, I suppose that should have been in the original patches :/ as
this patch series only adds the conditional branch support. 
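
For readers unfamiliar with the term, a software filter in the generic sense means capturing more branches than requested and discarding the unwanted ones afterwards. The sketch below shows that idea only. The record layout, the class enum and `sketch_sw_filter()` are all hypothetical, and this is not a description of what the x86 code does internally.

```c
#include <stddef.h>

/* Hypothetical class assigned to each captured branch, for example
 * by decoding the instruction at the 'from' address. */
enum sketch_br_class { SKETCH_BR_CALL, SKETCH_BR_COND, SKETCH_BR_OTHER };

struct sketch_branch {
	unsigned long from;
	unsigned long to;
	enum sketch_br_class cls;
};

/* Compact the array in place, keeping only entries of class 'want'.
 * Returns the number of entries kept. */
size_t sketch_sw_filter(struct sketch_branch *br, size_t n,
			enum sketch_br_class want)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (br[i].cls == want)
			br[kept++] = br[i];
	return kept;
}
```

A filter of this kind could in principle serve combinations the hardware cannot express, at the cost of losing buffer slots to entries that are later thrown away.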


* [PATCH] clk/mpc85xx: Update the compatible string
From: Yuantian.Tang @ 2013-05-22  8:22 UTC (permalink / raw)
  To: mturquette; +Cc: Tang Yuantian, devicetree-discuss, linuxppc-dev

From: Tang Yuantian <yuantian.tang@freescale.com>

The clock compatible string changed from *-2 to *-2.0
on chassis 2, so update it accordingly.

Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
---
 drivers/clk/clk-ppc-corenet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk-ppc-corenet.c b/drivers/clk/clk-ppc-corenet.c
index a2d483f..e958707 100644
--- a/drivers/clk/clk-ppc-corenet.c
+++ b/drivers/clk/clk-ppc-corenet.c
@@ -260,7 +260,7 @@ static int __init ppc_corenet_clk_probe(struct platform_device *pdev)
 
 static const struct of_device_id ppc_clk_ids[] __initconst = {
 	{ .compatible = "fsl,qoriq-clockgen-1.0", },
-	{ .compatible = "fsl,qoriq-clockgen-2", },
+	{ .compatible = "fsl,qoriq-clockgen-2.0", },
 	{}
 };
 
-- 
1.8.0

^ permalink raw reply related

* [PATCH] powerpc: Make radeon 32-bit MSI quirk work on powernv
From: Benjamin Herrenschmidt @ 2013-05-22  8:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Brian King, klebers

This moves the quirk itself to pci_64.c so that it gets built on all ppc64
platforms (the only ones with a pci_dn), factors the two implementations
of get_pdn() into a single pci_get_pdn() and uses the quirk to do 32-bit
MSIs on IODA based powernv platforms.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

(We really need to rename that pci_dn structure to something a bit more
telling one of these days, such as pci_dn_auxdata)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8b11b5b..2c1d8cb 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -174,6 +174,8 @@ struct pci_dn {
 /* Get the pointer to a device_node's pci_dn */
 #define PCI_DN(dn)	((struct pci_dn *) (dn)->data)
 
+extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
+
 extern void * update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 873050d..2e86296 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -266,3 +266,13 @@ int pcibus_to_node(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pcibus_to_node);
 #endif
+
+static void quirk_radeon_32bit_msi(struct pci_dev *dev)
+{
+	struct pci_dn *pdn = pci_get_pdn(dev);
+
+	if (pdn)
+		pdn->force_32bit_msi = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index e7af165..df03844 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -32,6 +32,14 @@
 #include <asm/ppc-pci.h>
 #include <asm/firmware.h>
 
+struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
+{
+	struct device_node *dn = pci_device_to_OF_node(pdev);
+	if (!dn)
+		return NULL;
+	return PCI_DN(dn);
+}
+
 /*
  * Traverse_func that inits the PCI fields of the device node.
  * NOTE: this *must* be done before read/write config to the device.
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 194e921..49c4c57 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -68,16 +68,6 @@ define_pe_printk_level(pe_err, KERN_ERR);
 define_pe_printk_level(pe_warn, KERN_WARNING);
 define_pe_printk_level(pe_info, KERN_INFO);
 
-static struct pci_dn *pnv_ioda_get_pdn(struct pci_dev *dev)
-{
-	struct device_node *np;
-
-	np = pci_device_to_OF_node(dev);
-	if (!np)
-		return NULL;
-	return PCI_DN(np);
-}
-
 static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
@@ -110,7 +100,7 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pnv_ioda_get_pdn(dev);
+	struct pci_dn *pdn = pci_get_pdn(dev);
 
 	if (!pdn)
 		return NULL;
@@ -173,7 +163,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Add to all parents PELT-V */
 	while (parent) {
-		struct pci_dn *pdn = pnv_ioda_get_pdn(parent);
+		struct pci_dn *pdn = pci_get_pdn(parent);
 		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
 			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
 						pe->pe_number, OPAL_ADD_PE_TO_DOMAIN);
@@ -252,7 +242,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pnv_ioda_get_pdn(dev);
+	struct pci_dn *pdn = pci_get_pdn(dev);
 	struct pnv_ioda_pe *pe;
 	int pe_num;
 
@@ -323,7 +313,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		struct pci_dn *pdn = pnv_ioda_get_pdn(dev);
+		struct pci_dn *pdn = pci_get_pdn(dev);
 
 		if (pdn == NULL) {
 			pr_warn("%s: No device node associated with device !\n",
@@ -436,7 +426,7 @@ static void pnv_pci_ioda_setup_PEs(void)
 
 static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev)
 {
-	struct pci_dn *pdn = pnv_ioda_get_pdn(pdev);
+	struct pci_dn *pdn = pci_get_pdn(pdev);
 	struct pnv_ioda_pe *pe;
 
 	/*
@@ -768,6 +758,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int is_64, struct msi_msg *msg)
 {
 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
+	struct pci_dn *pdn = pci_get_pdn(dev);
 	struct irq_data *idata;
 	struct irq_chip *ichip;
 	unsigned int xive_num = hwirq - phb->msi_base;
@@ -783,6 +774,10 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	if (pe->mve_number < 0)
 		return -ENXIO;
 
+	/* Force 32-bit MSI on some broken devices */
+	if (pdn && pdn->force_32bit_msi)
+		is_64 = 0;
+
 	/* Assign XIVE to PE */
 	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
 	if (rc) {
@@ -1035,7 +1030,7 @@ static int pnv_pci_enable_device_hook(struct pci_dev *dev)
 	if (!phb->initialized)
 		return 0;
 
-	pdn = pnv_ioda_get_pdn(dev);
+	pdn = pci_get_pdn(dev);
 	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
 		return -EINVAL;
 
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 098d357..277343c 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -47,6 +47,10 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int nvec, int type)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+
+	if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
+		return -ENODEV;
 
 	return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
 }
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 420524e..d34f4ff 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -26,26 +26,6 @@ static int query_token, change_token;
 #define RTAS_CHANGE_MSIX_FN	4
 #define RTAS_CHANGE_32MSI_FN	5
 
-static struct pci_dn *get_pdn(struct pci_dev *pdev)
-{
-	struct device_node *dn;
-	struct pci_dn *pdn;
-
-	dn = pci_device_to_OF_node(pdev);
-	if (!dn) {
-		dev_dbg(&pdev->dev, "rtas_msi: No OF device node\n");
-		return NULL;
-	}
-
-	pdn = PCI_DN(dn);
-	if (!pdn) {
-		dev_dbg(&pdev->dev, "rtas_msi: No PCI DN\n");
-		return NULL;
-	}
-
-	return pdn;
-}
-
 /* RTAS Helpers */
 
 static int rtas_change_msi(struct pci_dn *pdn, u32 func, u32 num_irqs)
@@ -91,7 +71,7 @@ static void rtas_disable_msi(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn;
 
-	pdn = get_pdn(pdev);
+	pdn = pci_get_pdn(pdev);
 	if (!pdn)
 		return;
 
@@ -152,7 +132,7 @@ static int check_req(struct pci_dev *pdev, int nvec, char *prop_name)
 	struct pci_dn *pdn;
 	const u32 *req_msi;
 
-	pdn = get_pdn(pdev);
+	pdn = pci_get_pdn(pdev);
 	if (!pdn)
 		return -ENODEV;
 
@@ -402,7 +382,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 	struct msi_msg msg;
 	int nvec = nvec_in;
 
-	pdn = get_pdn(pdev);
+	pdn = pci_get_pdn(pdev);
 	if (!pdn)
 		return -ENODEV;
 
@@ -518,12 +498,3 @@ static int rtas_msi_init(void)
 }
 arch_initcall(rtas_msi_init);
 
-static void quirk_radeon(struct pci_dev *dev)
-{
-	struct pci_dn *pdn = get_pdn(dev);
-
-	if (pdn)
-		pdn->force_32bit_msi = 1;
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon);

^ permalink raw reply related

* Re: [PATCH 1/2] powerpc, perf: Ignore separate BHRB privilege state filter request
From: Anshuman Khandual @ 2013-05-22  8:59 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, mikey, linux-kernel
In-Reply-To: <1369206855.12874.9.camel@concordia>

> 
> Your description from patch 0 should be here.
>

Sure, will bring it here.

> 
>> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
>> index f7d1c4f..8ed323d 100644
>> --- a/arch/powerpc/perf/power8-pmu.c
>> +++ b/arch/powerpc/perf/power8-pmu.c
>> @@ -525,16 +525,17 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
>>  	u64 pmu_bhrb_filter = 0;
>>  	u64 br_privilege = branch_sample_type & ONLY_PLM;
>>  
>> -	/* BHRB and regular PMU events share the same prvillege state
>> +	/* BHRB and regular PMU events share the same prvilege state
> 
> Please spell "privilege" correctly.
> 

My bad, will fix it.


>>  	 * filter configuration. BHRB is always recorded along with 
> It still says "privilege state filter criteria for the BHRB and the
> companion PMU events has to be the same".
> 
> But they don't, right?
> 

Right


> What it should say is "we ignore the privilege bits in the branch sample
> type because they are handled by the underlying PMC configuration" - or
> something like that.

Here is the latest description for the code block

	/* BHRB and regular PMU events share the same privilege state
	 * filter configuration. BHRB is always recorded along with a
	 * regular PMU event. As the privilege state filter is handled
	 * in the basic PMC configuration of the accompanying regular
	 * PMU event, we ignore any separate BHRB specific request.
	 */

Does it sound better?

> 
>> -	if ((br_privilege != 7) && (br_privilege != 0))
>> -		return -1;
>> +
>> +	if (br_privilege)
>> +		pr_info("BHRB privilege state filter request %llx ignored\n",
>> +								br_privilege);
> 
> Don't do that. Ignoring the br_privilege is either the right thing to do
> in which case we do it and print nothing,


I thought the informational print would at least make the user aware
that the separate filter request for BHRB was ignored.
Can we add this somewhere in the documentation?

  
> or it doesn't make sense and
> we reject it.
>

 
> cheers
> 

^ permalink raw reply

* [PATCH] powerpc/powernv: Build a zImage.epapr
From: Benjamin Herrenschmidt @ 2013-05-22  9:00 UTC (permalink / raw)
  To: linuxppc-dev

The zImage.epapr wrapper makes it possible to use zImages when booting via a
flat device-tree, which can be used on powernv.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

TODO: Make the main zImage.pseries cope with an epapr boot...

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 91bec0e..a18e9b7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -6,6 +6,7 @@ config PPC_POWERNV
 	select PPC_ICP_NATIVE
 	select PPC_P7_NAP
 	select PPC_PCI_CHOICE if EMBEDDED
+	select EPAPR_BOOT
 	default y
 
 config POWERNV_MSI

^ permalink raw reply related

* Re: [PATCH v2 00/10] uaccess: better might_sleep/might_fault behavior
From: Arnd Bergmann @ 2013-05-22  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-m32r-ja, kvm, Peter Zijlstra, Catalin Marinas, Will Deacon,
	David Howells, linux-mm, Paul Mackerras, H. Peter Anvin,
	linux-arch, linux-am33-list, Hirokazu Takata, x86, Ingo Molnar,
	microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <cover.1368702323.git.mst@redhat.com>

On Thursday 16 May 2013, Michael S. Tsirkin wrote:
> This improves the might_fault annotations used
> by uaccess routines:
> 
> 1. The only reason uaccess routines might sleep
>    is if they fault. Make this explicit for
>    all architectures.
> 2. Accesses (e.g through socket ops) to kernel memory
>    with KERNEL_DS like net/sunrpc does will never sleep.
>    Remove an unconditional might_sleep in the inline
>    might_fault in kernel.h
>    (used when PROVE_LOCKING is not set).
> 3. Accesses with pagefault_disable return EFAULT
>    but won't cause caller to sleep.
>    Check for that and avoid might_sleep when
>    PROVE_LOCKING is set.
> 
> I'd like these changes to go in for the benefit of
> the vhost driver where we want to call socket ops
> under a spinlock, and fall back on slower thread handler
> on error.

Hi Michael,

I have recently stumbled over a related topic, which is the highly
inconsistent placement of might_fault() or might_sleep() in certain
classes of uaccess functions. Your patches seem completely reasonable,
but it would be good to also fix the other problem, at least on
the architectures we most care about.

Given the most commonly used functions and a couple of architectures
I'm familiar with, these are the ones that currently call might_fault()

			x86-32	x86-64	arm	arm64	powerpc	s390	generic
copy_to_user		-	x	-	-	-	x	x
copy_from_user		-	x	-	-	-	x	x
put_user		x	x	x	x	x	x	x
get_user		x	x	x	x	x	x	x
__copy_to_user		x	x	-	-	x	-	-
__copy_from_user	x	x	-	-	x	-	-
__put_user		-	-	x	-	x	-	-
__get_user		-	-	x	-	x	-	-

WTF?

Calling might_fault() for every __get_user/__put_user is rather expensive
because it turns what should be a single instruction (plus fixup) into an
external function call.

My feeling is that we should do might_fault() only in access_ok() to get
the right balance.

	Arnd

^ permalink raw reply

* Re: [PATCH v4 06/12] ARM: dove: add gigabit ethernet and mvmdio device tree nodes
From: Sebastian Hesselbarth @ 2013-05-22  9:43 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jason Cooper, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel
In-Reply-To: <20130521174849.GL26249@lunn.ch>

On 05/21/2013 07:48 PM, Andrew Lunn wrote:
> On Tue, May 21, 2013 at 06:41:44PM +0200, Sebastian Hesselbarth wrote:
>> This patch adds orion-eth and mvmdio device tree nodes for DT enabled
>> Dove boards. As there is only one ethernet controller on Dove, a default
>> phy node is also added with a note to set its reg property on a per-board
>> basis.
>>
>> Signed-off-by: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
>> ---
...
>> +			ethernet-port@0 {
>> +				device_type = "network";
>> +				compatible = "marvell,orion-eth-port";
>> +				reg =<0>;
>> +				interrupts =<29>;
>> +				/* overwrite MAC address in bootloader */
>> +				local-mac-address = [00 00 00 00 00 00];
>
> Hi Sebastian
>
> It's probably a good idea to set the local administration bit in this
> MAC address. i.e. first byte is 02.

Andrew,

we just need an invalid address here to trigger the default behavior of
the driver and load the MAC address from its register. As PPC binding
documentation also has all zero, I just took it.

Sebastian

^ permalink raw reply

* Re: [PATCH v2 10/10] kernel: might_fault does not imply might_sleep
From: Michael S. Tsirkin @ 2013-05-22  9:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-m32r-ja, kvm, Catalin Marinas, Will Deacon, David Howells,
	linux-mm, Paul Mackerras, H. Peter Anvin, linux-arch,
	linux-am33-list, Hirokazu Takata, x86, Ingo Molnar, Arnd Bergmann,
	microblaze-uclinux, Chris Metcalf, rostedt, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <20130521115734.GA9554@twins.programming.kicks-ass.net>

On Tue, May 21, 2013 at 01:57:34PM +0200, Peter Zijlstra wrote:
> On Sun, May 19, 2013 at 12:35:26PM +0300, Michael S. Tsirkin wrote:
> > > > --- a/include/linux/kernel.h
> > > > +++ b/include/linux/kernel.h
> > > > @@ -198,7 +198,6 @@ void might_fault(void);
> > > >  #else
> > > >  static inline void might_fault(void)
> > > >  {
> > > > -	might_sleep();
> > > 
> > > This removes potential resched points for PREEMPT_VOLUNTARY -- was that
> > > intentional?
> > 
> > No it's a bug. Thanks for pointing this out.
> > OK so I guess it should be might_sleep_if(!in_atomic())
> > and this means might_fault would have to move from linux/kernel.h to
> > linux/uaccess.h, since in_atomic() is in linux/hardirq.h
> > 
> > Makes sense?
> 
> So the only difference between PROVE_LOCKING and not should be the
> might_lock_read() thing; so how about something like this?
> 
> ---
>  include/linux/kernel.h  |  7 ++-----
>  include/linux/uaccess.h | 26 ++++++++++++++++++++++++++
>  mm/memory.c             | 14 ++------------
>  3 files changed, 30 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index e96329c..70812f4 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -194,12 +194,9 @@ extern int _cond_resched(void);
>  	})
>  
>  #ifdef CONFIG_PROVE_LOCKING
> -void might_fault(void);
> +void might_fault_lockdep(void);
>  #else
> -static inline void might_fault(void)
> -{
> -	might_sleep();
> -}
> +static inline void might_fault_lockdep(void) { }
>  #endif
>  
>  extern struct atomic_notifier_head panic_notifier_list;
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 5ca0951..50a2cc9 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -38,6 +38,32 @@ static inline void pagefault_enable(void)
>  	preempt_check_resched();
>  }
>  
> +static inline bool __can_fault(void)
> +{
> +	/*
> +	 * Some code (nfs/sunrpc) uses socket ops on kernel memory while
> +	 * holding the mmap_sem, this is safe because kernel memory doesn't
> +	 * get paged out, therefore we'll never actually fault, and the
> +	 * below annotations will generate false positives.
> +	 */
> +	if (segment_eq(get_fs(), KERNEL_DS))
> +		return false;
> +
> +	if (in_atomic() /* || pagefault_disabled() */)

One question here: I'm guessing you put this comment here
for illustrative purposes, implying code that will
be enabled in -rt?
We don't want it upstream I think, right?


> +		return false;
> +
> +	return true;
> +}
> +
> +static inline void might_fault(void)
> +{
> +	if (!__can_fault())
> +		return;
> +
> +	might_sleep();
> +	might_fault_lockdep();
> +}
> +
>  #ifndef ARCH_HAS_NOCACHE_UACCESS
>  
>  static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dc1882..266610c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4211,19 +4211,9 @@ void print_vma_addr(char *prefix, unsigned long ip)
>  }
>  
>  #ifdef CONFIG_PROVE_LOCKING
> -void might_fault(void)
> +void might_fault_lockdep(void)
>  {
>  	/*
> -	 * Some code (nfs/sunrpc) uses socket ops on kernel memory while
> -	 * holding the mmap_sem, this is safe because kernel memory doesn't
> -	 * get paged out, therefore we'll never actually fault, and the
> -	 * below annotations will generate false positives.
> -	 */
> -	if (segment_eq(get_fs(), KERNEL_DS))
> -		return;
> -
> -	might_sleep();
> -	/*
>  	 * it would be nicer only to annotate paths which are not under
>  	 * pagefault_disable, however that requires a larger audit and
>  	 * providing helpers like get_user_atomic.
> @@ -4231,7 +4221,7 @@ void might_fault(void)
>  	if (!in_atomic() && current->mm)
>  		might_lock_read(&current->mm->mmap_sem);
>  }
> -EXPORT_SYMBOL(might_fault);
> +EXPORT_SYMBOL(might_fault_lockdep);
>  #endif
>  
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)

^ permalink raw reply

* Re: [PATCH v2 00/10] uaccess: better might_sleep/might_fault behavior
From: Michael S. Tsirkin @ 2013-05-22  9:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-m32r-ja, kvm, Peter Zijlstra, Catalin Marinas, Will Deacon,
	David Howells, linux-mm, Paul Mackerras, H. Peter Anvin,
	linux-arch, linux-am33-list, Hirokazu Takata, x86, Ingo Molnar,
	microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <201305221125.36284.arnd@arndb.de>

On Wed, May 22, 2013 at 11:25:36AM +0200, Arnd Bergmann wrote:
> On Thursday 16 May 2013, Michael S. Tsirkin wrote:
> > This improves the might_fault annotations used
> > by uaccess routines:
> > 
> > 1. The only reason uaccess routines might sleep
> >    is if they fault. Make this explicit for
> >    all architectures.
> > 2. Accesses (e.g through socket ops) to kernel memory
> >    with KERNEL_DS like net/sunrpc does will never sleep.
> >    Remove an unconditional might_sleep in the inline
> >    might_fault in kernel.h
> >    (used when PROVE_LOCKING is not set).
> > 3. Accesses with pagefault_disable return EFAULT
> >    but won't cause caller to sleep.
> >    Check for that and avoid might_sleep when
> >    PROVE_LOCKING is set.
> > 
> > I'd like these changes to go in for the benefit of
> > the vhost driver where we want to call socket ops
> > under a spinlock, and fall back on slower thread handler
> > on error.
> 
> Hi Michael,
> 
> I have recently stumbled over a related topic, which is the highly
> inconsistent placement of might_fault() or might_sleep() in certain
> classes of uaccess functions. Your patches seem completely reasonable,
> but it would be good to also fix the other problem, at least on
> the architectures we most care about.
> 
> Given the most commonly used functions and a couple of architectures
> I'm familiar with, these are the ones that currently call might_fault()
> 
> 			x86-32	x86-64	arm	arm64	powerpc	s390	generic
> copy_to_user		-	x	-	-	-	x	x
> copy_from_user		-	x	-	-	-	x	x
> put_user		x	x	x	x	x	x	x
> get_user		x	x	x	x	x	x	x
> __copy_to_user		x	x	-	-	x	-	-
> __copy_from_user	x	x	-	-	x	-	-
> __put_user		-	-	x	-	x	-	-
> __get_user		-	-	x	-	x	-	-
> 
> WTF?

Yea.

> Calling might_fault() for every __get_user/__put_user is rather expensive
> because it turns what should be a single instruction (plus fixup) into an
> external function call.

You mean _cond_resched with CONFIG_PREEMPT_VOLUNTARY? Or do you
mean when we build with PROVE_LOCKING?

> My feeling is that we should do might_fault() only in access_ok() to get
> the right balance.
> 
> 	Arnd

Well access_ok is currently non-blocking I think - we'd have to audit
all callers. There are some 200 of these in drivers and some
1000 total so ... a bit risky.

-- 
MST

^ permalink raw reply

* Re: [PATCH v4 06/12] ARM: dove: add gigabit ethernet and mvmdio device tree nodes
From: tiejun.chen @ 2013-05-22 10:04 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Andrew Lunn, Jason Cooper, netdev, linux-kernel, linux-arm-kernel,
	linuxppc-dev, David Miller, Lennert Buytenhek
In-Reply-To: <519C9333.20609@gmail.com>

On 05/22/2013 05:43 PM, Sebastian Hesselbarth wrote:
> On 05/21/2013 07:48 PM, Andrew Lunn wrote:
>> On Tue, May 21, 2013 at 06:41:44PM +0200, Sebastian Hesselbarth wrote:
>>> This patch adds orion-eth and mvmdio device tree nodes for DT enabled
>>> Dove boards. As there is only one ethernet controller on Dove, a default
>>> phy node is also added with a note to set its reg property on a per-board
>>> basis.
>>>
>>> Signed-off-by: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
>>> ---
> ...
>>> +            ethernet-port@0 {
>>> +                device_type = "network";
>>> +                compatible = "marvell,orion-eth-port";
>>> +                reg =<0>;
>>> +                interrupts =<29>;
>>> +                /* overwrite MAC address in bootloader */
>>> +                local-mac-address = [00 00 00 00 00 00];
>>
>> Hi Sebastian
>>
>> It's probably a good idea to set the local administration bit in this
>> MAC address. i.e. first byte is 02.
>
> Andrew,
>
> we just need an invalid address here to trigger the default behavior of
> the driver and load the MAC address from its register. As PPC binding
> documentation also has all zero, I just took it.

The truth is that in the PPC case we often set the real MAC address with
variables like 'eth[x]addr' at the u-boot prompt, and u-boot then parses that
value to fill in the dtb. The associated driver can then get the actual MAC
address from the dtb. With older u-boot versions you even have to set the
'local-mac-address' property in the dts directly to the real MAC address before
generating the dtb, since older u-boot cannot update the dtb before passing it
to the kernel.

Tiejun

^ permalink raw reply

* Re: [PATCH v4 06/12] ARM: dove: add gigabit ethernet and mvmdio device tree nodes
From: Sebastian Hesselbarth @ 2013-05-22 10:13 UTC (permalink / raw)
  To: tiejun.chen
  Cc: Andrew Lunn, Jason Cooper, netdev, linux-kernel, linux-arm-kernel,
	linuxppc-dev, David Miller, Lennert Buytenhek
In-Reply-To: <519C9822.9040909@windriver.com>

On 05/22/2013 12:04 PM, tiejun.chen wrote:
> On 05/22/2013 05:43 PM, Sebastian Hesselbarth wrote:
>> On 05/21/2013 07:48 PM, Andrew Lunn wrote:
>>> On Tue, May 21, 2013 at 06:41:44PM +0200, Sebastian Hesselbarth wrote:
>>>> This patch adds orion-eth and mvmdio device tree nodes for DT enabled
>>>> Dove boards. As there is only one ethernet controller on Dove, a
>>>> default
>>>> phy node is also added with a note to set its reg property on a
>>>> per-board
>>>> basis.
>>>>
>>>> Signed-off-by: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
>>>> ---
>> ...
>>>> + ethernet-port@0 {
>>>> + device_type = "network";
>>>> + compatible = "marvell,orion-eth-port";
>>>> + reg =<0>;
>>>> + interrupts =<29>;
>>>> + /* overwrite MAC address in bootloader */
>>>> + local-mac-address = [00 00 00 00 00 00];
>>>
>>> Hi Sebastian
>>>
>>> It's probably a good idea to set the local administration bit in this
>>> MAC address. i.e. first byte is 02.
>>
>> Andrew,
>>
>> we just need an invalid address here to trigger the default behavior of
>> the driver and load the MAC address from its register. As PPC binding
>> documentation also has all zero, I just took it.
>
> The truth is that in the PPC case we often set the real MAC address with
> variables like 'eth[x]addr' at the u-boot prompt, and u-boot then parses
> that value to fill in the dtb. The associated driver can then get the
> actual MAC address from the dtb. With older u-boot versions you even have
> to set the 'local-mac-address' property in the dts directly to the real
> MAC address before generating the dtb, since older u-boot cannot update
> the dtb before passing it to the kernel.

Tiejun,

with Marvell SoCs it is no different, except that there is almost no dtb
support in their u-boot. The default behavior of the driver has always been
to load the MAC address from its register if there is no valid override
value. Using an invalid address (and all zero above is invalid) causes
of_get_mac_address() to fail (which we allow), so the corresponding
platform_data is never written and the driver falls back to the default
behavior.

We only need an invalid address passed initially in local-mac-address.
A DT-aware boot loader will overwrite it, but a DT-agnostic boot loader will
not. I can put any invalid MAC address in here, so I have chosen the
very first one I could think of.

Sebastian
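
The validity rule this exchange relies on can be sketched in plain C. This is
only an illustration under stated assumptions: it takes "valid" to mean
neither all-zero nor multicast, and mac_is_valid(), mac_is_local() and
pick_mac() are hypothetical names, not the kernel helpers or the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Assumed rule: a usable unicast address is neither all-zero nor multicast. */
static bool mac_is_valid(const uint8_t *mac)
{
	static const uint8_t zero[MAC_LEN];

	assert(mac != NULL);
	if (memcmp(mac, zero, MAC_LEN) == 0)
		return false;
	return (mac[0] & 0x01) == 0;	/* bit 0 of first octet: multicast */
}

/* Bit 1 of the first octet marks a locally administered address. */
static bool mac_is_local(const uint8_t *mac)
{
	assert(mac != NULL);
	return (mac[0] & 0x02) != 0;
}

/*
 * Hypothetical helper: prefer the device tree value when it is valid,
 * otherwise fall back to the address already in the controller register.
 */
static const uint8_t *pick_mac(const uint8_t *dt_mac, const uint8_t *reg_mac)
{
	if (dt_mac && mac_is_valid(dt_mac))
		return dt_mac;
	return reg_mac;
}
```

Under this model the all-zero placeholder selects the register value, whereas
02:00:00:00:00:00 counts as a valid, locally administered address and would
therefore not trigger the fallback.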

^ permalink raw reply

* Re: [PATCH v2 10/10] kernel: might_fault does not imply might_sleep
From: Peter Zijlstra @ 2013-05-22 10:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-m32r-ja, kvm, Catalin Marinas, Will Deacon, David Howells,
	linux-mm, Paul Mackerras, H. Peter Anvin, linux-arch,
	linux-am33-list, Hirokazu Takata, x86, Ingo Molnar, Arnd Bergmann,
	microblaze-uclinux, Chris Metcalf, rostedt, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <20130522094709.GA26451@redhat.com>

On Wed, May 22, 2013 at 12:47:09PM +0300, Michael S. Tsirkin wrote:
> >  
> > +static inline bool __can_fault(void)
> > +{
> > +	/*
> > +	 * Some code (nfs/sunrpc) uses socket ops on kernel memory while
> > +	 * holding the mmap_sem, this is safe because kernel memory doesn't
> > +	 * get paged out, therefore we'll never actually fault, and the
> > +	 * below annotations will generate false positives.
> > +	 */
> > +	if (segment_eq(get_fs(), KERNEL_DS))
> > +		return false;
> > +
> > +	if (in_atomic() /* || pagefault_disabled() */)
> 
> One question here: I'm guessing you put this comment here
> for illustrative purposes, implying code that will
> be enabled in -rt?
> We don't want it upstream I think, right?

Right, and as a reminder that when we do this we need to add a patch to
-rt. But yeah, we should have a look and see if it's worth pulling those
patches from -rt into mainline in some way, shape or form. They're big
but trivial IIRC.

I'm fine with you leaving that comment out though.
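
The decision made by the quoted __can_fault() can be modelled outside the
kernel as a small truth table. The sketch below is a userspace illustration
only: struct ctx and can_fault() are hypothetical stand-ins for the kernel
state (get_fs(), in_atomic()) that the real helper inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that __can_fault() looks at. */
struct ctx {
	bool kernel_ds;	/* models segment_eq(get_fs(), KERNEL_DS) */
	bool in_atomic;	/* models in_atomic(), e.g. under a spinlock */
};

/* Returns true when the might_sleep() and lockdep checks should run. */
static bool can_fault(const struct ctx *c)
{
	assert(c != NULL);

	if (c->kernel_ds)
		return false;	/* kernel memory is not paged out */
	if (c->in_atomic)
		return false;	/* atomic context: the access must not sleep */
	return true;
}
```

Only the plain user-access case (neither flag set) reaches the checks; both
special cases return early, which is what avoids the false positives described
in the code comment.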

^ permalink raw reply

* Re: [PATCH v2 00/10] uaccess: better might_sleep/might_fault behavior
From: Peter Zijlstra @ 2013-05-22 10:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-m32r-ja, kvm, Michael S. Tsirkin, Catalin Marinas,
	Will Deacon, David Howells, linux-mm, Paul Mackerras,
	H. Peter Anvin, linux-arch, linux-am33-list, Hirokazu Takata, x86,
	Ingo Molnar, microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <201305221125.36284.arnd@arndb.de>

On Wed, May 22, 2013 at 11:25:36AM +0200, Arnd Bergmann wrote:
> Calling might_fault() for every __get_user/__put_user is rather expensive
> because it turns what should be a single instruction (plus fixup) into an
> external function call.

We could hide it all behind CONFIG_DEBUG_ATOMIC_SLEEP just like
might_sleep() is. I'm not sure there's a point to might_fault() when
might_sleep() is a NOP.

^ permalink raw reply

