* i915.ko WC writes are slow after ea8596bb2d8d379
@ 2014-10-08 9:03 Chris Wilson
2014-10-08 10:10 ` Chuck Ebbert
2014-10-08 17:47 ` Chuck Ebbert
0 siblings, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2014-10-08 9:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Jiri Kosina, H. Peter Anvin, Steven Rostedt,
Jason Baron, yrl.pp-manager.tt, Borislav Petkov, Ingo Molnar,
Daniel Vetter
I ran into a problem on a Sandybridge i5-2500s whilst measuring the
performance of GTT write-combining access. I found subsequent runs were
about 10-40x slower than the first. For example,
igt/gem_gtt_speed:
Time to read 16k through a GTT map: 325.285µs
Time to write 16k through a GTT map: 4.729µs
Time to clear 16k through a GTT map: 4.584µs
Time to clear 16k through a cached GTT map: 1.342µs
on the second run became:
Time to read 16k through a GTT map: 332.148µs
Time to write 16k through a GTT map: 209.411µs
Time to clear 16k through a GTT map: 56.460µs
Time to clear 16k through a cached GTT map: 50.897µs
Naively I would say that we lost the wc on our ioremap.
/sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
runs.
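As a quick check on the "10-40x" figure, the ratios can be worked out from the two sets of timings above. This is only a sketch: the struct and function names are made up for illustration, and the numbers are copied by hand from the gem_gtt_speed output.

```c
#include <stdio.h>

/* Timings in microseconds, copied from the two gem_gtt_speed runs above. */
struct gtt_timing {
	const char *what;
	double first_us;   /* first run */
	double second_us;  /* second run */
};

static const struct gtt_timing gtt_timings[] = {
	{ "read 16k through a GTT map",         325.285, 332.148 },
	{ "write 16k through a GTT map",          4.729, 209.411 },
	{ "clear 16k through a GTT map",          4.584,  56.460 },
	{ "clear 16k through a cached GTT map",   1.342,  50.897 },
};

/* Hypothetical helper: second-run time divided by first-run time. */
double slowdown(unsigned int i)
{
	return gtt_timings[i].second_us / gtt_timings[i].first_us;
}

/* Print one line per test with its slowdown factor. */
void print_slowdowns(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(gtt_timings) / sizeof(gtt_timings[0]); i++)
		printf("%-36s %6.1fx\n", gtt_timings[i].what, slowdown(i));
}
```

Calling print_slowdowns() from a main() shows that the read time is essentially unchanged, while the write and clear paths are roughly 12x to 44x slower on the second run.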
A bisection pointed to
commit ea8596bb2d8d37957f3e92db9511c50801689180
Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Date: Thu Jul 18 20:47:53 2013 +0900
kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
of which the active ingredient was just
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..f4001e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
 
 config HAVE_TEXT_POKE_SMP
 	bool
-	select STOP_MACHINE if SMP
 
 config X86_DEV_DMA_OPS
 	bool
and adding that back into the current build, e.g.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3632743..48a8a69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,6 +87,7 @@ config X86
 	select HAVE_USER_RETURN_NOTIFIER
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select HAVE_ARCH_JUMP_LABEL
+	select STOP_MACHINE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select SPARSE_IRQ
 	select GENERIC_FIND_FIRST_BIT
fixes the regression.
For the record, this kernel build doesn't use modules, which seems relevant
in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 9:03 i915.ko WC writes are slow after ea8596bb2d8d379 Chris Wilson
@ 2014-10-08 10:10 ` Chuck Ebbert
2014-10-08 19:49 ` Chris Wilson
2015-11-18 14:48 ` Chris Wilson
2014-10-08 17:47 ` Chuck Ebbert
1 sibling, 2 replies; 16+ messages in thread
From: Chuck Ebbert @ 2014-10-08 10:10 UTC (permalink / raw)
To: Chris Wilson
Cc: linux-kernel, Masami Hiramatsu, Jiri Kosina, H. Peter Anvin,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter
On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> performance of GTT write-combining access. I found subsequent runs were
> about 10-40x slower than the first. For example,
>
> igt/gem_gtt_speed:
>
> Time to read 16k through a GTT map: 325.285µs
> Time to write 16k through a GTT map: 4.729µs
> Time to clear 16k through a GTT map: 4.584µs
> Time to clear 16k through a cached GTT map: 1.342µs
>
> on the second run became:
>
> Time to read 16k through a GTT map: 332.148µs
> Time to write 16k through a GTT map: 209.411µs
> Time to clear 16k through a GTT map: 56.460µs
> Time to clear 16k through a cached GTT map: 50.897µs
>
> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.
>
> A bisection pointed to
>
> commit ea8596bb2d8d37957f3e92db9511c50801689180
> Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Date: Thu Jul 18 20:47:53 2013 +0900
>
> kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
>
> of which the active ingredient was just
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..f4001e0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
>
>  config HAVE_TEXT_POKE_SMP
>  	bool
> -	select STOP_MACHINE if SMP
>
>  config X86_DEV_DMA_OPS
>  	bool
>
> and adding that back into the current build, e.g.

Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
sync and your results depend on which CPU the test runs on?

>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
>  	select HAVE_USER_RETURN_NOTIFIER
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select HAVE_ARCH_JUMP_LABEL
> +	select STOP_MACHINE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>  	select SPARSE_IRQ
>  	select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>
> For the record, this kernel build doesn't use modules, which seems relevant
> in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
> in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 10:10 ` Chuck Ebbert
@ 2014-10-08 19:49 ` Chris Wilson
2014-10-08 21:36 ` H. Peter Anvin
2015-11-18 14:48 ` Chris Wilson
1 sibling, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-08 19:49 UTC (permalink / raw)
To: Chuck Ebbert
Cc: linux-kernel, Masami Hiramatsu, Jiri Kosina, H. Peter Anvin,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner
On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> >
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> >
> > igt/gem_gtt_speed:
> >
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map: 4.729µs
> > Time to clear 16k through a GTT map: 4.584µs
> > Time to clear 16k through a cached GTT map: 1.342µs
> >
> > on the second run became:
> >
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map: 209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map: 50.897µs
> >
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> >
> > A bisection pointed to
> >
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > Date: Thu Jul 18 20:47:53 2013 +0900
> >
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> >
> > of which the active ingredient was just
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >
> >  config HAVE_TEXT_POKE_SMP
> >  	bool
> > -	select STOP_MACHINE if SMP
> >
> >  config X86_DEV_DMA_OPS
> >  	bool
> >
> > and adding that back into the current build, e.g.
>
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

Indeed, this appears to be the explanation. (And here I thought PAT
superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
GTT quite a while ago.)

Replacing the stop_machine there with on_each_cpu does the trick:

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index f961de9..c0e37d5 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -151,7 +151,7 @@ struct set_mtrr_data {
  *
  * Returns nothing.
  */
-static int mtrr_rendezvous_handler(void *info)
+static void mtrr_rendezvous_handler(void *info)
 {
 	struct set_mtrr_data *data = info;
 
@@ -174,7 +174,6 @@ static int mtrr_rendezvous_handler(void *info)
 	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
-	return 0;
 }
 
 static inline int types_compatible(mtrr_type type1, mtrr_type type2)
@@ -228,7 +227,7 @@ set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type typ
 		.smp_type = type
 	};
 
-	stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
+	on_each_cpu_mask(cpu_online_mask, mtrr_rendezvous_handler, &data, true);
 }
 
 static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
@@ -240,8 +239,7 @@ static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
 		.smp_type = type
 	};
 
-	stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
-				       cpu_callout_mask);
+	on_each_cpu_mask(cpu_callout_mask, mtrr_rendezvous_handler, &data, true);
 }
 
 /**
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 19:49 ` Chris Wilson
@ 2014-10-08 21:36 ` H. Peter Anvin
2014-10-09 6:53 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2014-10-08 21:36 UTC (permalink / raw)
To: Chris Wilson, Chuck Ebbert
Cc: linux-kernel, Masami Hiramatsu, Jiri Kosina, Steven Rostedt,
Jason Baron, yrl.pp-manager.tt, Borislav Petkov, Ingo Molnar,
Daniel Vetter, x86, Thomas Gleixner
On 10/08/2014 12:49 PM, Chris Wilson wrote:
>
> Indeed, this appears to be the explanation. (And here I thought PAT
> superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> GTT quite a while ago.)
>
> Replacing the stop_machine there with on_each_cpu does the trick:
>

It should, but there seem to be quite a few drivers which still muck
with MTRRs. However, i915 is not one of them, it calls
io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
what the heck is going on here.

> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.

Could you tell me what the above looks like?

-hpa
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 21:36 ` H. Peter Anvin
@ 2014-10-09 6:53 ` Chris Wilson
2014-10-09 12:44 ` Chuck Ebbert
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-09 6:53 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Chuck Ebbert, linux-kernel, Masami Hiramatsu, Jiri Kosina,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner
On Wed, Oct 08, 2014 at 02:36:49PM -0700, H. Peter Anvin wrote:
> On 10/08/2014 12:49 PM, Chris Wilson wrote:
> >
> > Indeed, this appears to be the explanation. (And here I thought PAT
> > superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> > GTT quite a while ago.)
> >
> > Replacing the stop_machine there with on_each_cpu does the trick:
> >
>
> It should, but there seem to be quite a few drivers which still muck
> with MTRRs. However, i915 is not one of them, it calls
> io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
> what the heck is going on here.

This system also has a radeon GPU. Disabling it (not building in the
module) makes no difference to the wc speed.

> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
>
> Could you tell me what the above looks like?

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
write-back @ 0x8cf34000-0x8cf43000
write-back @ 0x8cf4d000-0x8cf4e000
write-back @ 0x8cf4d000-0x8cf50000
write-back @ 0x8cf50000-0x8cf51000
write-back @ 0x8cf51000-0x8cf52000
write-back @ 0x8cf52000-0x8cf53000
write-back @ 0x8cf53000-0x8cf55000
write-back @ 0x8cf55000-0x8cf56000
write-back @ 0x8cf9d000-0x8cf9e000
write-back @ 0x8cf9f000-0x8cfa0000
write-back @ 0x8cffc000-0x8cffd000
uncached-minus @ 0x8fc00000-0x8fe00000
write-combining @ 0x8fe00000-0x90000000
uncached-minus @ 0x90220000-0x90240000
uncached-minus @ 0x90300000-0x90320000
uncached-minus @ 0x90340000-0x90341000
uncached-minus @ 0x90380000-0x90381000
write-combining @ 0xa0000000-0xc0000000
write-combining @ 0xa0139000-0xa0159000
write-combining @ 0xa0159000-0xa0179000
write-combining @ 0xa0179000-0xa0199000
write-combining @ 0xc0040000-0xc025e000
write-combining @ 0xc025e000-0xc045e000
write-combining @ 0xc045e000-0xc045f000
write-combining @ 0xc045f000-0xc075f000
uncached-minus @ 0xf8000000-0xfc000000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed1f000-0xfed20000

(identical for good/bad runs)

# cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size=  256MB, count=1: write-back
reg02: base=0x08e000000 ( 2272MB), size=   32MB, count=1: uncachable
reg03: base=0x08d000000 ( 2256MB), size=   16MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size= 2048MB, count=1: write-back
reg05: base=0x170000000 ( 5888MB), size=  256MB, count=1: uncachable
reg06: base=0x16f000000 ( 5872MB), size=   16MB, count=1: uncachable
reg07: base=0x16e800000 ( 5864MB), size=    8MB, count=1: uncachable
reg08: base=0x16e600000 ( 5862MB), size=    2MB, count=1: uncachable

# cat /proc/iomem:
00000000-00000fff : reserved
00001000-0009bbff : System RAM
0009bc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cdfff : Video ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000dffff : PCI Bus 0000:00
000e0000-000fffff : reserved
  000e0000-000e3fff : PCI Bus 0000:00
  000e4000-000e7fff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
  01000000-0161981b : Kernel code
  0161981c-01ca20ff : Kernel data
  01dac000-01e2dfff : Kernel bss
20000000-201fffff : reserved
  20000000-201fffff : pnp 00:05
20200000-3fffffff : System RAM
40000000-401fffff : reserved
  40000000-401fffff : pnp 00:05
40200000-8ccd2fff : System RAM
8ccd3000-8cd66fff : reserved
8cd67000-8cfe6fff : ACPI Non-volatile Storage
8cfe7000-8cffefff : ACPI Tables
8cfff000-8cffffff : System RAM
8d000000-8f9fffff : reserved
  8da00000-8f9fffff : Graphics Stolen Memory
8fa00000-feafffff : PCI Bus 0000:00
  8fa00000-8fa00fff : pnp 00:03
  8fc00000-8fffffff : 0000:00:02.0
  90000000-900fffff : PCI Bus 0000:04
    90000000-900fffff : PCI Bus 0000:05
      90000000-90003fff : 0000:05:00.0
      90010000-900107ff : 0000:05:00.0
  90100000-901fffff : PCI Bus 0000:03
    90100000-90101fff : 0000:03:00.0
  90200000-902fffff : PCI Bus 0000:01
    90200000-9021ffff : 0000:01:00.0
    90220000-9023ffff : 0000:01:00.0
    90240000-90243fff : 0000:01:00.1
  90300000-9031ffff : 0000:00:19.0
    90300000-9031ffff : e1000e
  90330000-903300ff : 0000:00:1f.3
  90340000-903407ff : 0000:00:1f.2
    90340000-903407ff : ahci
  90350000-903503ff : 0000:00:1d.0
  90360000-90363fff : 0000:00:1b.0
  90370000-903703ff : 0000:00:1a.0
  90380000-90380fff : 0000:00:19.0
    90380000-90380fff : e1000e
  90390000-90390fff : 0000:00:16.3
  903a0000-903a000f : 0000:00:16.0
  a0000000-bfffffff : 0000:00:02.0
  c0000000-cfffffff : PCI Bus 0000:01
    c0000000-cfffffff : 0000:01:00.0
  f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
    f8000000-fbffffff : reserved
      f8000000-fbffffff : pnp 00:03
fec00000-fec00fff : reserved
  fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : PNP0103:00
fed10000-fed13fff : reserved
fed18000-fed19fff : reserved
  fed18000-fed18fff : pnp 00:03
  fed19000-fed19fff : pnp 00:03
fed1c000-fed1ffff : reserved
  fed1c000-fed1ffff : pnp 00:03
fed20000-fed3ffff : pnp 00:03
fed40000-fed44fff : PCI Bus 0000:00
fed45000-fed8ffff : pnp 00:03
fed90000-fed93fff : pnp 00:03
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
ff000000-ffffffff : INT0800:00
  ff980000-ffbfffff : reserved
  ffd80000-ffffffff : reserved
100000000-16e5fffff : System RAM
16e600000-16fffffff : RAM buffer

# lspci -vv -s 0:0:2
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device 2210
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 26
	Region 0: Memory at 8fc00000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at a0000000 (64-bit, prefetchable) [size=512M]
	Region 4: I/O ports at 3000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0f00c Data: 41b1
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 16+ messages in thread
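One way to read the /proc/mtrr listing against a given physical address (for instance the prefetchable BAR at 0xa0000000 from the lspci output) is a small lookup over the variable ranges. The following is a minimal sketch under stated assumptions: the type and function names are invented for illustration, the table is copied by hand from the listing, fixed-range MTRRs are ignored, and the only overlap rule modelled is "an uncachable range wins". The default type is supplied by the caller rather than read from hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* MTRR memory type encodings (architectural values). */
enum mtrr_type { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };

struct mtrr_range {
	uint64_t base;
	uint64_t size;
	enum mtrr_type type;
};

#define MB (1024ULL * 1024ULL)

/* Variable ranges reg00..reg08 as listed in /proc/mtrr in this thread. */
const struct mtrr_range proc_mtrr[] = {
	{ 0x000000000ULL, 2048 * MB, MTRR_WB },
	{ 0x080000000ULL,  256 * MB, MTRR_WB },
	{ 0x08e000000ULL,   32 * MB, MTRR_UC },
	{ 0x08d000000ULL,   16 * MB, MTRR_UC },
	{ 0x100000000ULL, 2048 * MB, MTRR_WB },
	{ 0x170000000ULL,  256 * MB, MTRR_UC },
	{ 0x16f000000ULL,   16 * MB, MTRR_UC },
	{ 0x16e800000ULL,    8 * MB, MTRR_UC },
	{ 0x16e600000ULL,    2 * MB, MTRR_UC },
};

/*
 * Simplified lookup (hypothetical helper): return the type of the first
 * variable range covering addr, except that any covering uncachable range
 * takes precedence. If no range covers addr, return def_type.
 */
enum mtrr_type mtrr_type_for(uint64_t addr, enum mtrr_type def_type)
{
	enum mtrr_type found = def_type;
	int matched = 0;
	size_t i;

	for (i = 0; i < sizeof(proc_mtrr) / sizeof(proc_mtrr[0]); i++) {
		const struct mtrr_range *r = &proc_mtrr[i];

		if (addr < r->base || addr - r->base >= r->size)
			continue;
		if (r->type == MTRR_UC)
			return MTRR_UC;
		if (!matched) {
			found = r->type;
			matched = 1;
		}
	}
	return found;
}
```

With this table, 0xa0000000 is not covered by any variable range, so the lookup simply returns whatever default type is passed in; 0x8e000000 sits inside both reg01 (write-back) and reg02 (uncachable) and resolves to uncachable.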
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-09 6:53 ` Chris Wilson
@ 2014-10-09 12:44 ` Chuck Ebbert
2014-10-09 13:00 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2014-10-09 12:44 UTC (permalink / raw)
To: Chris Wilson
Cc: H. Peter Anvin, linux-kernel, Masami Hiramatsu, Jiri Kosina,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner
On Thu, 9 Oct 2014 07:53:31 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:
> # cat /proc/mtrr
> reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
> reg01: base=0x080000000 ( 2048MB), size=  256MB, count=1: write-back
> reg02: base=0x08e000000 ( 2272MB), size=   32MB, count=1: uncachable
> reg03: base=0x08d000000 ( 2256MB), size=   16MB, count=1: uncachable
> reg04: base=0x100000000 ( 4096MB), size= 2048MB, count=1: write-back
> reg05: base=0x170000000 ( 5888MB), size=  256MB, count=1: uncachable
> reg06: base=0x16f000000 ( 5872MB), size=   16MB, count=1: uncachable
> reg07: base=0x16e800000 ( 5864MB), size=    8MB, count=1: uncachable
> reg08: base=0x16e600000 ( 5862MB), size=    2MB, count=1: uncachable
>

Well that's what the kernel thinks is in every CPU.

Could you try installing x86info and running "x86info --mtrr
--all-cpus" while running the broken kernel?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-09 12:44 ` Chuck Ebbert
@ 2014-10-09 13:00 ` Chris Wilson
2014-10-09 14:46 ` Chuck Ebbert
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2014-10-09 13:00 UTC (permalink / raw)
To: Chuck Ebbert
Cc: H. Peter Anvin, linux-kernel, Masami Hiramatsu, Jiri Kosina,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner
On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> Could you try installing x86info and running "x86info --mtrr
> --all-cpus" while running the broken kernel?

# /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
Time to read 16k through a GTT map: 318.643µs
Time to write 16k through a GTT map: 203.103µs
Time to clear 16k through a GTT map: 53.098µs
Time to clear 16k through a cached GTT map: 49.925µs

(i.e. bad kernel)

# x86info --mtrr --all-cpus
x86info v1.30. Dave Jones 2001-2011
Feedback to <davej@redhat.com>.

Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
--------------------------------------------------------------------------
CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
--------------------------------------------------------------------------
CPU #3:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
--------------------------------------------------------------------------
CPU #4:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
--------------------------------------------------------------------------
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.30GHz
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 16+ messages in thread
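To cross-check the raw MTRRphysBase/MTRRphysMask pairs in the x86info dump against the sizes the kernel reports in /proc/mtrr, the fields can be decoded by hand. The sketch below is illustrative only: the function names are made up, it assumes 36 physical address bits (which is what the 0xf...... masks in the dump suggest), and it assumes each mask is contiguous.

```c
#include <stdint.h>

/* Assumption: 36 physical address bits, matching the masks in the dump. */
#define PHYS_ADDR_BITS 36
#define PHYS_ADDR_MASK ((1ULL << PHYS_ADDR_BITS) - 1)

/* Bit 11 of MTRRphysMaskN marks the base/mask pair as valid. */
int mtrr_pair_valid(uint64_t physmask)
{
	return (int)((physmask >> 11) & 1);
}

/* Bits 7:0 of MTRRphysBaseN hold the memory type (0 = UC, 6 = WB). */
unsigned int mtrr_pair_type(uint64_t physbase)
{
	return (unsigned int)(physbase & 0xff);
}

/* Base address: MTRRphysBaseN with the low 12 bits cleared. */
uint64_t mtrr_pair_base(uint64_t physbase)
{
	return physbase & PHYS_ADDR_MASK & ~0xfffULL;
}

/* Range size in bytes, derived from the address bits the mask leaves free. */
uint64_t mtrr_pair_size(uint64_t physmask)
{
	uint64_t mask = physmask & PHYS_ADDR_MASK & ~0xfffULL;

	return (~mask & PHYS_ADDR_MASK) + 1;
}
```

For example, MTRRphysBase2/MTRRphysMask2 (0x8e000000 / 0xffe000800) decode to an uncachable 32MB range at 0x8e000000, which agrees with reg02 in /proc/mtrr.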
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-09 13:00 ` Chris Wilson
@ 2014-10-09 14:46 ` Chuck Ebbert
2014-10-09 15:14 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2014-10-09 14:46 UTC (permalink / raw)
To: Chris Wilson
Cc: H. Peter Anvin, linux-kernel, Masami Hiramatsu, Jiri Kosina,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner, Dave Jones

On Thu, 9 Oct 2014 14:00:47 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> > Could you try installing x86info and running "x86info --mtrr
> > --all-cpus" while running the broken kernel?
>
> # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed
> IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
> Time to read 16k through a GTT map: 318.643µs
> Time to write 16k through a GTT map: 203.103µs
> Time to clear 16k through a GTT map: 53.098µs
> Time to clear 16k through a cached GTT map: 49.925µs
>
> (i.e. bad kernel)
>
> # x86info --mtrr --all-cpus
> x86info v1.30. Dave Jones 2001-2011
> Feedback to <davej@redhat.com>.
>
> Found 4 CPUs.
> CPU #1:
> Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model.
> Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
>
> MTRR registers:
> MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
> MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
> MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
> MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
> MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
> MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
> MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
> MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
> MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
> MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
> MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
> MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
> MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
> MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
> MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
> MTRRfix64K_00000 (0x250): 0x0606060606060606
> MTRRfix16K_80000 (0x258): 0x0606060606060606
> MTRRfix16K_A0000 (0x259): 0x0000000000000000
> MTRRfix4K_C8000 (0x269): 0x0505050505050505
> MTRRfix4K_D0000 0x26a: 0x0000000000000000
> MTRRfix4K_D8000 0x26b: 0x0000000000000000
> MTRRfix4K_E0000 0x26c: 0x0000000000000000
> MTRRfix4K_E8000 0x26d: 0x0505050505050505
> MTRRfix4K_F0000 0x26e: 0x0505050505050505
> MTRRfix4K_F8000 0x26f: 0x0505050505050505
> MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
>
<snip>

Well they're all the same.

Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
it only prints the first 8. I don't know if it will show anything
different, but can you try fixing it with this patch?

--- a/mtrr.c
+++ b/mtrr.c
@@ -75,19 +75,23 @@
 	printf("0x%016llx\n", val);
 }
 
-static void decode_mtrrcap(int cpu, int msr)
+unsigned int decode_mtrrcap(int cpu, int msr)
 {
 	unsigned long long val;
+	unsigned int vcnt = 0;
 	int ret;
 
 	ret = mtrr_value(cpu,msr,&val);
 	if (ret) {
+		vcnt = (unsigned int)(val & IA32_MTRRCAP_VCNT);
 		printf("0x%016llx ", val);
 		printf("(smrr flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_SMRR) >> 11 );
 		printf("wc flag: 0x%01x, ",(unsigned int) (val&IA32_MTRRCAP_WC) >> 10);
 		printf("fix flag: 0x%01x, ",(unsigned int) (val&IA32_MTRRCAP_FIX) >> 8);
-		printf("vcnt field: 0x%02x (%d))\n",(unsigned int) (val&IA32_MTRRCAP_VCNT) , (int) (val&IA32_MTRRCAP_VCNT));
+		printf("vcnt field: 0x%02x (%u))\n", vcnt, vcnt);
 	}
+
+	return vcnt;
 }
 
 static void decode_mtrr_deftype(int cpu, int msr)
@@ -142,7 +146,7 @@
 void dump_mtrrs(struct cpudata *cpu)
 {
 	unsigned long long val = 0;
-	unsigned int i;
+	unsigned int i, vcnt;
 
 	if (!(cpu->flags_edx & (X86_FEATURE_MTRR)))
 		return;
@@ -157,11 +161,11 @@
 	printf("MTRR registers:\n");
 
 	printf("MTRRcap (0xfe): ");
-	decode_mtrrcap(cpu->number, 0xfe);
+	vcnt = decode_mtrrcap(cpu->number, 0xfe);
 
 	set_max_phy_addr(cpu);
 
-	for (i = 0; i < 16; i+=2) {
+	for (i = 0; i < 2 * vcnt; i += 2) {
 		printf("MTRRphysBase%u (0x%x): ", i/2, (unsigned int) 0x200+i);
 		decode_mtrr_physbase(cpu->number, 0x200+i);
 		printf("MTRRphysMask%u (0x%x): ", i/2, (unsigned int) 0x201+i);
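
Note on the MTRRcap value quoted above: the vcnt field that the patch starts returning can be read straight off the raw MSR value. The following is a minimal sketch, assuming the IA32_MTRRCAP layout from the Intel SDM (vcnt in bits 7:0, fix in bit 8, wc in bit 10, smrr in bit 11). The names decode_mtrrcap_value(), mtrr_physbase_msr() and struct mtrrcap_fields are hypothetical and are not part of x86info.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of IA32_MTRRCAP (MSR 0xfe), as described in the Intel SDM. */
#define MTRRCAP_VCNT_MASK  0xffULL	/* bits 7:0: number of variable ranges */
#define MTRRCAP_FIX_SHIFT  8		/* fixed-range MTRRs supported */
#define MTRRCAP_WC_SHIFT   10		/* write-combining type supported */
#define MTRRCAP_SMRR_SHIFT 11		/* SMRR interface supported */

/* Hypothetical container for the decoded fields (not an x86info type). */
struct mtrrcap_fields {
	unsigned int vcnt;
	unsigned int fix;
	unsigned int wc;
	unsigned int smrr;
};

/* Split a raw IA32_MTRRCAP value into its fields. */
struct mtrrcap_fields decode_mtrrcap_value(uint64_t val)
{
	struct mtrrcap_fields f;

	f.vcnt = (unsigned int)(val & MTRRCAP_VCNT_MASK);
	f.fix  = (unsigned int)((val >> MTRRCAP_FIX_SHIFT) & 1);
	f.wc   = (unsigned int)((val >> MTRRCAP_WC_SHIFT) & 1);
	f.smrr = (unsigned int)((val >> MTRRCAP_SMRR_SHIFT) & 1);
	return f;
}

/* MSR index of MTRRphysBase<n>; MTRRphysMask<n> is the following index. */
unsigned int mtrr_physbase_msr(unsigned int n, unsigned int vcnt)
{
	assert(n < vcnt);	/* only vcnt base/mask pairs exist */
	return 0x200 + 2 * n;
}
```

With the value from the dump, decode_mtrrcap_value(0x0000000000000d0a) gives vcnt 10, so a loop bounded by 2 * vcnt walks MSRs 0x200 through 0x213, whereas the fixed bound of 16 stops after 0x20f.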
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-09 14:46 ` Chuck Ebbert
@ 2014-10-09 15:14 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2014-10-09 15:14 UTC (permalink / raw)
To: Chuck Ebbert
Cc: H. Peter Anvin, linux-kernel, Masami Hiramatsu, Jiri Kosina,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter, x86, Thomas Gleixner, Dave Jones

On Thu, Oct 09, 2014 at 09:46:37AM -0500, Chuck Ebbert wrote:
> Well they're all the same.
>
> Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
> it only prints the first 8. I don't know if it will show anything
> different, but can you try fixing it with this patch?

Source (https://github.com/dankamongmen/x86info) was slightly different,
but I followed the drift.

tldr: 8,9 appear to be identical on all cpus as well.

$ sudo ./x86info --mtrr --all-cpus
x86info v1.31pre

Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #3:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #4:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.30GHz

-- 
Chris Wilson, Intel Open Source Technology Centre
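
Note on reading the dump above: each MTRRphysBase/MTRRphysMask pair describes one physical address range. The sketch below shows one way to turn a pair into a base, a memory type and a size, assuming the layout from the Intel SDM (type in bits 7:0 of the base register, valid flag in bit 11 of the mask register, address bits from bit 12 upward) and a contiguous mask. The 36-bit physical address width is an assumption inferred from the mask values in the dump, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_PHYSMASK_VALID (1ULL << 11)	/* valid flag in MTRRphysMask<n> */
#define MTRR_TYPE_MASK      0xffULL		/* memory type in MTRRphysBase<n> */
#define MTRR_PAGE_MASK      0xfffULL		/* low 12 bits carry no address */

/* Base address of the range described by a raw MTRRphysBase<n> value. */
uint64_t mtrr_range_base(uint64_t physbase)
{
	return physbase & ~MTRR_PAGE_MASK;
}

/* Memory type: 0x00 uncacheable, 0x01 write-combining, 0x06 write-back. */
unsigned int mtrr_range_type(uint64_t physbase)
{
	return (unsigned int)(physbase & MTRR_TYPE_MASK);
}

/*
 * Size in bytes of the range described by a raw MTRRphysMask<n> value,
 * or 0 if the valid flag is clear. phys_bits is the CPU's physical
 * address width (assumed to be 36 for the dump in this thread).
 */
uint64_t mtrr_range_size(uint64_t physmask, unsigned int phys_bits)
{
	uint64_t addr_bits;

	assert(phys_bits > 12 && phys_bits < 64);
	if (!(physmask & MTRR_PHYSMASK_VALID))
		return 0;

	addr_bits = (1ULL << phys_bits) - 1;
	/* Clear the valid flag and reserved low bits, then invert. */
	return (~(physmask & ~MTRR_PAGE_MASK) & addr_bits) + 1;
}
```

Under those assumptions, pair 0 (0x0000000000000006, 0x0000000f80000800) is 2 GiB of write-back memory starting at address 0, and pair 8 (0x000000016e600000, 0x0000000fffe00800) is a 2 MiB uncacheable range at 0x16e600000. None of the ten variable ranges has type 0x01, so on this machine write-combining for the GTT would have to come from PAT rather than from the MTRRs.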
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 10:10 ` Chuck Ebbert
2014-10-08 19:49 ` Chris Wilson
@ 2015-11-18 14:48 ` Chris Wilson
2015-11-18 15:57 ` Andy Lutomirski
2015-11-19 8:16 ` i915.ko WC writes are slow after ea8596bb2d8d379 Ingo Molnar
1 sibling, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2015-11-18 14:48 UTC (permalink / raw)
To: Chuck Ebbert
Cc: linux-kernel, Andrew Morton, Tejun Heo, Andy Lutomirski,
Rusty Russell, Peter Zijlstra, Masami Hiramatsu, Jiri Kosina,
H. Peter Anvin, Steven Rostedt, Jason Baron, yrl.pp-manager.tt,
Borislav Petkov, Ingo Molnar, Daniel Vetter

On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> >
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> >
> > igt/gem_gtt_speed:
> >
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map: 4.729µs
> > Time to clear 16k through a GTT map: 4.584µs
> > Time to clear 16k through a cached GTT map: 1.342µs
> >
> > on the second run became:
> >
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map: 209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map: 50.897µs
> >
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> >
> > A bisection pointed to
> >
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > Date: Thu Jul 18 20:47:53 2013 +0900
> >
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> >
> > of which the active ingredient was just
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >
> >  config HAVE_TEXT_POKE_SMP
> >  	bool
> > -	select STOP_MACHINE if SMP
> >
> >  config X86_DEV_DMA_OPS
> >  	bool
> >
> > and adding that back into the current build, e.g.
>
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

(From the other reply, it did and is still required).

I have run into other issues where stop_machine() tries to only do a
irq-disabled callback on the local CPU as opposed to halting all CPUs
and running the callback universally.

My understanding is that the root cause of the issue is:

diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..8235e0b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1993,8 +1993,7 @@ config INIT_ALL_POSSIBLE
 
 config STOP_MACHINE
 	bool
-	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
+	default y if SMP || HOTPLUG_CPU
 	help
 	  Need stop_machine() primitive.
 

Although

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index d2abbdb..ff4f029 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
  * grabbing every spinlock (and more). So the "read" side to such a
  * lock is anything which disables preemption.
  */
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 /**
  * stop_machine: freeze the machine on all CPUs and run this function
@@ -128,7 +128,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 
 int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
 				   const struct cpumask *cpus);
-#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#else /* CONFIG_SMP */
 
 static inline int __stop_machine(int (*fn)(void *), void *data,
 				 const struct cpumask *cpus)
@@ -153,5 +153,5 @@ static inline int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
 	return __stop_machine(fn, data, cpus);
 }
 
-#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
 #endif /* _LINUX_STOP_MACHINE */
diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..44600a8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1991,13 +1991,6 @@ config INIT_ALL_POSSIBLE
 	  it was better to provide this option than to break all the archs
 	  and have several arch maintainers pursuing me down dark alleys.
 
-config STOP_MACHINE
-	bool
-	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
-	help
-	  Need stop_machine() primitive.
-
 source "block/Kconfig"
 
 config PREEMPT_NOTIFIERS
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index fd643d8..2dd1f306 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,7 +513,7 @@ static int __init cpu_stop_init(void)
 }
 early_initcall(cpu_stop_init);
 
-#ifdef CONFIG_STOP_MACHINE
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 {
@@ -613,4 +613,4 @@ int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
 	return ret ?: done.ret;
 }
 
-#endif /* CONFIG_STOP_MACHINE */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */

may be more apt.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
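
Note on the failure mode described above: the difference between the full stop_machine() and the fallback can be pictured with a small user-space toy model. This is only a sketch under stated assumptions and not kernel code: the per-CPU array stands in for per-CPU MTRR/PAT state, the "full" variant models the case where the callback is requested on every online CPU, and all names (toy_stop_machine_full, toy_stop_machine_fallback, set_wc and so on) are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 4	/* the machine in this thread reports 4 CPUs */

typedef int (*cpu_stop_fn_t)(void *arg);

static int current_cpu;			/* toy: CPU the callback runs on */
static int wc_enabled[NR_CPUS];		/* toy stand-in for per-CPU MTRR/PAT state */

/* Callback that programs the per-CPU state on whichever CPU runs it. */
int set_wc(void *arg)
{
	(void)arg;
	wc_enabled[current_cpu] = 1;
	return 0;
}

/* Full variant: the callback is run on every CPU. */
int toy_stop_machine_full(cpu_stop_fn_t fn, void *data)
{
	int ret = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		current_cpu = cpu;
		ret |= fn(data);
	}
	current_cpu = 0;
	return ret;
}

/* Fallback variant: the callback is run once, on the calling CPU only. */
int toy_stop_machine_fallback(cpu_stop_fn_t fn, void *data, int calling_cpu)
{
	assert(calling_cpu >= 0 && calling_cpu < NR_CPUS);
	current_cpu = calling_cpu;
	return fn(data);
}

/* Forget all per-CPU state. */
void toy_reset(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		wc_enabled[cpu] = 0;
}

/* Number of CPUs that have had the state programmed. */
int count_wc_cpus(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		n += wc_enabled[cpu];
	return n;
}
```

In this model, calling toy_stop_machine_fallback(set_wc, 0, 0) leaves three of the four CPUs unprogrammed, which would fit a benchmark whose result depends on the CPU it happens to be scheduled on.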
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2015-11-18 14:48 ` Chris Wilson
@ 2015-11-18 15:57 ` Andy Lutomirski
2015-11-19 8:14 ` Ingo Molnar
2015-11-19 8:16 ` i915.ko WC writes are slow after ea8596bb2d8d379 Ingo Molnar
1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-11-18 15:57 UTC (permalink / raw)
To: Chris Wilson
Cc: Chuck Ebbert, linux-kernel@vger.kernel.org, Andrew Morton,
Tejun Heo, Rusty Russell, Peter Zijlstra, Masami Hiramatsu,
Jiri Kosina, H. Peter Anvin, Steven Rostedt, Jason Baron,
yrl.pp-manager.tt, Borislav Petkov, Ingo Molnar, Daniel Vetter

On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Although
>
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index d2abbdb..ff4f029 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
>   * grabbing every spinlock (and more). So the "read" side to such a
>   * lock is anything which disables preemption.
>   */
> -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

[...]

This seems much better. Having a set of stop_machine functions around
that don't work depending on config seems dangerous.

--Andy
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2015-11-18 15:57 ` Andy Lutomirski
@ 2015-11-19 8:14 ` Ingo Molnar
2015-11-19 10:03 ` [PATCH] kernel: Remove stop_machine() Kconfig dependency Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2015-11-19 8:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Chris Wilson, Chuck Ebbert, linux-kernel@vger.kernel.org,
Andrew Morton, Tejun Heo, Rusty Russell, Peter Zijlstra,
Masami Hiramatsu, Jiri Kosina, H. Peter Anvin, Steven Rostedt,
Jason Baron, yrl.pp-manager.tt, Borislav Petkov, Daniel Vetter

* Andy Lutomirski <luto@amacapital.net> wrote:

> On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Although
> >
> > diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> > index d2abbdb..ff4f029 100644
> > --- a/include/linux/stop_machine.h
> > +++ b/include/linux/stop_machine.h
> > @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
> >   * grabbing every spinlock (and more). So the "read" side to such a
> >   * lock is anything which disables preemption.
> >   */
> > -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> > +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
>
> [...]
>
> This seems much better. Having a set of stop_machine functions around
> that don't work depending on config seems dangerous.

Agreed.

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo
* [PATCH] kernel: Remove stop_machine() Kconfig dependency
2015-11-19 8:14 ` Ingo Molnar
@ 2015-11-19 10:03 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2015-11-19 10:03 UTC (permalink / raw)
To: linux-kernel, Linus Torvalds
Cc: Chris Wilson, Pranith Kumar, Andrew Morton, Michal Hocko,
Vladimir Davydov, Johannes Weiner, Ingo Molnar, H . Peter Anvin,
Tejun Heo, Iulia Manda, Andy Lutomirski, Rusty Russell,
Peter Zijlstra, Chuck Ebbert

Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable. This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.

For example, this breaks MTRR setup on x86 in certain configs since

commit ea8596bb2d8d37957f3e92db9511c50801689180
Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Date: Thu Jul 18 20:47:53 2013 +0900

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions

as the MTRR is only established on the boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine()
for the architecture, removing the false dependency on MODULE_UNLOAD in
the process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Preemptively-Acked-by: Ingo Molnar <mingo@kernel.org>
Cc:"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
---
 include/linux/stop_machine.h | 6 +++---
 init/Kconfig                 | 7 -------
 kernel/stop_machine.c        | 4 ++--
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 0adedca24c5b..0e1b1540597a 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -99,7 +99,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
  * grabbing every spinlock (and more). So the "read" side to such a
  * lock is anything which disables preemption.
  */
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 /**
  * stop_machine: freeze the machine on all CPUs and run this function
@@ -118,7 +118,7 @@ int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
 
 int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 				   const struct cpumask *cpus);
-#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#else /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
 
 static inline int stop_machine(cpu_stop_fn_t fn, void *data,
 			       const struct cpumask *cpus)
@@ -137,5 +137,5 @@ static inline int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 	return stop_machine(fn, data, cpus);
 }
 
-#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
 #endif /* _LINUX_STOP_MACHINE */
diff --git a/init/Kconfig b/init/Kconfig
index c24b6f767bf0..235c7a2c0d20 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2030,13 +2030,6 @@ config INIT_ALL_POSSIBLE
 	  it was better to provide this option than to break all the archs
 	  and have several arch maintainers pursuing me down dark alleys.
 
-config STOP_MACHINE
-	bool
-	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
-	help
-	  Need stop_machine() primitive.
-
 source "block/Kconfig"
 
 config PREEMPT_NOTIFIERS
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 867bc20e1ef1..a3bbaee77c58 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -531,7 +531,7 @@ static int __init cpu_stop_init(void)
 }
 early_initcall(cpu_stop_init);
 
-#ifdef CONFIG_STOP_MACHINE
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
 
 static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
 {
@@ -631,4 +631,4 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 	return ret ?: done.ret;
 }
 
-#endif /* CONFIG_STOP_MACHINE */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
-- 
2.6.2
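
Note on the configuration logic in the patch above: the effect of the changed preprocessor condition can be tabulated with a small user-space model. This is a sketch only. It assumes that nothing else selects STOP_MACHINE, so that the option takes its default from the removed Kconfig entry; struct kconfig and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three Kconfig symbols involved. */
struct kconfig {
	bool smp;
	bool module_unload;
	bool hotplug_cpu;
};

/*
 * Before the patch: STOP_MACHINE defaults to y only when
 * (SMP && MODULE_UNLOAD) || HOTPLUG_CPU holds, and the header builds the
 * full stop_machine() only if CONFIG_STOP_MACHINE && CONFIG_SMP.
 */
bool full_stop_machine_before(struct kconfig c)
{
	bool stop_machine = (c.smp && c.module_unload) || c.hotplug_cpu;

	return stop_machine && c.smp;
}

/* After the patch: CONFIG_SMP || CONFIG_HOTPLUG_CPU. */
bool full_stop_machine_after(struct kconfig c)
{
	return c.smp || c.hotplug_cpu;
}

/* Every config that had the full version before still has it afterwards. */
void check_patch_only_widens(void)
{
	for (int bits = 0; bits < 8; bits++) {
		struct kconfig c = { bits & 1, (bits >> 1) & 1, (bits >> 2) & 1 };

		assert(!full_stop_machine_before(c) || full_stop_machine_after(c));
	}
}
```

For an SMP kernel with both MODULE_UNLOAD and HOTPLUG_CPU disabled, full_stop_machine_before() is false while full_stop_machine_after() is true, which is the configuration this thread is about.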
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2015-11-18 14:48 ` Chris Wilson
2015-11-18 15:57 ` Andy Lutomirski
@ 2015-11-19 8:16 ` Ingo Molnar
1 sibling, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2015-11-19 8:16 UTC (permalink / raw)
To: Chris Wilson
Cc: Chuck Ebbert, linux-kernel, Andrew Morton, Tejun Heo,
Andy Lutomirski, Rusty Russell, Peter Zijlstra, Masami Hiramatsu,
Jiri Kosina, H. Peter Anvin, Steven Rostedt, Jason Baron,
yrl.pp-manager.tt, Borislav Petkov, Daniel Vetter

* Chris Wilson <chris@chris-wilson.co.uk> wrote:

> > > A bisection pointed to
> > >
> > > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > > Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > > Date: Thu Jul 18 20:47:53 2013 +0900
> > >
> > > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> > >
> > > of which the active ingredient was just
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index b32ebf9..f4001e0 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> > >
> > >  config HAVE_TEXT_POKE_SMP
> > >  	bool
> > > -	select STOP_MACHINE if SMP

Ouch...

This is certainly an educative example of how pure 'code removal' patches can have
unintended side effects.

Is there a full fix patch available, and is anyone pushing that to Linus?

Thanks,

	Ingo
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 9:03 i915.ko WC writes are slow after ea8596bb2d8d379 Chris Wilson
2014-10-08 10:10 ` Chuck Ebbert
@ 2014-10-08 17:47 ` Chuck Ebbert
2014-10-09 1:45 ` Masami Hiramatsu
1 sibling, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2014-10-08 17:47 UTC (permalink / raw)
To: Chris Wilson
Cc: linux-kernel, Masami Hiramatsu, Jiri Kosina, H. Peter Anvin,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter

On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> and adding that back into the current build, e.g.
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
>  	select HAVE_USER_RETURN_NOTIFIER
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select HAVE_ARCH_JUMP_LABEL
> +	select STOP_MACHINE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>  	select SPARSE_IRQ
>  	select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>

Looking closer at this, it seems most configs work by accident,
because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
you disabled both of those? stop_machine() is called from all kinds
of places and almost none of them make sure STOP_MACHINE is selected.

$ find -name Kconf\* | xargs grep STOP_MACHINE
./init/Kconfig:config STOP_MACHINE

All these places use stop_machine():

mm/page_alloc.c, line 3886
drivers/xen/manage.c, line 130
drivers/char/hw_random/intel-rng.c, line 373
arch/powerpc/mm/numa.c:
    line 1616
    line 1623
arch/powerpc/platforms/powernv/subcore.c, line 324
arch/arm/kernel/kprobes.c, line 165
arch/arm/kernel/patch.c:
    line 64
    line 71
arch/s390/kernel/jump_label.c, line 61
arch/s390/kernel/kprobes.c:
    line 311
    line 320
arch/s390/kernel/time.c:
    line 820
    line 1590
arch/x86/kernel/cpu/mtrr/main.c, line 231
arch/arm64/kernel/insn.c, line 181
kernel/time/timekeeping.c, line 892
kernel/trace/ftrace.c, line 2219
kernel/module.c:
    line 770
    line 1861
* Re: i915.ko WC writes are slow after ea8596bb2d8d379
2014-10-08 17:47 ` Chuck Ebbert
@ 2014-10-09 1:45 ` Masami Hiramatsu
0 siblings, 0 replies; 16+ messages in thread
From: Masami Hiramatsu @ 2014-10-09 1:45 UTC (permalink / raw)
To: Chuck Ebbert
Cc: Chris Wilson, linux-kernel, Jiri Kosina, H. Peter Anvin,
Steven Rostedt, Jason Baron, yrl.pp-manager.tt, Borislav Petkov,
Ingo Molnar, Daniel Vetter

(2014/10/09 2:47), Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
>> and adding that back into the current build, e.g.
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 3632743..48a8a69 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -87,6 +87,7 @@ config X86
>>  	select HAVE_USER_RETURN_NOTIFIER
>>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>>  	select HAVE_ARCH_JUMP_LABEL
>> +	select STOP_MACHINE
>>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>  	select SPARSE_IRQ
>>  	select GENERIC_FIND_FIRST_BIT
>>
>> fixes the regression.
>>
>
> Looking closer at this, it seems most configs work by accident,
> because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
> you disabled both of those? stop_machine() is called from all kinds
> of places and almost none of them make sure STOP_MACHINE is selected.

I guess most of them expects stop_machine() is not a configurable feature...
If some of them requires stop_machine(), it should enable it on its kconfig
entry (including ftrace, kprobes).

> $ find -name Kconf\* | xargs grep STOP_MACHINE
> ./init/Kconfig:config STOP_MACHINE
>
> All these places use stop_machine():
>
> mm/page_alloc.c, line 3886
> drivers/xen/manage.c, line 130
> drivers/char/hw_random/intel-rng.c, line 373
> arch/powerpc/mm/numa.c:
>     line 1616
>     line 1623
> arch/powerpc/platforms/powernv/subcore.c, line 324
> arch/arm/kernel/kprobes.c, line 165
> arch/arm/kernel/patch.c:
>     line 64
>     line 71
> arch/s390/kernel/jump_label.c, line 61
> arch/s390/kernel/kprobes.c:
>     line 311
>     line 320
> arch/s390/kernel/time.c:
>     line 820
>     line 1590
> arch/x86/kernel/cpu/mtrr/main.c, line 231
> arch/arm64/kernel/insn.c, line 181
> kernel/time/timekeeping.c, line 892
> kernel/trace/ftrace.c, line 2219
> kernel/module.c:
>     line 770
>     line 1861
>

BTW, as I sent a series of patches, the last two can be removed.

https://lkml.org/lkml/2014/8/25/142

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com