LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-28 13:30 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C28A2C1.50700@suse.de>

On 06/28/2010 04:25 PM, Alexander Graf wrote:
>>
>>>> Less and simpler code, better reporting through slabtop, less wastage
>>>> of partially allocated slab pages.
>>>>
>>>>          
>>> But it also means that one VM can spill the global slab cache and kill
>>> another VM's mm performance, no?
>>>
>>>        
>> What do you mean by spill?
>>      


Well?

>> btw, in the midst of the nit-picking frenzy I forgot to ask how the
>> individual hash chain lengths as well as the per-vm allocation were
>> limited.
>>
>> On x86 we have a per-vm limit and we allow the mm shrinker to reduce
>> shadow mmu data structures dynamically.
>>
>>      
> Very simple. I keep an int with the number of allocated entries around
> and if that hits a define'd threshold, I flush all shadow pages.
>    

A truly nefarious guest will make all ptes hash to the same chain, 
making some operations very long (O(n^2) in the x86 mmu, don't know 
about ppc) under a spinlock.  So we had to limit hash chains, not just 
the number of entries.

But your mmu is per-cpu, no?  In that case, no spinlock, and any damage 
the guest does is limited to itself.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Alexander Graf @ 2010-06-28 13:25 UTC
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2872F5.20501@redhat.com>

Avi Kivity wrote:
> On 06/28/2010 12:55 PM, Alexander Graf wrote:
>> Avi Kivity wrote:
>>   
>>> On 06/28/2010 12:27 PM, Alexander Graf wrote:
>>>     
>>>>> Am I looking at old code?
>>>>>          
>>>>
>>>> Apparently. Check book3s_mmu_*.c
>>>>        
>>> I don't have that pattern.
>>>      
>> It's in this patch.
>>    
>
> Yes.  Silly me.
>
>>> +static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache
>>> *pte)
>>> +{
>>> +    dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) ->  0x%llx\n",
>>> +            pte->pte.eaddr, pte->pte.vpage, pte->host_va);
>>> +
>>> +    /* Different for 32 and 64 bit */
>>> +    kvmppc_mmu_invalidate_pte(vcpu, pte);
>>> +
>>> +    if (pte->pte.may_write)
>>> +        kvm_release_pfn_dirty(pte->pfn);
>>> +    else
>>> +        kvm_release_pfn_clean(pte->pfn);
>>> +
>>> +    list_del(&pte->list_pte);
>>> +    list_del(&pte->list_vpte);
>>> +    list_del(&pte->list_vpte_long);
>>> +    list_del(&pte->list_all);
>>> +
>>> +    kmem_cache_free(vcpu->arch.hpte_cache, pte);
>>> +}
>>> +
>>>      
>
> (that's the old one with list_all - better check what's going on here)

Yeah, I just searched my inbox for the first patch. Obviously it was the
old version :(.

>
>
>>>>> (another difference is using struct hlist_head instead of list_head,
>>>>> which I recommend since it saves space)
>>>>>          
>>>> Hrm. I thought about this quite a bit before too, but that makes
>>>> invalidation more complicated, no? We always need to remember the
>>>> previous entry in a list.
>>>>        
>>> hlist_for_each_entry_safe() does that.
>>>      
>> Oh - very nice. So all I need to do is pass the previous list entry to
>> invalidate_pte too and I'm good. I guess I'll give it a shot.
>>    
>
> No, just the for_each cursor.
>
>>> Less and simpler code, better reporting through slabtop, less wastage
>>> of partially allocated slab pages.
>>>      
>> But it also means that one VM can spill the global slab cache and kill
>> another VM's mm performance, no?
>>    
>
> What do you mean by spill?
>
> btw, in the midst of the nit-picking frenzy I forgot to ask how the
> individual hash chain lengths as well as the per-vm allocation were
> limited.
>
> On x86 we have a per-vm limit and we allow the mm shrinker to reduce
> shadow mmu data structures dynamically.
>

Very simple. I keep an int with the number of allocated entries around
and if that hits a define'd threshold, I flush all shadow pages.


Alex


* Fw:Re:Re: PCIe bus seems work while 'dma' can't under linux
From: jxnuxdy @ 2010-06-28  9:45 UTC
  To: benh, linuxppc-dev, Michal Simek, devicetree-discuss,
	microblaze-uclinux, Stephen Rothwell

[-- Attachment #1: Type: text/plain, Size: 7311 bytes --]

Thanks Benjamin. Is anyone else familiar with the MPC8544E CPU? The PCIe regions for the CPU host bridge do not seem correct; what might the root cause be?

bash-2.04# cat /proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
    Class 0b20  Header Type 01: PCI device 1957:0032 (rev 17).
  Bus  1, device   0, function  0:
    Class 0580  Header Type 00: PCI device 11ab:db10 (rev 1).
      Prefetchable 64 bit memory at 0x80000000 [0x800fffff].
      Prefetchable 64 bit memory at 0x84000000 [0x87ffffff].
bash-2.04#

Regards,
Denny

-------- Forwarding messages --------
From: jxnuxdy <jxnuxdy@163.com>
Date: 2010-06-15 15:05:56
To:  "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
Cc:  linuxppc-dev@ozlabs.org,"Michal Simek" <monstr@monstr.eu>,devicetree-discuss@lists.ozlabs.org,microblaze-uclinux@itee.uq.edu.au,"Stephen Rothwell" <sfr@canb.auug.org.au>
Subject: Re:Re: PCIe bus seems work while 'dma' can't under linux
Thanks Benjamin. The regions are not displayed as we expect, which is why we suspect a configuration problem in the CPU host bridge. We have changed U-Boot and Linux a lot, but it seems to have no effect on the problem.

We use an MPC8544 CPU and connect two PCIe devices directly to the CPU's PCIe1 and PCIe2 controllers, without an external PCIe bridge, so we disabled the PCIe3 and PCI controllers in U-Boot.

For more settings, please take a look at the attached file log.txt.


Many thanks,
Denny
----------------------------------------------------------------------------------------------------------------------------
On 2010-06-11 15:21:24, "Benjamin Herrenschmidt" <benh@kernel.crashing.org> wrote:
>On Fri, 2010-06-11 at 09:30 +0800, jxnuxdy wrote:
>> Hi guys,
>> 
>> I encountered a PCIe problem under Linux. The two PCIe buses on my board seem to work,
>> at least in that I can access registers through them, but DMA over the PCIe bus does not
>> work. When I dumped the PCI devices, I was curious to find that no regions are displayed
>> for the PCIe controllers. Why? Is this related to my DMA issue?
>
>It would help if you told us a bit more what the HW is, what you are
>doing with it, etc...
>
>I.e. what is your host, what is your device, what platform, etc.
>
>DMA should work provided that your platform code sets it up properly.
>The DMA regions don't appear in /proc/pci or lspci.
>
>Cheers,
>Ben.
>
>> 
>> bash-2.04# cat /proc/pci
>> PCI devices found:
>>   Bus  0, device   0, function  0:
>>     Class 0b20  Header Type 01: PCI device 1957:0032 (rev 17).
>>   Bus  1, device   0, function  0:
>>     Class 0580  Header Type 00: PCI device 11ab:db90 (rev 1).
>>       Prefetchable 64 bit memory at 0x80000000 [0x800fffff].
>>       Prefetchable 64 bit memory at 0x84000000 [0x87ffffff].
>>   Bus  9, device   0, function  0:
>>     Class 0b20  Header Type 01: PCI device 1957:0032 (rev 17).
>>   Bus 10, device   0, function  0:
>>     Class 0580  Header Type 00: PCI device 11ab:db90 (rev 1).
>>       Prefetchable 64 bit memory at 0xa0000000 [0xa00fffff].
>>       Prefetchable 64 bit memory at 0xa4000000 [0xa7ffffff].
>> bash-2.04# lspci -vv
>> 00:00.0 Power PC: Unknown device 1957:0032 (rev 11)
>>         !!! Invalid class 0b20 for header type 01
>>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 0, cache line size 08
>>         Bus: primary=00, secondary=01, subordinate=06, sec-latency=0
>>         I/O behind bridge: 00000000-00000fff
>>         Memory behind bridge: 80000000-9fffffff
>>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>>         Capabilities: [44] Power Management version 2
>>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [4c] #10 [0041]
>> 
>> 01:00.0 Memory controller: Galileo Technology Ltd.: Unknown device db90 (rev 01)
>>         Subsystem: Galileo Technology Ltd.: Unknown device 11ab
>>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 0, cache line size 08
>>         Interrupt: pin A routed to IRQ 0
>>         Region 0: Memory at 80000000 (64-bit, prefetchable) [size=1M]
>>         Region 2: Memory at 84000000 (64-bit, prefetchable) [size=64M]
>>         Capabilities: [40] Power Management version 2
>>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
>>                 Address: 0000000000000000  Data: 0000
>>         Capabilities: [60] #10 [0011]
>> 
>> 09:00.0 Power PC: Unknown device 1957:0032 (rev 11)
>>         !!! Invalid class 0b20 for header type 01
>>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 0, cache line size 08
>>         Bus: primary=00, secondary=0a, subordinate=0f, sec-latency=0
>>         I/O behind bridge: 00000000-00000fff
>>         Memory behind bridge: a0000000-bfffffff
>>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>>         Capabilities: [44] Power Management version 2
>>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [4c] #10 [0041]
>> 
>> 0a:00.0 Memory controller: Galileo Technology Ltd.: Unknown device db90 (rev 01)
>>         Subsystem: Galileo Technology Ltd.: Unknown device 11ab
>>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 0, cache line size 08
>>         Interrupt: pin A routed to IRQ 0
>>         Region 0: Memory at a0000000 (64-bit, prefetchable) [size=1M]
>>         Region 2: Memory at a4000000 (64-bit, prefetchable) [size=64M]
>>         Capabilities: [40] Power Management version 2
>>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
>>                 Address: 0000000000000000  Data: 0000
>>         Capabilities: [60] #10 [0011]
>> 
>> bash-2.04# 
>> 
>> Thanks
>> Denny
>> 
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>

[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 63314 bytes --]


U-Boot 1.1.3 (Jun 14 2010 - 19:26:53)

CPU:   8544_E, Version: 1.1, (0x803c0111)
Core:  E500, Version: 2.2, (0x80210022)
Clock Configuration:
       CPU: 800 MHz, CCB: 400 MHz,
       DDR: 200 MHz, LBC:  50 MHz
L1:    D-cache 32 kB enabled
       I-cache 32 kB enabled
Board: C48
CPU Board Revision 0.0 (0x0000)
    PCI2: disabled
I2C:   ready
DRAM:  Initializing DDRSDRAM 
memsize = 200 
    DDR: 512 MB
POST RAM test disabled.
Now running in RAM - U-Boot at: 1fb7c000
trap_init : 0x0
system inventory subsystem initialized 
FLASH: 64 MB
L2 cache 256KB: enabled
PCI:
               Scanning PCI bus 00
        01  00  11ab  db90  0580  00
               Scanning PCI bus 02
        03  00  11ab  db90  0580  00
In:    serial
Out:   serial
Err:   serial
Net:   
ENET0: PHY is Marvell 88E1112 (1410c97)

set_bootstatus: BS_LOAD_OS, platform_idx = 11 

Hit ESC to stop autoboot:  0 
## Booting image at fb400000 ...
   Image Name:   Linux-2.6.14.2
   Image Type:   PowerPC Linux Multi-File Image (gzip compressed)
   Data Size:    2506379 Bytes =  2.4 MB
   Load Address: 00000000
   Entry Point:  00000000
   Contents:
   Image 0:  1429862 Bytes =  1.4 MB
   Image 1:  1076503 Bytes =  1 MB
   Uncompressing Multi-File Image ... ## Current stack ends at 0x1FB5ABD0 => set upper limit to 0x00800000
## initrd at 0xFB55D1B4 ... 0xFB663ECA (len=1076503=0x106D17)
   Loading Ramdisk to 1fa53000, end 1fb59d17 ... OK
 initrd_start = 1fa53000, initrd_end = 1fb59d17 
## Transferring control to Linux (at address 00000000) ...
Memory CAM mapping: CAM0=256Mb, CAM1=256Mb, CAM2=0Mb residual: 0Mb
tlbcam_index=2
Linux version 2.6.14.2 (dxiao@blc-10-6) (gcc version 3.4.6) #25 Fri Jun 11 17:59:39 PDT 2010
silkworm85xx_setup_arch
mpc85xx_setup: Doing Pcie bridge setup
Scanning PcieBus...
cpld_init: platform (101) not supported
Brocade Silkworm port (C) 2006 Brocade Communications Systems, Inc.
  DMA zone: 131072 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:1
  HighMem zone: 0 pages, LIFO batch:1
Built 1 zonelists
Kernel command line: ip=off console=ttyS1,9600 noinitrd rootfstype=jffs2 root=/dev/mtdblock1 rw
OpenPIC Version 1.2 (1 CPUs and 60 IRQ sources) at fafb9000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 514944k available (2280k kernel code, 668k data, 144k init, 0k highmem)
Mount-cache hash table entries: 512
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 1051k freed
softlockup thread 0 started up.
NET: Registered protocol family 16
PCI:: Probing PCI hardware
PCI: 0000:00:00.0: class b20 doesn't match header type 01. Ignoring class.
PCI: 0001:02:00.0: class b20 doesn't match header type 01. Ignoring class.
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
JFFS2 version 2.2. (NAND) (C) 2001-2003 Red Hat, Inc.
Initializing Cryptographic API
Generic RTC Driver v1.07
SWBD Platform Driver v1.0: [type 101, rev 2].
Config Silkworm 
PowerPC Book-E Watchdog Timer Loaded
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
ttyS0 at MMIO 0xe0004600 (irq = 26) is a 16550A
ttyS1 at MMIO 0xe0004500 (irq = 26) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 32768K size 1024 blocksize
loop: loaded (max 8 devices)
eth0: Gianfar Ethernet Controller Version 1.1, 00:e0:0c:00:00:fd 
eth0: Running with NAPI enabled
eth0: 256/256 RX/TX BD ring size
mtdchar: write-caching enabled
silkworm: Using SWBD101  flash configuration
Boot flash: Found 1 x16 devices at 0x0 in 16-bit bank
 Amd/Fujitsu Extended Query Table at 0x0040
Boot flash: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
Creating 6 MTD partitions on "Boot flash":
0x00000000-0x00800000 : "unused (mtd0)"
0x00800000-0x03400000 : "filesys: kernel and initrd (mtd1)"
0x03400000-0x03c00000 : "kernel: kernel and initrd (mtd2)"
0x03c00000-0x03c40000 : "bootenv0: boot environment (mtd3)"
0x03c40000-0x03c80000 : "bootenv1: boot environment (mtd4)"
0x03c80000-0x04000000 : "bootimage (mtd5)"
i2c /dev entries driver
MPC adapter: Platform type [101],  Did not register I2C multiplexor callback.
MPC adapter: Platform type [101],  Did not register I2C multiplexor callback.
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 7, 524288 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
ip_conntrack version 2.3 (4096 buckets, 32768 max) - 216 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
arp_tables: (C) 2002 David S. Miller
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
ip6_tables: (C) 2000-2002 Netfilter core team
NET: Registered protocol family 17
NET: Registered protocol family 15
VFS: Mounted root (jffs2 filesystem).
Freeing unused kernel memory: 144k init
INIT: version 2.78 booting
INIT: Entering runlevel: 2
Installing dpci dpci_switch_module: module license 'unspecified' taints kernel.
switch module
Installing dgen module
eth0: PHY is Generic MII (ffffffff)
/etc/rc.d/rc2.d/S30diags: /nfabos/bin/diagburnin.sh: No such file or directory
bash-2.04# eth0: Full Duplex
eth0: Speed 100BT
eth0: Link is up

bash-2.04# nfdiag
SWBD: modelId 0x0 extId 0x4e
CHOW48 platform.
slot: 0, bus: 1, dev: 0, size: 4194304, vAddr = 0x0, dmaAddr = 0x0 pciFd 0x4
DMA CPU Address 0xdf000000  PCI Address 0x9f000000 for slot 0 cheetah3 0
slot: 0, bus: 3, dev: 0, size: 4194304, vAddr = 0x0, dmaAddr = 0x0 pciFd 0x4
DMA CPU Address 0xdec00000  PCI Address 0x9ec00000 for slot 0 cheetah3 1

main (lc0)> pci -o read -s 0 -b 0 -u 0 -a 0 -l 0x128


00     BusNo 0     DevNo 0

00000000 1957 0032 0006 0010 0011 0b20 0008 0001              .W.2............
00000010 0000 0000 0000 0000 0100 0006 0000 0000              ................
00000020 8000 9ff0 1001 0001 0000 0000 0000 0000              ................
00000030 0000 0000 0044 0000 0000 0000 0000 0000              .....D..........
00000040 0000 0000 4c01 fe02 0000 0000 0010 0041              ....L..........A
00000050 0001 0000 2810 0000 d481 0003 0008 0011              ....(...........
00000060 07c0 0000 03c0 0040 0000 0000 0000 0000              .......@........
00000070 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000080 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000090 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000100 0001 0001 0000 0000 0000 0000 2010 0006              ................
00000110 0000 0000 0000 0000 00a0 0000 0000 0000              ................
00000120 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000130 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000140 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000150 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000160 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000170 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000180 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000190 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000200 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000210 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000220 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000230 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000240 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000250 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000260 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000270 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000280 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000290 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000300 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000310 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000320 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000330 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000340 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000350 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000360 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000370 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000380 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000390 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000400 0000 0000 0016 0000 04e2 0000 0000 0000              ................
00000410 0004 0000 0001 0000 0000 0000 4040 0000              ............@@..
00000420 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000430 0000 0000 0000 0000 8121 009e b04c 0004              .........!...L..
00000440 0010 0000 0000 0000 0000 0000 0000 0000              ................
00000450 d7ce 0014 1e20 01fc 0000 0000 0c5c 0000              .............\..
00000460 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000470 1957 0032 0011 0b20 0000 0000 0001 0000              .W.2............
00000480 3d48 0000 0000 0000 07f0 0000 0000 0000              =H..............
00000490 07c0 0000 0000 0000 0000 0000 0000 0000              ................


main (lc0)> pci -o read -u 0 -s 0 -b 1 -a 0x0 -l 0xff
 

00     BusNo 1     DevNo 0

00000000 11ab db90 0006 0010 0001 0580 0008 0000              ................
00000010 000c 8000 0000 0000 000c 8400 0000 0000              ................
00000020 0000 0000 0000 0000 0000 0000 11ab 11ab              ................
00000030 0000 0000 0040 0000 0000 0000 0100 0000              .....@..........
00000040 5001 0002 0000 0000 0000 0000 0000 0000              P...............
00000050 6005 0080 0000 0000 0000 0000 0000 0000              `...............
00000060 0010 0011 0080 003c 2000 0000 a411 0003              .......<........
00000070 0008 1011 0000 0000 0000 0000 0000 0000              ................
00000080 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000090 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000100 0001 0001 0000 0000 0000 0000 0010 0006              ................
00000110 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000120 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000130 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000140 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000150 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000160 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000170 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000180 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000190 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000001F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000200 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000210 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000220 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000230 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000240 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000250 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000260 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000270 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000280 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000290 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000002F0 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000300 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000310 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000320 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000330 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000340 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000350 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000360 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000370 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000380 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000390 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000003F0 0000 0000 0000 0000 0000 0000 

main (lc0)> pci -o read -s 0 -u 0 -a 0x0 -b 2 -l 0x40


00     BusNo 2     DevNo 0

00000000 1957 0032 0006 0010 0011 0b20 0008 0001              .W.2............
00000010 0000 0000 0000 0000 0300 000f 0000 0000              ................
00000020 a000 bff0 1001 0001 0000 0000 0000 0000              ................
00000030 0000 0000 0044 0000 0000 0000 0000 0000              .....D..........
00000040 0000 0000 4c01 fe02 0000 0000 0010 0041              ....L..........A
00000050 0001 0000 2810 0000 d481 0003 0008 0011              ....(...........
00000060 07c0 0000 03c0 0040 0000 0000 0000 0000              .......@........
00000070 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000080 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000090 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000F0 0000 0000 0000 0000 0000 0000 0000 0000              ................


main (lc0)> pci -o read -s 0 -u 0 -a 0x0 -b 3 -l 0x40


00     BusNo 3     DevNo 0

00000000 11ab db90 0006 0010 0001 0580 0008 0000              ................
00000010 000c a000 0000 0000 000c a400 0000 0000              ................
00000020 0000 0000 0000 0000 0000 0000 11ab 11ab              ................
00000030 0000 0000 0040 0000 0000 0000 0100 0000              .....@..........
00000040 5001 0002 0000 0000 0000 0000 0000 0000              P...............
00000050 6005 0080 0000 0000 0000 0000 0000 0000              `...............
00000060 0010 0011 0080 003c 2000 0000 a411 0003              .......<........
00000070 0008 1011 0000 0000 0000 0000 0000 0000              ................
00000080 0000 0000 0000 0000 0000 0000 0000 0000              ................
00000090 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000A0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000B0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000C0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000D0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000E0 0000 0000 0000 0000 0000 0000 0000 0000              ................
000000F0 0000 0000 0000 0000 0000 0000 0000 0000              ................


main (lc0)> mem -o read -a 0xe000a000 -l 0x400

E000A000 $830100f8 $00000000 $00000000 $0010ffff                  ................
E000A010 $0400ffff $00000028 $00000000 $00000000                  .......(........
E000A020 $00000000 $00000000 $00000000 $00000000                  ................
E000A030 $00000000 $00000000 $00000000 $00000000                  ................
E000A040 $00000000 $00000000 $00000000 $00000000                  ................
E000A050 $00000000 $00000000 $00000000 $00000000                  ................
E000A060 $00000000 $00000000 $00000000 $00000000                  ................
E000A070 $00000000 $00000000 $00000000 $00000000                  ................
E000A080 $00000000 $00000000 $00000000 $00000000                  ................
E000A090 $00000000 $00000000 $00000000 $00000000                  ................
E000A0A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A0B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A0C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A0D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A0E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A0F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A100 $00000000 $00000000 $00000000 $00000000                  ................
E000A110 $00000000 $00000000 $00000000 $00000000                  ................
E000A120 $00000000 $00000000 $00000000 $00000000                  ................
E000A130 $00000000 $00000000 $00000000 $00000000                  ................
E000A140 $00000000 $00000000 $00000000 $00000000                  ................
E000A150 $00000000 $00000000 $00000000 $00000000                  ................
E000A160 $00000000 $00000000 $00000000 $00000000                  ................
E000A170 $00000000 $00000000 $00000000 $00000000                  ................
E000A180 $00000000 $00000000 $00000000 $00000000                  ................
E000A190 $00000000 $00000000 $00000000 $00000000                  ................
E000A1A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A1B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A1C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A1D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A1E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A1F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A200 $00000000 $00000000 $00000000 $00000000                  ................
E000A210 $00000000 $00000000 $00000000 $00000000                  ................
E000A220 $00000000 $00000000 $00000000 $00000000                  ................
E000A230 $00000000 $00000000 $00000000 $00000000                  ................
E000A240 $00000000 $00000000 $00000000 $00000000                  ................
E000A250 $00000000 $00000000 $00000000 $00000000                  ................
E000A260 $00000000 $00000000 $00000000 $00000000                  ................
E000A270 $00000000 $00000000 $00000000 $00000000                  ................
E000A280 $00000000 $00000000 $00000000 $00000000                  ................
E000A290 $00000000 $00000000 $00000000 $00000000                  ................
E000A2A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A2B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A2C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A2D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A2E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A2F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A300 $00000000 $00000000 $00000000 $00000000                  ................
E000A310 $00000000 $00000000 $00000000 $00000000                  ................
E000A320 $00000000 $00000000 $00000000 $00000000                  ................
E000A330 $00000000 $00000000 $00000000 $00000000                  ................
E000A340 $00000000 $00000000 $00000000 $00000000                  ................
E000A350 $00000000 $00000000 $00000000 $00000000                  ................
E000A360 $00000000 $00000000 $00000000 $00000000                  ................
E000A370 $00000000 $00000000 $00000000 $00000000                  ................
E000A380 $00000000 $00000000 $00000000 $00000000                  ................
E000A390 $00000000 $00000000 $00000000 $00000000                  ................
E000A3A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A3B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A3C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A3D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A3E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A3F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A400 $00000000 $00000000 $00000000 $00000000                  ................
E000A410 $00000000 $00000000 $00000000 $00000000                  ................
E000A420 $00000000 $00000000 $00000000 $00000000                  ................
E000A430 $00000000 $00000000 $00000000 $00000000                  ................
E000A440 $00000000 $00000000 $00000000 $00000000                  ................
E000A450 $00000000 $00000000 $00000000 $00000000                  ................
E000A460 $00000000 $00000000 $00000000 $00000000                  ................
E000A470 $00000000 $00000000 $00000000 $00000000                  ................
E000A480 $00000000 $00000000 $00000000 $00000000                  ................
E000A490 $00000000 $00000000 $00000000 $00000000                  ................
E000A4A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A4B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A4C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A4D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A4E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A4F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A500 $00000000 $00000000 $00000000 $00000000                  ................
E000A510 $00000000 $00000000 $00000000 $00000000                  ................
E000A520 $00000000 $00000000 $00000000 $00000000                  ................
E000A530 $00000000 $00000000 $00000000 $00000000                  ................
E000A540 $00000000 $00000000 $00000000 $00000000                  ................
E000A550 $00000000 $00000000 $00000000 $00000000                  ................
E000A560 $00000000 $00000000 $00000000 $00000000                  ................
E000A570 $00000000 $00000000 $00000000 $00000000                  ................
E000A580 $00000000 $00000000 $00000000 $00000000                  ................
E000A590 $00000000 $00000000 $00000000 $00000000                  ................
E000A5A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A5B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A5C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A5D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A5E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A5F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A600 $00000000 $00000000 $00000000 $00000000                  ................
E000A610 $00000000 $00000000 $00000000 $00000000                  ................
E000A620 $00000000 $00000000 $00000000 $00000000                  ................
E000A630 $00000000 $00000000 $00000000 $00000000                  ................
E000A640 $00000000 $00000000 $00000000 $00000000                  ................
E000A650 $00000000 $00000000 $00000000 $00000000                  ................
E000A660 $00000000 $00000000 $00000000 $00000000                  ................
E000A670 $00000000 $00000000 $00000000 $00000000                  ................
E000A680 $00000000 $00000000 $00000000 $00000000                  ................
E000A690 $00000000 $00000000 $00000000 $00000000                  ................
E000A6A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A6B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A6C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A6D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A6E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A6F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A700 $00000000 $00000000 $00000000 $00000000                  ................
E000A710 $00000000 $00000000 $00000000 $00000000                  ................
E000A720 $00000000 $00000000 $00000000 $00000000                  ................
E000A730 $00000000 $00000000 $00000000 $00000000                  ................
E000A740 $00000000 $00000000 $00000000 $00000000                  ................
E000A750 $00000000 $00000000 $00000000 $00000000                  ................
E000A760 $00000000 $00000000 $00000000 $00000000                  ................
E000A770 $00000000 $00000000 $00000000 $00000000                  ................
E000A780 $00000000 $00000000 $00000000 $00000000                  ................
E000A790 $00000000 $00000000 $00000000 $00000000                  ................
E000A7A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A7B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A7C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A7D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A7E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A7F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A800 $00000000 $00000000 $00000000 $00000000                  ................
E000A810 $00000000 $00000000 $00000000 $00000000                  ................
E000A820 $00000000 $00000000 $00000000 $00000000                  ................
E000A830 $00000000 $00000000 $00000000 $00000000                  ................
E000A840 $00000000 $00000000 $00000000 $00000000                  ................
E000A850 $00000000 $00000000 $00000000 $00000000                  ................
E000A860 $00000000 $00000000 $00000000 $00000000                  ................
E000A870 $00000000 $00000000 $00000000 $00000000                  ................
E000A880 $00000000 $00000000 $00000000 $00000000                  ................
E000A890 $00000000 $00000000 $00000000 $00000000                  ................
E000A8A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A8B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A8C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A8D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A8E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A8F0 $00000000 $00000000 $00000000 $00000000                  ................
E000A900 $00000000 $00000000 $00000000 $00000000                  ................
E000A910 $00000000 $00000000 $00000000 $00000000                  ................
E000A920 $00000000 $00000000 $00000000 $00000000                  ................
E000A930 $00000000 $00000000 $00000000 $00000000                  ................
E000A940 $00000000 $00000000 $00000000 $00000000                  ................
E000A950 $00000000 $00000000 $00000000 $00000000                  ................
E000A960 $00000000 $00000000 $00000000 $00000000                  ................
E000A970 $00000000 $00000000 $00000000 $00000000                  ................
E000A980 $00000000 $00000000 $00000000 $00000000                  ................
E000A990 $00000000 $00000000 $00000000 $00000000                  ................
E000A9A0 $00000000 $00000000 $00000000 $00000000                  ................
E000A9B0 $00000000 $00000000 $00000000 $00000000                  ................
E000A9C0 $00000000 $00000000 $00000000 $00000000                  ................
E000A9D0 $00000000 $00000000 $00000000 $00000000                  ................
E000A9E0 $00000000 $00000000 $00000000 $00000000                  ................
E000A9F0 $00000000 $00000000 $00000000 $00000000                  ................
E000AA00 $00000000 $00000000 $00000000 $00000000                  ................
E000AA10 $00000000 $00000000 $00000000 $00000000                  ................
E000AA20 $00000000 $00000000 $00000000 $00000000                  ................
E000AA30 $00000000 $00000000 $00000000 $00000000                  ................
E000AA40 $00000000 $00000000 $00000000 $00000000                  ................
E000AA50 $00000000 $00000000 $00000000 $00000000                  ................
E000AA60 $00000000 $00000000 $00000000 $00000000                  ................
E000AA70 $00000000 $00000000 $00000000 $00000000                  ................
E000AA80 $00000000 $00000000 $00000000 $00000000                  ................
E000AA90 $00000000 $00000000 $00000000 $00000000                  ................
E000AAA0 $00000000 $00000000 $00000000 $00000000                  ................
E000AAB0 $00000000 $00000000 $00000000 $00000000                  ................
E000AAC0 $00000000 $00000000 $00000000 $00000000                  ................
E000AAD0 $00000000 $00000000 $00000000 $00000000                  ................
E000AAE0 $00000000 $00000000 $00000000 $00000000                  ................
E000AAF0 $00000000 $00000000 $00000000 $00000000                  ................
E000AB00 $00000000 $00000000 $00000000 $00000000                  ................
E000AB10 $00000000 $00000000 $00000000 $00000000                  ................
E000AB20 $00000000 $00000000 $00000000 $00000000                  ................
E000AB30 $00000000 $00000000 $00000000 $00000000                  ................
E000AB40 $00000000 $00000000 $00000000 $00000000                  ................
E000AB50 $00000000 $00000000 $00000000 $00000000                  ................
E000AB60 $00000000 $00000000 $00000000 $00000000                  ................
E000AB70 $00000000 $00000000 $00000000 $00000000                  ................
E000AB80 $00000000 $00000000 $00000000 $00000000                  ................
E000AB90 $00000000 $00000000 $00000000 $00000000                  ................
E000ABA0 $00000000 $00000000 $00000000 $00000000                  ................
E000ABB0 $00000000 $00000000 $00000000 $00000000                  ................
E000ABC0 $00000000 $00000000 $00000000 $00000000                  ................
E000ABD0 $00000000 $00000000 $00000000 $00000000                  ................
E000ABE0 $00000000 $00000000 $00000000 $00000000                  ................
E000ABF0 $00000000 $00000000 $02080100 $00000000                  ................
E000AC00 $00000000 $00000000 $00000000 $00000000                  ................
E000AC10 $80044023 $00000000 $00000000 $00000000                  ..@#............
E000AC20 $00080000 $00000000 $00080000 $00000000                  ................
E000AC30 $8004401c $00000000 $00000000 $00000000                  ..@.............
E000AC40 $00000000 $00000000 $000e3000 $00000000                  ..........0.....
E000AC50 $80088016 $00000000 $00000000 $00000000                  ................
E000AC60 $00000000 $00000000 $00000000 $00000000                  ................
E000AC70 $00044023 $00000000 $00000000 $00000000                  ..@#............
E000AC80 $00000000 $00000000 $00000000 $00000000                  ................
E000AC90 $00044023 $00000000 $00000000 $00000000                  ..@#............
E000ACA0 $00000000 $00000000 $00000000 $00000000                  ................
E000ACB0 $00000000 $00000000 $00000000 $00000000                  ................
E000ACC0 $00000000 $00000000 $00000000 $00000000                  ................
E000ACD0 $00000000 $00000000 $00000000 $00000000                  ................
E000ACE0 $00000000 $00000000 $00000000 $00000000                  ................
E000ACF0 $00000000 $00000000 $00000000 $00000000                  ................
E000AD00 $00000000 $00000000 $00000000 $00000000                  ................
E000AD10 $00000000 $00000000 $00000000 $00000000                  ................
E000AD20 $00000000 $00000000 $00000000 $00000000                  ................
E000AD30 $00000000 $00000000 $00000000 $00000000                  ................
E000AD40 $00000000 $00000000 $00000000 $00000000                  ................
E000AD50 $00000000 $00000000 $00000000 $00000000                  ................
E000AD60 $00000000 $00000000 $00000000 $00000000                  ................
E000AD70 $00000000 $00000000 $00000000 $00000000                  ................
E000AD80 $00000000 $00000000 $00000000 $00000000                  ................
E000AD90 $00000000 $00000000 $00000000 $00000000                  ................
E000ADA0 $00000000 $00000000 $00000000 $00000000                  ................
E000ADB0 $a0f5501e $00000000 $00000000 $00000000                  ..P.............
E000ADC0 $00000000 $00000000 $00000000 $00000000                  ................
E000ADD0 $20f44023 $00000000 $00000000 $00000000                  ..@#............
E000ADE0 $00000000 $00000000 $00000000 $00000000                  ................
E000ADF0 $20f44023 $00000000 $00000000 $00000000                  ..@#............
E000AE00 $80020000 $00000000 $00bdfe00 $00000000                  ................
E000AE10 $00000000 $00000000 $00000000 $00000000                  ................
E000AE20 $00000041 $00000000 $00000800 $00000000                  ...A............
E000AE30 $00000000 $00000000 $00000000 $00000000                  ................
E000AE40 $00000000 $00000000 $00000000 $00000000                  ................
E000AE50 $00000000 $00000000 $00000000 $00000000                  ................
E000AE60 $00000000 $00000000 $00000000 $00000000                  ................
E000AE70 $00000000 $00000000 $00000000 $00000000                  ................
E000AE80 $00000000 $00000000 $00000000 $00000000                  ................
E000AE90 $00000000 $00000000 $00000000 $00000000                  ................
E000AEA0 $00000000 $00000000 $00000000 $00000000                  ................
E000AEB0 $00000000 $00000000 $00000000 $00000000                  ................
E000AEC0 $00000000 $00000000 $00000000 $00000000                  ................
E000AED0 $00000000 $00000000 $00000000 $00000000                  ................
E000AEE0 $00000000 $00000000 $00000000 $00000000                  ................
E000AEF0 $00000000 $00000000 $00000000 $00000000                  ................
E000AF00 $80400080 $00000000 $00000000 $00000000                  .@..............
E000AF10 $c8800000 $a0000000 $00000000 $00000000                  ................
E000AF20 $00008000 $00000000 $00000000 $00000000                  ................
E000AF30 $00000000 $00000000 $00000000 $00000000                  ................
E000AF40 $00000000 $00000000 $00000000 $00000000                  ................
E000AF50 $00000000 $00000000 $00000000 $00000000                  ................
E000AF60 $00000000 $00000000 $00000000 $00000000                  ................
E000AF70 $00000000 $00000000 $00000000 $00000000                  ................
E000AF80 $00000000 $00000000 $00000000 $00000000                  ................
E000AF90 $00000000 $00000000 $00000000 $00000000                  ................
E000AFA0 $00000000 $00000000 $00000000 $00000000                  ................
E000AFB0 $00000000 $00000000 $00000000 $00000000                  ................
E000AFC0 $00000000 $00000000 $00000000 $00000000                  ................
E000AFD0 $00000000 $00000000 $00000000 $00000000                  ................
E000AFE0 $00000000 $00000000 $00000000 $00000000                  ................
E000AFF0 $00000000 $00000000 $00000000 $00000000                  ................


main (lc0)> mem -o read -a 0xe0009000 0x400

Invalid usage

main (lc0)> mem -o read -a 0xe0009000 -l 0x400

E0009000 $800300fc $00000000 $00000000 $0010ffff                  ................
E0009010 $0400ffff $00000028 $00000000 $00000000                  .......(........
E0009020 $00000000 $00000000 $00000000 $00000000                  ................
E0009030 $00000000 $00000000 $00000000 $00000000                  ................
E0009040 $00000000 $00000000 $00000000 $00000000                  ................
E0009050 $00000000 $00000000 $00000000 $00000000                  ................
E0009060 $00000000 $00000000 $00000000 $00000000                  ................
E0009070 $00000000 $00000000 $00000000 $00000000                  ................
E0009080 $00000000 $00000000 $00000000 $00000000                  ................
E0009090 $00000000 $00000000 $00000000 $00000000                  ................
E00090A0 $00000000 $00000000 $00000000 $00000000                  ................
E00090B0 $00000000 $00000000 $00000000 $00000000                  ................
E00090C0 $00000000 $00000000 $00000000 $00000000                  ................
E00090D0 $00000000 $00000000 $00000000 $00000000                  ................
E00090E0 $00000000 $00000000 $00000000 $00000000                  ................
E00090F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009100 $00000000 $00000000 $00000000 $00000000                  ................
E0009110 $00000000 $00000000 $00000000 $00000000                  ................
E0009120 $00000000 $00000000 $00000000 $00000000                  ................
E0009130 $00000000 $00000000 $00000000 $00000000                  ................
E0009140 $00000000 $00000000 $00000000 $00000000                  ................
E0009150 $00000000 $00000000 $00000000 $00000000                  ................
E0009160 $00000000 $00000000 $00000000 $00000000                  ................
E0009170 $00000000 $00000000 $00000000 $00000000                  ................
E0009180 $00000000 $00000000 $00000000 $00000000                  ................
E0009190 $00000000 $00000000 $00000000 $00000000                  ................
E00091A0 $00000000 $00000000 $00000000 $00000000                  ................
E00091B0 $00000000 $00000000 $00000000 $00000000                  ................
E00091C0 $00000000 $00000000 $00000000 $00000000                  ................
E00091D0 $00000000 $00000000 $00000000 $00000000                  ................
E00091E0 $00000000 $00000000 $00000000 $00000000                  ................
E00091F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009200 $00000000 $00000000 $00000000 $00000000                  ................
E0009210 $00000000 $00000000 $00000000 $00000000                  ................
E0009220 $00000000 $00000000 $00000000 $00000000                  ................
E0009230 $00000000 $00000000 $00000000 $00000000                  ................
E0009240 $00000000 $00000000 $00000000 $00000000                  ................
E0009250 $00000000 $00000000 $00000000 $00000000                  ................
E0009260 $00000000 $00000000 $00000000 $00000000                  ................
E0009270 $00000000 $00000000 $00000000 $00000000                  ................
E0009280 $00000000 $00000000 $00000000 $00000000                  ................
E0009290 $00000000 $00000000 $00000000 $00000000                  ................
E00092A0 $00000000 $00000000 $00000000 $00000000                  ................
E00092B0 $00000000 $00000000 $00000000 $00000000                  ................
E00092C0 $00000000 $00000000 $00000000 $00000000                  ................
E00092D0 $00000000 $00000000 $00000000 $00000000                  ................
E00092E0 $00000000 $00000000 $00000000 $00000000                  ................
E00092F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009300 $00000000 $00000000 $00000000 $00000000                  ................
E0009310 $00000000 $00000000 $00000000 $00000000                  ................
E0009320 $00000000 $00000000 $00000000 $00000000                  ................
E0009330 $00000000 $00000000 $00000000 $00000000                  ................
E0009340 $00000000 $00000000 $00000000 $00000000                  ................
E0009350 $00000000 $00000000 $00000000 $00000000                  ................
E0009360 $00000000 $00000000 $00000000 $00000000                  ................
E0009370 $00000000 $00000000 $00000000 $00000000                  ................
E0009380 $00000000 $00000000 $00000000 $00000000                  ................
E0009390 $00000000 $00000000 $00000000 $00000000                  ................
E00093A0 $00000000 $00000000 $00000000 $00000000                  ................
E00093B0 $00000000 $00000000 $00000000 $00000000                  ................
E00093C0 $00000000 $00000000 $00000000 $00000000                  ................
E00093D0 $00000000 $00000000 $00000000 $00000000                  ................
E00093E0 $00000000 $00000000 $00000000 $00000000                  ................
E00093F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009400 $00000000 $00000000 $00000000 $00000000                  ................
E0009410 $00000000 $00000000 $00000000 $00000000                  ................
E0009420 $00000000 $00000000 $00000000 $00000000                  ................
E0009430 $00000000 $00000000 $00000000 $00000000                  ................
E0009440 $00000000 $00000000 $00000000 $00000000                  ................
E0009450 $00000000 $00000000 $00000000 $00000000                  ................
E0009460 $00000000 $00000000 $00000000 $00000000                  ................
E0009470 $00000000 $00000000 $00000000 $00000000                  ................
E0009480 $00000000 $00000000 $00000000 $00000000                  ................
E0009490 $00000000 $00000000 $00000000 $00000000                  ................
E00094A0 $00000000 $00000000 $00000000 $00000000                  ................
E00094B0 $00000000 $00000000 $00000000 $00000000                  ................
E00094C0 $00000000 $00000000 $00000000 $00000000                  ................
E00094D0 $00000000 $00000000 $00000000 $00000000                  ................
E00094E0 $00000000 $00000000 $00000000 $00000000                  ................
E00094F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009500 $00000000 $00000000 $00000000 $00000000                  ................
E0009510 $00000000 $00000000 $00000000 $00000000                  ................
E0009520 $00000000 $00000000 $00000000 $00000000                  ................
E0009530 $00000000 $00000000 $00000000 $00000000                  ................
E0009540 $00000000 $00000000 $00000000 $00000000                  ................
E0009550 $00000000 $00000000 $00000000 $00000000                  ................
E0009560 $00000000 $00000000 $00000000 $00000000                  ................
E0009570 $00000000 $00000000 $00000000 $00000000                  ................
E0009580 $00000000 $00000000 $00000000 $00000000                  ................
E0009590 $00000000 $00000000 $00000000 $00000000                  ................
E00095A0 $00000000 $00000000 $00000000 $00000000                  ................
E00095B0 $00000000 $00000000 $00000000 $00000000                  ................
E00095C0 $00000000 $00000000 $00000000 $00000000                  ................
E00095D0 $00000000 $00000000 $00000000 $00000000                  ................
E00095E0 $00000000 $00000000 $00000000 $00000000                  ................
E00095F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009600 $00000000 $00000000 $00000000 $00000000                  ................
E0009610 $00000000 $00000000 $00000000 $00000000                  ................
E0009620 $00000000 $00000000 $00000000 $00000000                  ................
E0009630 $00000000 $00000000 $00000000 $00000000                  ................
E0009640 $00000000 $00000000 $00000000 $00000000                  ................
E0009650 $00000000 $00000000 $00000000 $00000000                  ................
E0009660 $00000000 $00000000 $00000000 $00000000                  ................
E0009670 $00000000 $00000000 $00000000 $00000000                  ................
E0009680 $00000000 $00000000 $00000000 $00000000                  ................
E0009690 $00000000 $00000000 $00000000 $00000000                  ................
E00096A0 $00000000 $00000000 $00000000 $00000000                  ................
E00096B0 $00000000 $00000000 $00000000 $00000000                  ................
E00096C0 $00000000 $00000000 $00000000 $00000000                  ................
E00096D0 $00000000 $00000000 $00000000 $00000000                  ................
E00096E0 $00000000 $00000000 $00000000 $00000000                  ................
E00096F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009700 $00000000 $00000000 $00000000 $00000000                  ................
E0009710 $00000000 $00000000 $00000000 $00000000                  ................
E0009720 $00000000 $00000000 $00000000 $00000000                  ................
E0009730 $00000000 $00000000 $00000000 $00000000                  ................
E0009740 $00000000 $00000000 $00000000 $00000000                  ................
E0009750 $00000000 $00000000 $00000000 $00000000                  ................
E0009760 $00000000 $00000000 $00000000 $00000000                  ................
E0009770 $00000000 $00000000 $00000000 $00000000                  ................
E0009780 $00000000 $00000000 $00000000 $00000000                  ................
E0009790 $00000000 $00000000 $00000000 $00000000                  ................
E00097A0 $00000000 $00000000 $00000000 $00000000                  ................
E00097B0 $00000000 $00000000 $00000000 $00000000                  ................
E00097C0 $00000000 $00000000 $00000000 $00000000                  ................
E00097D0 $00000000 $00000000 $00000000 $00000000                  ................
E00097E0 $00000000 $00000000 $00000000 $00000000                  ................
E00097F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009800 $00000000 $00000000 $00000000 $00000000                  ................
E0009810 $00000000 $00000000 $00000000 $00000000                  ................
E0009820 $00000000 $00000000 $00000000 $00000000                  ................
E0009830 $00000000 $00000000 $00000000 $00000000                  ................
E0009840 $00000000 $00000000 $00000000 $00000000                  ................
E0009850 $00000000 $00000000 $00000000 $00000000                  ................
E0009860 $00000000 $00000000 $00000000 $00000000                  ................
E0009870 $00000000 $00000000 $00000000 $00000000                  ................
E0009880 $00000000 $00000000 $00000000 $00000000                  ................
E0009890 $00000000 $00000000 $00000000 $00000000                  ................
E00098A0 $00000000 $00000000 $00000000 $00000000                  ................
E00098B0 $00000000 $00000000 $00000000 $00000000                  ................
E00098C0 $00000000 $00000000 $00000000 $00000000                  ................
E00098D0 $00000000 $00000000 $00000000 $00000000                  ................
E00098E0 $00000000 $00000000 $00000000 $00000000                  ................
E00098F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009900 $00000000 $00000000 $00000000 $00000000                  ................
E0009910 $00000000 $00000000 $00000000 $00000000                  ................
E0009920 $00000000 $00000000 $00000000 $00000000                  ................
E0009930 $00000000 $00000000 $00000000 $00000000                  ................
E0009940 $00000000 $00000000 $00000000 $00000000                  ................
E0009950 $00000000 $00000000 $00000000 $00000000                  ................
E0009960 $00000000 $00000000 $00000000 $00000000                  ................
E0009970 $00000000 $00000000 $00000000 $00000000                  ................
E0009980 $00000000 $00000000 $00000000 $00000000                  ................
E0009990 $00000000 $00000000 $00000000 $00000000                  ................
E00099A0 $00000000 $00000000 $00000000 $00000000                  ................
E00099B0 $00000000 $00000000 $00000000 $00000000                  ................
E00099C0 $00000000 $00000000 $00000000 $00000000                  ................
E00099D0 $00000000 $00000000 $00000000 $00000000                  ................
E00099E0 $00000000 $00000000 $00000000 $00000000                  ................
E00099F0 $00000000 $00000000 $00000000 $00000000                  ................
E0009A00 $00000000 $00000000 $00000000 $00000000                  ................
E0009A10 $00000000 $00000000 $00000000 $00000000                  ................
E0009A20 $00000000 $00000000 $00000000 $00000000                  ................
E0009A30 $00000000 $00000000 $00000000 $00000000                  ................
E0009A40 $00000000 $00000000 $00000000 $00000000                  ................
E0009A50 $00000000 $00000000 $00000000 $00000000                  ................
E0009A60 $00000000 $00000000 $00000000 $00000000                  ................
E0009A70 $00000000 $00000000 $00000000 $00000000                  ................
E0009A80 $00000000 $00000000 $00000000 $00000000                  ................
E0009A90 $00000000 $00000000 $00000000 $00000000                  ................
E0009AA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009AB0 $00000000 $00000000 $00000000 $00000000                  ................
E0009AC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009AD0 $00000000 $00000000 $00000000 $00000000                  ................
E0009AE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009AF0 $00000000 $00000000 $00000000 $00000000                  ................
E0009B00 $00000000 $00000000 $00000000 $00000000                  ................
E0009B10 $00000000 $00000000 $00000000 $00000000                  ................
E0009B20 $00000000 $00000000 $00000000 $00000000                  ................
E0009B30 $00000000 $00000000 $00000000 $00000000                  ................
E0009B40 $00000000 $00000000 $00000000 $00000000                  ................
E0009B50 $00000000 $00000000 $00000000 $00000000                  ................
E0009B60 $00000000 $00000000 $00000000 $00000000                  ................
E0009B70 $00000000 $00000000 $00000000 $00000000                  ................
E0009B80 $00000000 $00000000 $00000000 $00000000                  ................
E0009B90 $00000000 $00000000 $00000000 $00000000                  ................
E0009BA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009BB0 $00000000 $00000000 $00000000 $00000000                  ................
E0009BC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009BD0 $00000000 $00000000 $00000000 $00000000                  ................
E0009BE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009BF0 $00000000 $00000000 $02080100 $00000000                  ................
E0009C00 $00000000 $00000000 $00000000 $00000000                  ................
E0009C10 $80044023 $00000000 $00000000 $00000000                  ..@#............
E0009C20 $000a0000 $00000000 $000a0000 $00000000                  ................
E0009C30 $8004401c $00000000 $00000000 $00000000                  ..@.............
E0009C40 $00000000 $00000000 $000e3800 $00000000                  ..........8.....
E0009C50 $80088016 $00000000 $00000000 $00000000                  ................
E0009C60 $00000000 $00000000 $00000000 $00000000                  ................
E0009C70 $00044023 $00000000 $00000000 $00000000                  ..@#............
E0009C80 $00000000 $00000000 $00000000 $00000000                  ................
E0009C90 $00044023 $00000000 $00000000 $00000000                  ..@#............
E0009CA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009CB0 $00000000 $00000000 $00000000 $00000000                  ................
E0009CC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009CD0 $00000000 $00000000 $00000000 $00000000                  ................
E0009CE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009CF0 $00000000 $00000000 $00000000 $00000000                  ................
E0009D00 $00000000 $00000000 $00000000 $00000000                  ................
E0009D10 $00000000 $00000000 $00000000 $00000000                  ................
E0009D20 $00000000 $00000000 $00000000 $00000000                  ................
E0009D30 $00000000 $00000000 $00000000 $00000000                  ................
E0009D40 $00000000 $00000000 $00000000 $00000000                  ................
E0009D50 $00000000 $00000000 $00000000 $00000000                  ................
E0009D60 $00000000 $00000000 $00000000 $00000000                  ................
E0009D70 $00000000 $00000000 $00000000 $00000000                  ................
E0009D80 $00000000 $00000000 $00000000 $00000000                  ................
E0009D90 $00000000 $00000000 $00000000 $00000000                  ................
E0009DA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009DB0 $a0f5501e $00000000 $00000000 $00000000                  ..P.............
E0009DC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009DD0 $20f44023 $00000000 $00000000 $00000000                  ..@#............
E0009DE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009DF0 $20f44023 $00000000 $00000000 $00000000                  ..@#............
E0009E00 $80020000 $00000000 $00bdfe00 $00000000                  ................
E0009E10 $00000000 $00000000 $00000000 $00000000                  ................
E0009E20 $00000041 $00000000 $00000800 $00000000                  ...A............
E0009E30 $00000000 $00000000 $00000000 $00000000                  ................
E0009E40 $00000000 $00000000 $00000000 $00000000                  ................
E0009E50 $00000000 $00000000 $00000000 $00000000                  ................
E0009E60 $00000000 $00000000 $00000000 $00000000                  ................
E0009E70 $00000000 $00000000 $00000000 $00000000                  ................
E0009E80 $00000000 $00000000 $00000000 $00000000                  ................
E0009E90 $00000000 $00000000 $00000000 $00000000                  ................
E0009EA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009EB0 $00000000 $00000000 $00000000 $00000000                  ................
E0009EC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009ED0 $00000000 $00000000 $00000000 $00000000                  ................
E0009EE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009EF0 $00000000 $00000000 $00000000 $00000000                  ................
E0009F00 $80400080 $00000000 $00000000 $00000000                  .@..............
E0009F10 $c8800000 $a0000000 $00000000 $00000000                  ................
E0009F20 $00008000 $00000000 $00000000 $00000000                  ................
E0009F30 $00000000 $00000000 $00000000 $00000000                  ................
E0009F40 $00000000 $00000000 $00000000 $00000000                  ................
E0009F50 $00000000 $00000000 $00000000 $00000000                  ................
E0009F60 $00000000 $00000000 $00000000 $00000000                  ................
E0009F70 $00000000 $00000000 $00000000 $00000000                  ................
E0009F80 $00000000 $00000000 $00000000 $00000000                  ................
E0009F90 $00000000 $00000000 $00000000 $00000000                  ................
E0009FA0 $00000000 $00000000 $00000000 $00000000                  ................
E0009FB0 $00000000 $00000000 $00000000 $00000000                  ................
E0009FC0 $00000000 $00000000 $00000000 $00000000                  ................
E0009FD0 $00000000 $00000000 $00000000 $00000000                  ................
E0009FE0 $00000000 $00000000 $00000000 $00000000                  ................
E0009FF0 $00000000 $00000000 $00000000 $00000000                  ................


main (lc0)> 

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-28 10:01 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2871A8.1060706@suse.de>

On 06/28/2010 12:55 PM, Alexander Graf wrote:
> Avi Kivity wrote:
>    
>> On 06/28/2010 12:27 PM, Alexander Graf wrote:
>>      
>>>> Am I looking at old code?
>>>>          
>>>
>>> Apparently. Check book3s_mmu_*.c
>>>        
>> I don't have that pattern.
>>      
> It's in this patch.
>    

Yes.  Silly me.

>> +static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
>> +{
>> +	dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) ->  0x%llx\n",
>> +		    pte->pte.eaddr, pte->pte.vpage, pte->host_va);
>> +
>> +	/* Different for 32 and 64 bit */
>> +	kvmppc_mmu_invalidate_pte(vcpu, pte);
>> +
>> +	if (pte->pte.may_write)
>> +		kvm_release_pfn_dirty(pte->pfn);
>> +	else
>> +		kvm_release_pfn_clean(pte->pfn);
>> +
>> +	list_del(&pte->list_pte);
>> +	list_del(&pte->list_vpte);
>> +	list_del(&pte->list_vpte_long);
>> +	list_del(&pte->list_all);
>> +
>> +	kmem_cache_free(vcpu->arch.hpte_cache, pte);
>> +}
>> +
>>      

(that's the old one with list_all - better check what's going on here)


>>>> (another difference is using struct hlist_head instead of list_head,
>>>> which I recommend since it saves space)
>>>>          
>>> Hrm. I thought about this quite a bit before too, but that makes
>>> invalidation more complicated, no? We always need to remember the
>>> previous entry in a list.
>>>        
>> hlist_for_each_entry_safe() does that.
>>      
> Oh - very nice. So all I need to do is pass the previous list entry to
> invalidate_pte too and I'm good. I guess I'll give it a shot.
>    

No, just the for_each cursor.
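
To make the cursor point concrete: an hlist node stores pprev, which is the address of whatever pointer currently points at the node (the previous node's next, or the bucket's first). Unlinking therefore needs only the node that the safe iteration hands you. There is no separately tracked previous entry. The sketch below is a minimal user-space model written for illustration. It mimics the semantics of hlist_add_head(), hlist_del() and the hlist_for_each_entry_safe() pattern from <linux/list.h>, but it is not the kernel code, and struct entry, entry_of() and bucket_remove_matching() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space model of the kernel's hlist (not the real <linux/list.h>). */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at us */
};

struct hlist_head {
	struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink using only the node itself: pprev replaces "remember the previous entry". */
static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical cache entry with an embedded hash-chain node. */
struct entry {
	unsigned long eaddr;
	struct hlist_node node;
};

#define entry_of(ptr) \
	((struct entry *)((char *)(ptr) - offsetof(struct entry, node)))

/* Remove every entry matching eaddr from one bucket; returns how many were removed. */
static int bucket_remove_matching(struct hlist_head *h, unsigned long eaddr)
{
	struct hlist_node *pos, *tmp;
	int removed = 0;

	for (pos = h->first; pos; pos = tmp) {
		tmp = pos->next;	/* saved before pos may be unlinked */
		if (entry_of(pos)->eaddr == eaddr) {
			hlist_del(pos);
			removed++;
		}
	}
	return removed;
}
```

Applied to the patch, invalidate_pte() would presumably still take just the pte and call hlist_del() on each of its embedded nodes.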

>> Less and simpler code, better reporting through slabtop, less wastage
>> of partially allocated slab pages.
>>      
> But it also means that one VM can spill the global slab cache and kill
> another VM's mm performance, no?
>    

What do you mean by spill?

btw, in the midst of the nit-picking frenzy I forgot to ask how the 
individual hash chain lengths as well as the per-vm allocation were limited.

On x86 we have a per-vm limit and we allow the mm shrinker to reduce 
shadow mmu data structures dynamically.
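
Alex's answer elsewhere in this thread describes a simple count-based cap. It keeps the number of allocated shadow entries and flushes everything once a threshold is reached. A user-space model of that policy follows. The names struct model_vcpu and model_cache_next(), and the threshold of 4, are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>

/* Hypothetical cap; the real threshold is a #define in the patch, not quoted here. */
#define MODEL_HPTE_CACHE_MAX 4

struct model_vcpu {
	int hpte_cache_count;	/* shadow entries currently allocated */
	int flushes;		/* how often the cap forced a full flush */
};

static void model_flush_all(struct model_vcpu *v)
{
	/* Stand-in for invalidating and freeing every shadow PTE of this vcpu. */
	v->hpte_cache_count = 0;
	v->flushes++;
}

/* Account for one new shadow PTE, flushing everything first if the cap is reached. */
static void model_cache_next(struct model_vcpu *v)
{
	if (v->hpte_cache_count >= MODEL_HPTE_CACHE_MAX)
		model_flush_all(v);
	v->hpte_cache_count++;
}
```

A cap like this bounds the total number of entries, and therefore the longest possible chain. It does not keep any single chain short when a guest makes many addresses collide, which is the concern Avi raises in his follow-up.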

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Alexander Graf @ 2010-06-28  9:55 UTC
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C286C98.8060903@redhat.com>

Avi Kivity wrote:
> On 06/28/2010 12:27 PM, Alexander Graf wrote:
>>> Am I looking at old code?
>>
>>
>> Apparently. Check book3s_mmu_*.c
>
> I don't have that pattern.

It's in this patch.

> +static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
> +{
> +	dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
> +		    pte->pte.eaddr, pte->pte.vpage, pte->host_va);
> +
> +	/* Different for 32 and 64 bit */
> +	kvmppc_mmu_invalidate_pte(vcpu, pte);
> +
> +	if (pte->pte.may_write)
> +		kvm_release_pfn_dirty(pte->pfn);
> +	else
> +		kvm_release_pfn_clean(pte->pfn);
> +
> +	list_del(&pte->list_pte);
> +	list_del(&pte->list_vpte);
> +	list_del(&pte->list_vpte_long);
> +	list_del(&pte->list_all);
> +
> +	kmem_cache_free(vcpu->arch.hpte_cache, pte);
> +}
> +

>
>>
>>>
>>> (another difference is using struct hlist_head instead of list_head,
>>> which I recommend since it saves space)
>>
>> Hrm. I thought about this quite a bit before too, but that makes
>> invalidation more complicated, no? We always need to remember the
>> previous entry in a list.
>
> hlist_for_each_entry_safe() does that.

Oh - very nice. So all I need to do is pass the previous list entry to
invalidate_pte too and I'm good. I guess I'll give it a shot.

>
>>>
>>>>>> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    char kmem_name[128];
>>>>>> +
>>>>>> +    /* init hpte slab cache */
>>>>>> +    snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
>>>>>> +    vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
>>>>>>        sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>>>>>>
>>>>>>
>>>>> Why not one global cache?
>>>>>
>>>> You mean over all vcpus? Or over all VMs?
>>>
>>> Totally global.  As in 'static struct kmem_cache *kvm_hpte_cache;'.
>>
>> What would be the benefit?
>
> Less and simpler code, better reporting through slabtop, less wastage
> of partially allocated slab pages.

But it also means that one VM can spill the global slab cache and kill
another VM's mm performance, no?


Alex

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-28  9:34 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <1A0E0E54-D055-4333-B5EC-DE2F71382AB7@suse.de>

On 06/28/2010 12:27 PM, Alexander Graf wrote:
>> Am I looking at old code?
>
>
> Apparently. Check book3s_mmu_*.c

I don't have that pattern.

>
>>
>> (another difference is using struct hlist_head instead of list_head, 
>> which I recommend since it saves space)
>
> Hrm. I thought about this quite a bit before too, but that makes 
> invalidation more complicated, no? We always need to remember the 
> previous entry in a list.

hlist_for_each_entry_safe() does that.

>>
>>>>> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +    char kmem_name[128];
>>>>> +
>>>>> +    /* init hpte slab cache */
>>>>> +    snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
>>>>> +    vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
>>>>>        sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>>>>>
>>>>>
>>>> Why not one global cache?
>>>>
>>> You mean over all vcpus? Or over all VMs?
>>
>> Totally global.  As in 'static struct kmem_cache *kvm_hpte_cache;'.
>
> What would be the benefit?

Less and simpler code, better reporting through slabtop, less wastage of 
partially allocated slab pages.

>>> Because this way they don't interfere. An operation on one vCPU 
>>> doesn't inflict anything on another. There's also no locking 
>>> necessary this way.
>>>
>>
>> The slab writers have solved this for everyone, not just us.  
>> kmem_cache_alloc() will usually allocate from a per-cpu cache, so no 
>> interference and/or locking.  See ____cache_alloc().
>>
>> If there's a problem in kmem_cache_alloc(), solve it there, don't 
>> introduce workarounds.
>
> So you would still keep different hash arrays and everything, just 
> allocate the objects from a global pool? 

Yes.

> I still fail to see how that benefits anyone.

See above.

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Alexander Graf @ 2010-06-28  9:27 UTC
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C286770.6010204@redhat.com>


On 28.06.2010, at 11:12, Avi Kivity <avi@redhat.com> wrote:

> On 06/28/2010 11:55 AM, Alexander Graf wrote:
>>
>>>> +
>>>> +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) {
>>>> +    return hash_64(eaddr>>   PTE_SIZE, HPTEG_HASH_BITS_PTE);
>>>> +}
>>>> +
>>>> +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) {
>>>> +    return hash_64(vpage&   0xfffffffffULL, HPTEG_HASH_BITS_VPTE);
>>>> +}
>>>> +
>>>> +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) {
>>>> +    return hash_64((vpage&   0xffffff000ULL)>>   12,
>>>> +               HPTEG_HASH_BITS_VPTE_LONG);
>>>> +}
>>>>
>>>>
>>> Still with the weird coding style?
>>>
>> Not sure what's going on there. My editor displays it normally.  
>> Weird.
>>
>
> Try hitting 'save'.

Thanks for the hint :). No really, no idea what's going on here.

>
>>>> +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    struct hpte_cache *pte, *tmp;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i<   HPTEG_HASH_NUM_VPTE_LONG; i++) {
>>>> +        struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
>>>> +
>>>> +        list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
>>>> +            /* Jump over the helper entry */
>>>> +            if (&pte->list_vpte_long == list)
>>>> +                continue;
>>>>
>>>>
>>> I don't think l_f_e_e_s() will ever give you the head back.
>>>
>> Uh. Usually you have struct list_head in a struct and you point to  
>> the first entry to loop over all. So if it doesn't return the first  
>> entry, that would seem very counter-intuitive.
>>
>
> Linux list_heads aren't intuitive.  The same structure is used for  
> the container and for the nodes.  Would have been better (and more  
> typesafe) to have separate list_heads and list_nodes.

Hrm. Ok, I'll check by reading the source.

>
>>>> +
>>>> +            invalidate_pte(vcpu, pte);
>>>> +        }
>>>>
>>>>
>>> Does invalidate_pte() remove the pte?  doesn't seem so, so you can  
>>> drop the _safe iteration.
>>>
>> Yes, it does.
>>
>
> I don't see it?
>
>> static void invalidate_pte(struct hpte_cache *pte)
>> {
>>    dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
>>            pte->pte.eaddr, pte->pte.vpage, pte->host_va);
>>
>>    ppc_md.hpte_invalidate(pte->slot, pte->host_va,
>>                   MMU_PAGE_4K, MMU_SEGSIZE_256M,
>>                   false);
>>    pte->host_va = 0;
>>
>>    if (pte->pte.may_write)
>>        kvm_release_pfn_dirty(pte->pfn);
>>    else
>>        kvm_release_pfn_clean(pte->pfn);
>> }
>
> Am I looking at old code?

Apparently. Check book3s_mmu_*.c

>
>>>> +
>>>> +/* Flush with mask 0xfffffffff */
>>>> +static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
>>>> +{
>>>> +    struct list_head *list;
>>>> +    struct hpte_cache *pte, *tmp;
>>>> +    u64 vp_mask = 0xfffffffffULL;
>>>> +
>>>> +    list =&vcpu->arch.hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
>>>> +
>>>> +    /* Check the list for matching entries */
>>>> +    list_for_each_entry_safe(pte, tmp, list, list_vpte) {
>>>> +        /* Jump over the helper entry */
>>>> +        if (&pte->list_vpte == list)
>>>> +            continue;
>>>>
>>>>
>>> list cannot contain list.  Or maybe I don't understand the data  
>>> structure.  Isn't it multiple hash tables with lists holding  
>>> matching ptes?
>>>
>> It is multiple hash tables with list_heads that are one element of  
>> a list that contains the matching ptes. Usually you'd have
>>
>> struct x {
>>   struct list_head;
>>   int foo;
>>   char bar;
>> };
>>
>> and you loop through each of those elements. What we have here is
>>
>> struct list_head hash[..];
>>
>> and some loose struct x's. The hash's "next" element is a struct x.
>>
>> The "normal" way would be to have "struct x hash[..];" but I  
>> figured that eats too much space.
>>
>
> No, what you describe is quite normal.  In fact, x86 kvm mmu is  
> exactly like that, except we only have a single hash:
>
>>    struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];

I see.

>
> (another difference is using struct hlist_head instead of list_head,  
> which I recommend since it saves space)

Hrm. I thought about this quite a bit before too, but that makes  
invalidation more complicated, no? We always need to remember the  
previous entry in a list.

>
>>>> +
>>>> +            if ((pte->pte.raddr>= pa_start)&&
>>>> +                (pte->pte.raddr<   pa_end)) {
>>>> +                invalidate_pte(vcpu, pte);
>>>> +            }
>>>>
>>>>
>>> Extra braces.
>>>
>> Yeah, for two-lined if's I find it more readable that way. Is it  
>> forbidden?
>>
>
> It's not forbidden, but it tends to attract "cleanup" patches, which  
> are annoying.  Best to conform to the coding style if there isn't a  
> good reason not to.
>
> Personally I prefer braces for one-liners (yes they're ugly, but  
> they're safer and easier to patch).
>
>>>> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    char kmem_name[128];
>>>> +
>>>> +    /* init hpte slab cache */
>>>> +    snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
>>>> +    vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
>>>> +        sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>>>>
>>>>
>>> Why not one global cache?
>>>
>> You mean over all vcpus? Or over all VMs?
>
> Totally global.  As in 'static struct kmem_cache *kvm_hpte_cache;'.

What would be the benefit?

>
>> Because this way they don't interfere. An operation on one vCPU  
>> doesn't inflict anything on another. There's also no locking  
>> necessary this way.
>>
>
> The slab writers have solved this for everyone, not just us.   
> kmem_cache_alloc() will usually allocate from a per-cpu cache, so no  
> interference and/or locking.  See ____cache_alloc().
>
> If there's a problem in kmem_cache_alloc(), solve it there, don't  
> introduce workarounds.

So you would still keep different hash arrays and everything, just  
allocate the objects from a global pool? I still fail to see how that  
benefits anyone.

Alex

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-28  9:12 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <20417D40-9345-485B-9201-8B3722B7457F@suse.de>

On 06/28/2010 11:55 AM, Alexander Graf wrote:
>
>>> +
>>> +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) {
>>> +	return hash_64(eaddr>>   PTE_SIZE, HPTEG_HASH_BITS_PTE);
>>> +}
>>> +
>>> +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) {
>>> +	return hash_64(vpage&   0xfffffffffULL, HPTEG_HASH_BITS_VPTE);
>>> +}
>>> +
>>> +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) {
>>> +	return hash_64((vpage&   0xffffff000ULL)>>   12,
>>> +		       HPTEG_HASH_BITS_VPTE_LONG);
>>> +}
>>>
>>>        
>> Still with the weird coding style?
>>      
> Not sure what's going on there. My editor displays it normally. Weird.
>    

Try hitting 'save'.

>>> +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct hpte_cache *pte, *tmp;
>>> +	int i;
>>> +
>>> +	for (i = 0; i<   HPTEG_HASH_NUM_VPTE_LONG; i++) {
>>> +		struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
>>> +
>>> +		list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
>>> +			/* Jump over the helper entry */
>>> +			if (&pte->list_vpte_long == list)
>>> +				continue;
>>>
>>>        
>> I don't think l_f_e_e_s() will ever give you the head back.
>>      
> Uh. Usually you have struct list_head in a struct and you point to the first entry to loop over all. So if it doesn't return the first entry, that would seem very counter-intuitive.
>    

Linux list_heads aren't intuitive.  The same structure is used for the 
container and for the nodes.  Would have been better (and more typesafe) 
to have separate list_heads and list_nodes.
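
A small user-space model shows why the "Jump over the helper entry" checks in the patch never fire. The traversal performed by list_for_each_entry_safe() starts at head->next and stops as soon as the cursor wraps back around to the head, so the bucket's own list_head is never handed to the loop body. The list helpers below only mimic <linux/list.h>. The names struct pte_entry, pte_of() and flush_bucket() are hypothetical stand-ins for struct hpte_cache and the flush loops.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space model of the kernel's circular, doubly linked list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical stand-in for struct hpte_cache. */
struct pte_entry {
	unsigned long vpage;
	struct list_head list_vpte;
};

#define pte_of(ptr) \
	((struct pte_entry *)((char *)(ptr) - offsetof(struct pte_entry, list_vpte)))

/*
 * Same traversal shape as list_for_each_entry_safe(): the successor is
 * fetched before the body runs, and the loop ends when pos wraps to head.
 * *helper_hits counts how often the patch's head check would fire.
 */
static int flush_bucket(struct list_head *head, unsigned long vpage,
			int *helper_hits)
{
	struct list_head *pos, *tmp;
	int flushed = 0;

	*helper_hits = 0;
	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		if (pos == head)	/* the "helper entry" check: never true here */
			(*helper_hits)++;
		if (pte_of(pos)->vpage == vpage) {
			list_del(pos);
			flushed++;
		}
	}
	return flushed;
}
```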

>>> +
>>> +			invalidate_pte(vcpu, pte);
>>> +		}
>>>
>>>        
>> Does invalidate_pte() remove the pte?  doesn't seem so, so you can drop the _safe iteration.
>>      
> Yes, it does.
>    

I don't see it?

> static void invalidate_pte(struct hpte_cache *pte)
> {
>     dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n",
>             pte->pte.eaddr, pte->pte.vpage, pte->host_va);
>
>     ppc_md.hpte_invalidate(pte->slot, pte->host_va,
>                    MMU_PAGE_4K, MMU_SEGSIZE_256M,
>                    false);
>     pte->host_va = 0;
>
>     if (pte->pte.may_write)
>         kvm_release_pfn_dirty(pte->pfn);
>     else
>         kvm_release_pfn_clean(pte->pfn);
> }

Am I looking at old code?

>>> +
>>> +/* Flush with mask 0xfffffffff */
>>> +static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
>>> +{
>>> +	struct list_head *list;
>>> +	struct hpte_cache *pte, *tmp;
>>> +	u64 vp_mask = 0xfffffffffULL;
>>> +
>>> +	list =&vcpu->arch.hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
>>> +
>>> +	/* Check the list for matching entries */
>>> +	list_for_each_entry_safe(pte, tmp, list, list_vpte) {
>>> +		/* Jump over the helper entry */
>>> +		if (&pte->list_vpte == list)
>>> +			continue;
>>>
>>>        
>> list cannot contain list.  Or maybe I don't understand the data structure.  Isn't it multiple hash tables with lists holding matching ptes?
>>      
> It is multiple hash tables with list_heads that are one element of a list that contains the matching ptes. Usually you'd have
>
> struct x {
>    struct list_head;
>    int foo;
>    char bar;
> };
>
> and you loop through each of those elements. What we have here is
>
> struct list_head hash[..];
>
> and some loose struct x's. The hash's "next" element is a struct x.
>
> The "normal" way would be to have "struct x hash[..];" but I figured that eats too much space.
>    

No, what you describe is quite normal.  In fact, x86 kvm mmu is exactly 
like that, except we only have a single hash:

>     struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];

(another difference is using struct hlist_head instead of list_head, 
which I recommend since it saves space)
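
The saving Avi mentions applies per bucket, not per entry. An hlist_head is a single pointer while a list_head is two; an hlist_node, like a list_head, is still two pointers. The user-space copies of the layouts below only demonstrate that arithmetic. The bucket count of 8192 used in the check is an arbitrary example, not a value from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copies of the kernel layouts, used only for sizeof arithmetic. */
struct list_head {
	struct list_head *next, *prev;		/* bucket head: two pointers */
};

struct hlist_node {
	struct hlist_node *next, **pprev;	/* chain node: still two pointers */
};

struct hlist_head {
	struct hlist_node *first;		/* bucket head: one pointer */
};

/* Bytes needed for a hash table with 'buckets' heads of each kind. */
static size_t list_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct list_head);
}

static size_t hlist_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct hlist_head);
}
```

On a 64-bit host, 8192 buckets cost 128 KiB with list_head and 64 KiB with hlist_head. The patch keeps several such tables per vcpu.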

>>> +
>>> +			if ((pte->pte.raddr>= pa_start)&&
>>> +			    (pte->pte.raddr<   pa_end)) {
>>> +				invalidate_pte(vcpu, pte);
>>> +			}
>>>
>>>        
>> Extra braces.
>>      
> Yeah, for two-lined if's I find it more readable that way. Is it forbidden?
>    

It's not forbidden, but it tends to attract "cleanup" patches, which are 
annoying.  Best to conform to the coding style if there isn't a good 
reason not to.

Personally I prefer braces for one-liners (yes they're ugly, but they're 
safer and easier to patch).

>>> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
>>> +{
>>> +	char kmem_name[128];
>>> +
>>> +	/* init hpte slab cache */
>>> +	snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
>>> +	vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
>>> +		sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>>>
>>>        
>> Why not one global cache?
>>      
> You mean over all vcpus? Or over all VMs?

Totally global.  As in 'static struct kmem_cache *kvm_hpte_cache;'.

> Because this way they don't interfere. An operation on one vCPU doesn't inflict anything on another. There's also no locking necessary this way.
>    

The slab writers have solved this for everyone, not just us.  
kmem_cache_alloc() will usually allocate from a per-cpu cache, so no 
interference and/or locking.  See ____cache_alloc().

If there's a problem in kmem_cache_alloc(), solve it there, don't 
introduce workarounds.

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Alexander Graf @ 2010-06-28  8:55 UTC
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <4C285D1C.5060508@redhat.com>


On 28.06.2010, at 10:28, Avi Kivity wrote:

> On 06/26/2010 02:16 AM, Alexander Graf wrote:
>> Currently the shadow paging code keeps an array of entries it knows about.
>> Whenever the guest invalidates an entry, we loop through that array,
>> trying to invalidate matching parts.
>>
>> While this is a really simple implementation, it is probably the most
>> inefficient one possible. So instead, let's keep an array of lists around
>> that are indexed by a hash. This way each PTE can be added by 4 list_add,
>> removed by 4 list_del invocations and the search only needs to loop through
>> entries that share the same hash.
>>
>> This patch implements said lookup and exports generic functions that both
>> the 32-bit and 64-bit backends can use.
>>
>>
>> +
>> +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) {
>> +	return hash_64(eaddr>>  PTE_SIZE, HPTEG_HASH_BITS_PTE);
>> +}
>> +
>> +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) {
>> +	return hash_64(vpage&  0xfffffffffULL, HPTEG_HASH_BITS_VPTE);
>> +}
>> +
>> +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) {
>> +	return hash_64((vpage&  0xffffff000ULL)>>  12,
>> +		       HPTEG_HASH_BITS_VPTE_LONG);
>> +}
>>
>
> Still with the weird coding style?

Not sure what's going on there. My editor displays it normally. Weird.
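
The three hash helpers quoted above follow one scheme: each drops the bits that should not distinguish entries, then hashes what remains into a bucket index. The sketch below models this in user space with a multiplicative hash of the form current kernels use for hash_64(): multiply by GOLDEN_RATIO_64 and keep the top bits. Older kernels used a different constant. The page shift of 12 and the bucket widths of 13 and 5 bits are assumptions chosen for illustration, not values quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hash in the style of current kernels' hash_64(). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

static uint64_t model_hash_64(uint64_t val, unsigned int bits)
{
	/* The top bits of the product are the best mixed; bits must be 1..64. */
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

/* Assumed parameters, for illustration only. */
#define MODEL_PTE_SIZE			12	/* 4 KiB pages */
#define MODEL_HASH_BITS_PTE		13
#define MODEL_HASH_BITS_VPTE_LONG	5

/* Effective-address bucket: the page offset is shifted out before hashing. */
static uint64_t model_hash_pte(uint64_t eaddr)
{
	return model_hash_64(eaddr >> MODEL_PTE_SIZE, MODEL_HASH_BITS_PTE);
}

/* Long vpage bucket: vpages that differ only in their low 12 bits share a chain. */
static uint64_t model_hash_vpte_long(uint64_t vpage)
{
	return model_hash_64((vpage & 0xffffff000ULL) >> 12,
			     MODEL_HASH_BITS_VPTE_LONG);
}
```

Every address within one 4 KiB page maps to the same chain. A single-page flush therefore only has to walk that one chain, which is the point of the patch description quoted above.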

>
>> +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
>> +{
>> +	struct hpte_cache *pte, *tmp;
>> +	int i;
>> +
>> +	for (i = 0; i<  HPTEG_HASH_NUM_VPTE_LONG; i++) {
>> +		struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
>> +
>> +		list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
>> +			/* Jump over the helper entry */
>> +			if (&pte->list_vpte_long == list)
>> +				continue;
>>
>
> I don't think l_f_e_e_s() will ever give you the head back.

Uh. Usually you have struct list_head in a struct and you point to the
first entry to loop over all. So if it doesn't return the first entry,
that would seem very counter-intuitive.

>
>> +
>> +			invalidate_pte(vcpu, pte);
>> +		}
>>
>
> Does invalidate_pte() remove the pte?  doesn't seem so, so you can drop the _safe iteration.

Yes, it does.

>
>> +	}
>> +}
>> +
>> +void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
>> +{
>> +	u64 i;
>> +
>> +	dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx&  0x%lx\n",
>> +		    vcpu->arch.hpte_cache_count, guest_ea, ea_mask);
>> +
>> +	guest_ea&= ea_mask;
>> +
>> +	switch (ea_mask) {
>> +	case ~0xfffUL:
>> +	{
>> +		struct list_head *list;
>> +		struct hpte_cache *pte, *tmp;
>> +
>> +		/* Find the list of entries in the map */
>> +		list =&vcpu->arch.hpte_hash_pte[kvmppc_mmu_hash_pte(guest_ea)];
>> +
>> +		/* Check the list for matching entries */
>> +		list_for_each_entry_safe(pte, tmp, list, list_pte) {
>> +			/* Jump over the helper entry */
>> +			if (&pte->list_pte == list)
>> +				continue;
>>
>
> Same here.
>
>> +
>> +			/* Invalidate matching PTE */
>> +			if ((pte->pte.eaddr&  ~0xfffUL) =3D=3D guest_ea)
>> +				invalidate_pte(vcpu, pte);
>> +		}
>> +		break;
>> +	}
>>  =20
>=20
> Would be nice to put this block into a function.

Yup.
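One way to pull the `case ~0xfffUL:` block out is sketched below. It also covers the `0x0ffff000` case, which retries that helper once per 256MB segment, giving 16 iterations over a 32-bit space. The helper name `flush_page()`, the `shadow_pte` struct, and the flat array that replaces the hash bucket are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified shadow entry. */
struct shadow_pte {
	unsigned long eaddr;
	int valid;
};

/* Helper extracted from the "case ~0xfffUL:" block: invalidate every valid
 * entry whose 4K-aligned effective address equals guest_ea. The patch walks a
 * single hash bucket here; a flat array keeps the sketch short. */
static int flush_page(struct shadow_pte *ptes, int n, unsigned long guest_ea)
{
	int i, flushed = 0;

	for (i = 0; i < n; i++) {
		if (ptes[i].valid && (ptes[i].eaddr & ~0xfffUL) == guest_ea) {
			ptes[i].valid = 0;
			flushed++;
		}
	}
	return flushed;
}

static int pte_flush(struct shadow_pte *ptes, int n, unsigned long guest_ea,
		     unsigned long ea_mask)
{
	uint64_t seg;
	int flushed = 0;

	guest_ea &= ea_mask;

	switch (ea_mask) {
	case ~0xfffUL:
		return flush_page(ptes, n, guest_ea);
	case 0x0ffff000:
		/* No segment given: try each of the 16 256MB segments. */
		for (seg = 0; seg < 0x100000000ULL; seg += 0x10000000ULL)
			flushed += flush_page(ptes, n, guest_ea | seg);
		return flushed;
	default:
		return -1;	/* full flush and other masks omitted here */
	}
}
```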

>
>> +	case 0x0ffff000:
>> +		/* 32-bit flush w/o segment, go through all possible segments */
>> +		for (i = 0; i<  0x100000000ULL; i += 0x10000000ULL)
>> +			kvmppc_mmu_pte_flush(vcpu, guest_ea | i, ~0xfffUL);
>> +		break;
>> +	case 0:
>> +		/* Doing a complete flush ->  start from scratch */
>> +		kvmppc_mmu_pte_flush_all(vcpu);
>> +		break;
>> +	default:
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +}
>> +
>> +/* Flush with mask 0xfffffffff */
>> +static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
>> +{
>> +	struct list_head *list;
>> +	struct hpte_cache *pte, *tmp;
>> +	u64 vp_mask = 0xfffffffffULL;
>> +
>> +	list =&vcpu->arch.hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
>> +
>> +	/* Check the list for matching entries */
>> +	list_for_each_entry_safe(pte, tmp, list, list_vpte) {
>> +		/* Jump over the helper entry */
>> +		if (&pte->list_vpte == list)
>> +			continue;
>>
>
> list cannot contain list.  Or maybe I don't understand the data
> structure.  Isn't it multiple hash tables with lists holding matching ptes?

It is multiple hash tables with list_heads that are one element of a list that contains the matching ptes. Usually you'd have

struct x {
  struct list_head list;
  int foo;
  char bar;
};

and you loop through each of those elements. What we have here is

struct list_head hash[..];

and some loose struct x's. The hash's "next" element is a struct x.

The "normal" way would be to have "struct x hash[..];" but I figured that eats too much space.
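The layout described above can be sketched in userspace: a bare array of `list_head` buckets whose `next` pointers lead to loosely allocated entries. Each bucket costs two pointers instead of a whole entry. A walk starts at `head->next` and ends when it wraps back to the bucket head. The `hpte_entry` fields, the bucket count and the hash are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* A loose entry, allocated anywhere and linked into one bucket through
 * its embedded list_head (fields are illustrative). */
struct hpte_entry {
	struct list_head list_vpte;
	uint64_t vpage;
	uint64_t raddr;
};

#define HASH_BITS	4
#define HASH_NUM	(1 << HASH_BITS)

/* Each bucket is a bare list_head (two pointers), not a whole entry. */
struct vpte_table {
	struct list_head hash[HASH_NUM];
};

static unsigned int bucket_of(uint64_t vpage)
{
	return (unsigned int)((vpage * 0x61C8864680B583EBULL) >> (64 - HASH_BITS));
}

static void table_init(struct vpte_table *t)
{
	int i;

	for (i = 0; i < HASH_NUM; i++)
		INIT_LIST_HEAD(&t->hash[i]);
}

static void table_insert(struct vpte_table *t, struct hpte_entry *e)
{
	list_add(&e->list_vpte, &t->hash[bucket_of(e->vpage)]);
}

/* head->next is already the first entry; the walk stops on wrapping
 * back to the bare head, which is never treated as an entry. */
static struct hpte_entry *table_find(struct vpte_table *t, uint64_t vpage)
{
	struct list_head *head = &t->hash[bucket_of(vpage)];
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct hpte_entry *e = container_of(pos, struct hpte_entry,
						    list_vpte);

		if (e->vpage == vpage)
			return e;
	}
	return NULL;
}
```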

>
>> +
>> +		/* Invalidate matching PTEs */
>> +		if ((pte->pte.vpage&  vp_mask) == guest_vp)
>> +			invalidate_pte(vcpu, pte);
>> +	}
>> +}
>> +
>>
>> +
>> +void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
>> +{
>> +	struct hpte_cache *pte, *tmp;
>> +	int i;
>> +
>> +	dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx - 0x%lx\n",
>> +		    vcpu->arch.hpte_cache_count, pa_start, pa_end);
>> +
>> +	for (i = 0; i<  HPTEG_HASH_NUM_VPTE_LONG; i++) {
>> +		struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
>> +
>> +		list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
>> +			/* Jump over the helper entry */
>> +			if (&pte->list_vpte_long == list)
>> +				continue;
>> +
>> +			if ((pte->pte.raddr>= pa_start)&&
>> +			    (pte->pte.raddr<  pa_end)) {
>> +				invalidate_pte(vcpu, pte);
>> +			}
>>
>
> Extra braces.

Yeah, for two-line ifs I find it more readable that way. Is it forbidden?
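Braces around a single statement are not forbidden, but kernel CodingStyle asks not to use them unnecessarily. A condition that spans two lines does not change that. Below is a sketch of the same range test written without braces. It runs over a plain array of real addresses that stands in for the hash-bucket walk, and `count_in_range()` is a hypothetical name.

```c
#include <assert.h>

/* Count the entries a pflush of [pa_start, pa_end) would invalidate.
 * The if body is one statement, so no braces, even though the
 * condition wraps onto a second line. */
static int count_in_range(const unsigned long *raddr, int n,
			  unsigned long pa_start, unsigned long pa_end)
{
	int i, hits = 0;

	for (i = 0; i < n; i++)
		if ((raddr[i] >= pa_start) &&
		    (raddr[i] < pa_end))
			hits++;
	return hits;
}
```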

>
>> +		}
>> +	}
>> +}
>> +
>>
>> +
>> +static void kvmppc_mmu_hpte_init_hash(struct list_head *hash_list, int len)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i<  len; i++) {
>> +		INIT_LIST_HEAD(&hash_list[i]);
>> +	}
>> +}
>>
>
> Extra braces.

Yup.

>
>> +
>> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
>> +{
>> +	char kmem_name[128];
>> +
>> +	/* init hpte slab cache */
>> +	snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
>> +	vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
>> +		sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>>
>
> Why not one global cache?

You mean over all vcpus? Or over all VMs? Because this way they don't interfere. An operation on one vCPU doesn't affect another. There's also no locking necessary this way.
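The isolation argument can be illustrated with a per-vCPU pool. This is a userspace stand-in for a per-vCPU `kmem_cache`, not the slab API. It assumes each pool is only ever touched from its own vCPU's thread, so it needs no lock, and filling one pool leaves the others untouched. `POOL_SIZE`, `pool_alloc()` and `pool_flush()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical per-vCPU cap */

struct shadow_entry {
	unsigned long vpage;
	int in_use;
};

/* One pool per vCPU; only that vCPU's thread touches it, so no lock. */
struct vcpu_pool {
	struct shadow_entry slots[POOL_SIZE];
	int count;
};

static struct shadow_entry *pool_alloc(struct vcpu_pool *p)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (!p->slots[i].in_use) {
			p->slots[i].in_use = 1;
			p->count++;
			return &p->slots[i];
		}
	}
	return NULL;	/* full: the caller would flush and retry */
}

/* Drop every entry of this vCPU only; other pools are unaffected. */
static void pool_flush(struct vcpu_pool *p)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++)
		p->slots[i].in_use = 0;
	p->count = 0;
}
```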


Alex

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-28  8:33 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Matt Evans, KVM list
In-Reply-To: <AD79CD04-74CF-49B9-BACC-4C190DF5214A@suse.de>

On 06/28/2010 11:23 AM, Alexander Graf wrote:
>> You mean even kvm.ko doesn't use privileged instructions?
>>      
> It does, but I don't think it's worth speeding those up. There are only a couple. Most of the privileged instructions in PPC KVM are statically compiled into the kernel because we need to guarantee they're in the RMO (first 8MB for the PS3).
>
> Even with the magic page in use, trapping instructions still works exactly as before, so we're only talking about a speed difference.
>
>    

Yeah, that also answers my question re pv->nonpv transition.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-28  8:32 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Milton Miller, KVM list
In-Reply-To: <4330E5DC-63C5-40EA-9E99-34EE58074D1A@suse.de>

On 06/28/2010 11:21 AM, Alexander Graf wrote:
>
> The other alternative I'd see is to reuse an instruction that is not sc. We could for example pull the mfpvr trick again, but pass a different magic value in the register this time that tells the hypervisor "this is a hypercall".
>
> Or we could reserve a different SPR. But from what I've seen there are already quite a lot of SPRs out there. More than available numbers :).
>
> The hypercall technique I used here is actually inspired by MOL. They use magic constants in r3 and r4 for their "OSI" identification. I'm frankly not sure what the best approach is, but considering that syscalls from the kernel lie in the guest kernel's hand, we could just declare any breakage a guest kernel bug.
>
>    

Magic = liable to break without notice.

Given r0 is the architectural syscall number, and r3 is the Linux 
syscall number, we can use a combination of r0 and r3, reserve r3 in 
Linux, and hope that no one else uses our selection of r0.

Still smelly, but not as bad.
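A sketch of the host-side check this would imply: treat `sc` as a hypercall only when it comes from guest supervisor mode (MSR[PR] clear) and both r0 and r3 hold the reserved values. Everything else is reflected to the guest as a normal syscall. The two reserved values are placeholders, since no numbers were allocated in this thread. `MSR_PR` is the problem-state bit (0x4000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR		(1ULL << 14)	/* MSR problem-state (user mode) bit */

/* Placeholder values only: no numbers were allocated in this discussion. */
#define HC_RESERVED_R0	0x0000beefULL
#define HC_RESERVED_R3	0x4b564d21ULL

/* The subset of trapped guest state this check needs (hypothetical). */
struct sc_regs {
	uint64_t msr;
	uint64_t gpr[32];
};

/* An sc is a hypercall only if it came from guest supervisor mode and both
 * r0 and r3 carry the reserved values; anything else is reflected to the
 * guest as an ordinary system call. */
static bool sc_is_hypercall(const struct sc_regs *r)
{
	if (r->msr & MSR_PR)
		return false;
	return r->gpr[0] == HC_RESERVED_R0 && r->gpr[3] == HC_RESERVED_R3;
}
```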

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-28  8:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277507817-626-2-git-send-email-agraf@suse.de>

On 06/26/2010 02:16 AM, Alexander Graf wrote:
> Currently the shadow paging code keeps an array of entries it knows about.
> Whenever the guest invalidates an entry, we loop through that entry,
> trying to invalidate matching parts.
>
> While this is a really simple implementation, it is probably the most
> ineffective one possible. So instead, let's keep an array of lists around
> that are indexed by a hash. This way each PTE can be added by 4 list_add,
> removed by 4 list_del invocations and the search only needs to loop through
> entries that share the same hash.
>
> This patch implements said lookup and exports generic functions that both
> the 32-bit and 64-bit backend can use.
>
>
> +
> +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) {
> +	return hash_64(eaddr>>  PTE_SIZE, HPTEG_HASH_BITS_PTE);
> +}
> +
> +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) {
> +	return hash_64(vpage&  0xfffffffffULL, HPTEG_HASH_BITS_VPTE);
> +}
> +
> +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) {
> +	return hash_64((vpage&  0xffffff000ULL)>>  12,
> +		       HPTEG_HASH_BITS_VPTE_LONG);
> +}
>    

Still with the weird coding style?

> +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
> +{
> +	struct hpte_cache *pte, *tmp;
> +	int i;
> +
> +	for (i = 0; i<  HPTEG_HASH_NUM_VPTE_LONG; i++) {
> +		struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
> +
> +		list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
> +			/* Jump over the helper entry */
> +			if (&pte->list_vpte_long == list)
> +				continue;
>    

I don't think l_f_e_e_s() will ever give you the head back.

> +
> +			invalidate_pte(vcpu, pte);
> +		}
>    

Does invalidate_pte() remove the pte?  Doesn't seem so, so you can drop 
the _safe iteration.

> +	}
> +}
> +
> +void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
> +{
> +	u64 i;
> +
> +	dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx&  0x%lx\n",
> +		    vcpu->arch.hpte_cache_count, guest_ea, ea_mask);
> +
> +	guest_ea&= ea_mask;
> +
> +	switch (ea_mask) {
> +	case ~0xfffUL:
> +	{
> +		struct list_head *list;
> +		struct hpte_cache *pte, *tmp;
> +
> +		/* Find the list of entries in the map */
> +		list =&vcpu->arch.hpte_hash_pte[kvmppc_mmu_hash_pte(guest_ea)];
> +
> +		/* Check the list for matching entries */
> +		list_for_each_entry_safe(pte, tmp, list, list_pte) {
> +			/* Jump over the helper entry */
> +			if (&pte->list_pte == list)
> +				continue;
>    

Same here.

> +
> +			/* Invalidate matching PTE */
> +			if ((pte->pte.eaddr&  ~0xfffUL) == guest_ea)
> +				invalidate_pte(vcpu, pte);
> +		}
> +		break;
> +	}
>    

Would be nice to put this block into a function.

> +	case 0x0ffff000:
> +		/* 32-bit flush w/o segment, go through all possible segments */
> +		for (i = 0; i<  0x100000000ULL; i += 0x10000000ULL)
> +			kvmppc_mmu_pte_flush(vcpu, guest_ea | i, ~0xfffUL);
> +		break;
> +	case 0:
> +		/* Doing a complete flush ->  start from scratch */
> +		kvmppc_mmu_pte_flush_all(vcpu);
> +		break;
> +	default:
> +		WARN_ON(1);
> +		break;
> +	}
> +}
> +
> +/* Flush with mask 0xfffffffff */
> +static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
> +{
> +	struct list_head *list;
> +	struct hpte_cache *pte, *tmp;
> +	u64 vp_mask = 0xfffffffffULL;
> +
> +	list =&vcpu->arch.hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
> +
> +	/* Check the list for matching entries */
> +	list_for_each_entry_safe(pte, tmp, list, list_vpte) {
> +		/* Jump over the helper entry */
> +		if (&pte->list_vpte == list)
> +			continue;
>    

list cannot contain list.  Or maybe I don't understand the data 
structure.  Isn't it multiple hash tables with lists holding matching ptes?

> +
> +		/* Invalidate matching PTEs */
> +		if ((pte->pte.vpage&  vp_mask) == guest_vp)
> +			invalidate_pte(vcpu, pte);
> +	}
> +}
> +
>
> +
> +void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
> +{
> +	struct hpte_cache *pte, *tmp;
> +	int i;
> +
> +	dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx - 0x%lx\n",
> +		    vcpu->arch.hpte_cache_count, pa_start, pa_end);
> +
> +	for (i = 0; i<  HPTEG_HASH_NUM_VPTE_LONG; i++) {
> +		struct list_head *list =&vcpu->arch.hpte_hash_vpte_long[i];
> +
> +		list_for_each_entry_safe(pte, tmp, list, list_vpte_long) {
> +			/* Jump over the helper entry */
> +			if (&pte->list_vpte_long == list)
> +				continue;
> +
> +			if ((pte->pte.raddr>= pa_start)&&
> +			    (pte->pte.raddr<  pa_end)) {
> +				invalidate_pte(vcpu, pte);
> +			}
>    

Extra braces.

> +		}
> +	}
> +}
> +
>
> +
> +static void kvmppc_mmu_hpte_init_hash(struct list_head *hash_list, int len)
> +{
> +	int i;
> +
> +	for (i = 0; i<  len; i++) {
> +		INIT_LIST_HEAD(&hash_list[i]);
> +	}
> +}
>    

Extra braces.

> +
> +int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
> +{
> +	char kmem_name[128];
> +
> +	/* init hpte slab cache */
> +	snprintf(kmem_name, 128, "kvm-spt-%p", vcpu);
> +	vcpu->arch.hpte_cache = kmem_cache_create(kmem_name,
> +		sizeof(struct hpte_cache), sizeof(struct hpte_cache), 0, NULL);
>    

Why not one global cache?

> +
> +	/* init hpte lookup hashes */
> +	kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_pte,
> +				  ARRAY_SIZE(vcpu->arch.hpte_hash_pte));
> +	kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_vpte,
> +				  ARRAY_SIZE(vcpu->arch.hpte_hash_vpte));
> +	kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_vpte_long,
> +				  ARRAY_SIZE(vcpu->arch.hpte_hash_vpte_long));
> +
> +	return 0;
> +}
>    


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-28  8:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, kvm-ppc, Matt Evans, KVM list
In-Reply-To: <4C285A13.8070208@redhat.com>

On 28.06.2010, at 10:15, Avi Kivity wrote:

> On 06/28/2010 09:33 AM, Alexander Graf wrote:
>>
>>> Could you do something similar in module_finalize() to patch loaded modules' .text sections?
>>>
>> I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.
>>
>
> You mean even kvm.ko doesn't use privileged instructions?

It does, but I don't think it's worth speeding those up. There are only a couple. Most of the privileged instructions in PPC KVM are statically compiled into the kernel because we need to guarantee they're in the RMO (first 8MB for the PS3).

Even with the magic page in use, trapping instructions still works exactly as before, so we're only talking about a speed difference.

Alex

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-28  8:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, kvm-ppc, Milton Miller, KVM list
In-Reply-To: <4C285991.1050303@redhat.com>

On 28.06.2010, at 10:13, Avi Kivity wrote:

> On 06/28/2010 10:49 AM, Alexander Graf wrote:
>>
>>> I don't believe we support the kernel actually doing a syscall to itself
>>> anymore, at least on powerpc.  The callers call the underlying system
>>> call function, or kernel_thread.
>>>
>>> That said, I would suggest we allocate a syscall number for this, as it
>>> would document the usage.  (In addition to 0..nr_syscalls - 1 we have
>>> 0x1ebe in use).
>>>
>> That's actually a pretty good idea.
>>
>
> Since the syscall register is not architectural (or rather it is
> architectural but Linux ignores it) I don't see the point.  It would
> work for Linux but may alias some random parameter for a different
> guest.  We need a reliable method of distinguishing between syscalls and
> hypercalls.  Matching pc would work (but is defeated by inlining) so
> long as we find some other way of identifying the hc pc to the hypervisor.

The other alternative I'd see is to reuse an instruction that is not sc. We could for example pull the mfpvr trick again, but pass a different magic value in the register this time that tells the hypervisor "this is a hypercall".

Or we could reserve a different SPR. But from what I've seen there are already quite a lot of SPRs out there. More than available numbers :).

The hypercall technique I used here is actually inspired by MOL. They use magic constants in r3 and r4 for their "OSI" identification. I'm frankly not sure what the best approach is, but considering that syscalls from the kernel lie in the guest kernel's hand, we could just declare any breakage a guest kernel bug.

Alex

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-28  8:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Matt Evans, KVM list
In-Reply-To: <168E1B5F-44F7-4FF5-80A5-64B0E2E94D68@suse.de>

On 06/28/2010 09:33 AM, Alexander Graf wrote:
>
>> Could you do something similar in module_finalize() to patch loaded modules' .text sections?
>>      
> I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.
>    

You mean even kvm.ko doesn't use privileged instructions?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-28  8:13 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Milton Miller, KVM list
In-Reply-To: <92F4A3F3-A89F-418D-BD4D-66E2489F2E42@suse.de>

On 06/28/2010 10:49 AM, Alexander Graf wrote:
>
>> I don't believe we support the kernel actually doing a syscall to itself
>> anymore, at least on powerpc.  The callers call the underlying system
>> call function, or kernel_thread.
>>
>> That said, I would suggest we allocate a syscall number for this, as it
>> would document the usage.  (In addition to 0..nr_syscalls - 1 we have
>> 0x1ebe in use).
>>      
> That's actually a pretty good idea.
>    

Since the syscall register is not architectural (or rather it is 
architectural but Linux ignores it) I don't see the point.  It would 
work for Linux but may alias some random parameter for a different 
guest.  We need a reliable method of distinguishing between syscalls and 
hypercalls.  Matching pc would work (but is defeated by inlining) so 
long as we find some other way of identifying the hc pc to the hypervisor.
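The pc-matching alternative might look like the sketch below. It assumes the guest has somehow registered a single hypercall address (`hc_pc`, hypothetical). On a system call interrupt, SRR0 holds the address of the instruction after the `sc`, so the check compares against `SRR0 - 4`. As the message notes, this only works if every hypercall goes through that one instruction, which inlining would defeat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM state: the one guest address declared as the
 * hypercall instruction (0 = nothing registered). */
struct vm_hc_state {
	uint64_t hc_pc;
};

/* For an sc interrupt, SRR0 points at the instruction after the sc, so
 * the sc itself sits at srr0 - 4. Non-matching scs go back to the guest. */
static bool sc_is_hypercall_pc(const struct vm_hc_state *vm, uint64_t srr0)
{
	return vm->hc_pc != 0 && srr0 - 4 == vm->hc_pc;
}
```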

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-28  7:49 UTC (permalink / raw)
  To: Milton Miller; +Cc: linuxppc-dev, Avi Kivity, kvm-ppc, KVM list
In-Reply-To: <1277709531_13308@mail4.comsite.net>


On 28.06.2010, at 09:18, Milton Miller wrote:

> On Sun Jun 27 around 19:33:52 EST 2010 Alexander Graf wrote:
>> Am 27.06.2010 um 10:14 schrieb Avi Kivity <avi at redhat.com>:
>>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>
>>>> +
>>>> +PPC hypercalls
>>>> +==============
>>>> +
>>>> +The only viable ways to reliably get from guest context to host context are:
>>>> +
>>>> +    1) Call an invalid instruction
>>>> +    2) Call the "sc" instruction with a parameter to "sc"
>>>> +    3) Call the "sc" instruction with parameters in GPRs
>>>> +
>>>> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
>>>> +by valid instructions, rendering the interface broken.
>>>> +
>>>> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
>>>> +rather unclear if the sc is targeted directly for the hypervisor or the
>>>> +supervisor. It would also require that we read the syscall issuing instruction
>>>> +every time a syscall is issued, slowing down guest syscalls.
>>>> +
>
> It goes to the hypervisor, and it would require the hypervisor to
> return to the supervisor, but I believe it just returns to the user with
> permission denied.

That's what I assumed, yeah :(.

>
>>>> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
>>>> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
>>>> +magic values arrives from the guest's kernel mode, we take the syscall as a
>>>> +hypercall.
>>>>
>>>
>>> Is there any chance a normal syscall will have those values in r3 and r4?
>>
>> r3 is the syscall number. So as long as the guest doesn't reuse that
>> value, we're safe. Since in general syscall numbers are not randomly
>> scattered throughout the number range, we should be ok here.
>>
>
> No, r0 has the system call number.  Registers 3 and 4 are the first
> 2 args in c abi (or first 64 bit arg in 32 bit c abi), but the linux
> syscall abi is special.  (In addition, it returns success or failure in
> cr0).

Oh. Ahem :)

>
>>>
>>> If so, maybe it's better to use pc as the key for hypercalls.  Let
>>> the guest designate one instruction address as the hypercall call
>>> point; kvm can easily check it and reflect it back to the guest if
>>> it doesn't match.
>>>
>>
>> You mean the guest would tell the hv where the hypercall lies? That
>> would require a hypercall, no? Defining it statically is tricky. I
>> want to PV'nize osx using a kernel module later, so I don't have
>> control over the physical layout.
>>
>>> Is it valid and useful to issue sc from privileged mode anyway,
>>> except for calling the hypervisor?
>>
>> Same as a syscall on x86 really. The kernel can and does issue
>> syscalls within itself.
>>
>>
>
> I don't believe we support the kernel actually doing a syscall to itself
> anymore, at least on powerpc.  The callers call the underlying system
> call function, or kernel_thread.
>
> That said, I would suggest we allocate a syscall number for this, as it
> would document the usage.  (In addition to 0..nr_syscalls - 1 we have
> 0x1ebe in use).

That's actually a pretty good idea.

>
> Also, is there any desire to nest such emulation?

Nesting should just work, right? Since we only accept hypercalls from PR=0 and guests run in PR=1, we get the sc interrupt in the l1 guest by then.

The only issue I'm aware of that completely breaks when using nested KVM on PPC is the MSR_IR != MSR_DR logic. We fetch the instruction we got an interrupt on for certain interrupts in the world switch handler by keeping MSR_IR=0, but setting MSR_DR=1. And KVM speeds up MSR_DR != MSR_IR by mapping both of them lazily in a special address space. So if you access the same page as instruction and as data, you get an invalid result.

Alex

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Milton Miller @ 2010-06-28  7:18 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, Avi Kivity, kvm-ppc, KVM list
In-Reply-To: <07C9A4B8-881A-438C-AA99-AEC23887C6B8@suse.de>

On Sun Jun 27 around 19:33:52 EST 2010 Alexander Graf wrote:
> Am 27.06.2010 um 10:14 schrieb Avi Kivity <avi at redhat.com>:
> > On 06/26/2010 02:25 AM, Alexander Graf wrote:

> > > +
> > > +PPC hypercalls
> > > +==============
> > > +
> > > +The only viable ways to reliably get from guest context to host context are:
> > > +
> > > +    1) Call an invalid instruction
> > > +    2) Call the "sc" instruction with a parameter to "sc"
> > > +    3) Call the "sc" instruction with parameters in GPRs
> > > +
> > > +Method 1 is always a bad idea. Invalid instructions can be replaced later on
> > > +by valid instructions, rendering the interface broken.
> > > +
> > > +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
> > > +rather unclear if the sc is targeted directly for the hypervisor or the
> > > +supervisor. It would also require that we read the syscall issuing instruction
> > > +every time a syscall is issued, slowing down guest syscalls.
> > > +
> > > +

It goes to the hypervisor, and it would require the hypervisor to
return to the supervisor, but I believe it just returns to the user with
permission denied.

> > > +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
> > > +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
> > > +magic values arrives from the guest's kernel mode, we take the syscall as a
> > > +hypercall.
> > >
> >
> > Is there any chance a normal syscall will have those values in r3  
> > and r4?
> 
> r3 is the syscall number. So as long as the guest doesn't reuse that  
> value, we're safe. Since in general syscall numbers are not randomly  
> scattered throughout the number range, we should be ok here.
> 

No, r0 has the system call number.  Registers 3 and 4 are the first
2 args in c abi (or first 64 bit arg in 32 bit c abi), but the linux
syscall abi is special.  (In addition, it returns success or failure in
cr0).

> >
> > If so, maybe it's better to use pc as the key for hypercalls.  Let
> > the guest designate one instruction address as the hypercall call  
> > point; kvm can easily check it and reflect it back to the guest if  
> > it doesn't match.
> >
> 
> You mean the guest would tell the hv where the hypercall lies? That  
> would require a hypercall, no? Defining it statically is tricky. I  
> want to PV'nize osx using a kernel module later, so I don't have  
> control over the physical layout.
> 
> > Is it valid and useful to issue sc from privileged mode anyway,  
> > except for calling the hypervisor?
> 
> Same as a syscall on x86 really. The kernel can and does issue  
> syscalls within itself.
> 
> 

I don't believe we support the kernel actually doing a syscall to itself
anymore, at least on powerpc.  The callers call the underlying system
call function, or kernel_thread.

That said, I would suggest we allocate a syscall number for this, as it
would document the usage.  (In addition to 0..nr_syscalls - 1 we have
0x1ebe in use).

Also, is there any desire to nest such emulation?

milton

^ permalink raw reply

* Re: of-flash: Unable to ioremap() both 128MB NOR flashes on 32-bit system with 2GB+ RAM
From: Milton Miller @ 2010-06-28  7:18 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: linux-mtd, linuxppc-dev
In-Reply-To: <AANLkTikMtZsg5ameUY9dnZoEqV-hfa7tkfGIO1hBtDq1@mail.gmail.com>

On Fri Jun 25 around 14:01:51 EST 2010 Kyle Moffett wrote:
> Oops... put the old linuxppc list on the CC, sorry!
> 
> On Thu, Jun 24, 2010 at 23:45, Kyle Moffett <kyle at moffetthome.net> wrote:
> > Hello,
> >
> > I've got a new P2020 (32bit mpc85xx family) board I'm working on a
> > port for that includes 2 NOR flashes (128MB each) and a removable
> > SO-RDIMM of 2GB or 4GB.  Unfortunately when I configure both flashes
> > in the device-tree off my elbc, Linux is completely unable to access
> > the second one because it attempts to ioremap() the entire virtual
> > address space of both FLASH chips.
> >
> > Even with only one flash chip enabled, there's a bit of a noticeable
> > performance degradation because the mapping consumes almost all of my
> > available vmalloc space and forces bounce-buffering for all my
> > HIGHMEM.
> >
> > It looks like the "of-flash" driver currently requires that the whole
> > chip be mapped in the kernel at once.  I would much rather have a 50%
> > performance penalty on flash accesses (which are already very slow)
> > and regain most of the vmap space.
> >
> > So the question is, is there a way to convince the MTD layer to
> > iomap() only what it needs to access to do reads and writes?  If not,
> > what changes would need to be made to MTD and/or "of-flash" to create
> > such functionality?
> >
> > Cheers,
> > Kyle Moffett
> >

I believe the MTD layer would be happy, but it is beyond the scope of
the physmap_of driver.  A look at drivers/mtd/maps/pcmciamtd.c shows
the concept of paging in a section of flash, although it has the advantage
of hardware to move the window instead of calling ioremap or swapping
translations.  Another example is drivers/mtd/maps/pci.c, also with
hardware assist.
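The paging idea can be sketched as a fixed-size window that moves whenever an access falls outside it. Here `map_window()` only repoints into an in-memory array. In a driver it would stand for an `iounmap()`/`ioremap()` pair or a hardware window register. `WINDOW_SIZE`, `struct windowed_flash` and `flash_read()` are hypothetical, not MTD interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 4096	/* hypothetical window size */

/* Simulated device plus a window exposing WINDOW_SIZE bytes at a time. */
struct windowed_flash {
	const uint8_t *device;	/* whole device (simulation only) */
	size_t size;
	size_t win_base;	/* device offset of the current window */
	const uint8_t *win;	/* currently "mapped" window, NULL if none */
	unsigned int remaps;	/* how many times the window moved */
};

/* Stand-in for unmapping the old window and mapping a new one. */
static void map_window(struct windowed_flash *f, size_t base)
{
	f->win_base = base;
	f->win = f->device + base;
	f->remaps++;
}

/* Copy len bytes starting at ofs, moving the window only when needed. */
static int flash_read(struct windowed_flash *f, size_t ofs, void *buf,
		      size_t len)
{
	uint8_t *out = buf;

	if (ofs > f->size || len > f->size - ofs)
		return -1;
	while (len) {
		size_t base = ofs - ofs % WINDOW_SIZE;
		size_t chunk;

		if (!f->win || base != f->win_base)
			map_window(f, base);
		chunk = WINDOW_SIZE - (ofs - base);
		if (chunk > len)
			chunk = len;
		memcpy(out, f->win + (ofs - base), chunk);
		out += chunk;
		ofs += chunk;
		len -= chunk;
	}
	return 0;
}
```

Only one window's worth of address space is mapped at any time, at the cost of a remap whenever an access crosses a window boundary.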

milton

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-28  6:33 UTC (permalink / raw)
  To: Matt Evans; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <4C282794.1040209@ozlabs.org>


On 28.06.2010, at 06:39, Matt Evans wrote:

> Howdy Alex!
>
> Alexander Graf wrote:
>> We will soon start and replace instructions from the text section with
>> other, paravirtualized versions. To ease the readability of those patches
>> I split out the generic looping and magic page mapping code out.
>>
>> This patch still only contains stubs. But at least it loops through the
>> text section :).
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> ---
>> arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 59 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
>> index 2d8dd73..d873bc6 100644
>> --- a/arch/powerpc/kernel/kvm.c
>> +++ b/arch/powerpc/kernel/kvm.c
>> @@ -32,3 +32,62 @@
>> #define KVM_MAGIC_PAGE		(-4096L)
>> #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>>
>> +static bool kvm_patching_worked = true;
>> +
>> +static void kvm_map_magic_page(void *data)
>> +{
>> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
>> +		       KVM_MAGIC_PAGE,  /* Physical Address */
>> +		       KVM_MAGIC_PAGE); /* Effective Address */
>> +}
>> +
>> +static void kvm_check_ins(u32 *inst)
>> +{
>> +	u32 _inst = *inst;
>> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
>> +	u32 inst_rt = _inst & KVM_MASK_RT;
>> +
>> +	switch (inst_no_rt) {
>> +	}
>> +
>> +	switch (_inst) {
>> +	}
>> +
>> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
>> +}
>> +
>> +static void kvm_use_magic_page(void)
>> +{
>> +	u32 *p;
>> +	u32 *start, *end;
>> +
>> +	/* Tell the host to map the magic page to -4096 on all CPUs */
>> +
>> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
>> +
>> +	/* Now loop through all code and find instructions */
>> +
>> +	start = (void*)_stext;
>> +	end = (void*)_etext;
>> +
>> +	for (p = start; p < end; p++)
>> +		kvm_check_ins(p);
>> +}
>
> Could you do something similar in module_finalize() to patch loaded modules' .text sections?

I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.
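The stub `kvm_check_ins()` splits each instruction word into `_inst & ~KVM_MASK_RT` and `_inst & KVM_MASK_RT`, so one `case` can match an instruction whatever its target register is. The sketch below applies the same idea to an array of instruction words. It assumes the RT field occupies bits 6-10 (mask `0x03e00000`) and uses `mfmsr` as the example opcode, encoded as `0x7c0000a6` with RT=0. The patch's actual mask value and cases are not shown in this stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_RT		0x03e00000u	/* assumed RT field, bits 6-10 */
#define INST_MFMSR	0x7c0000a6u	/* mfmsr with RT = 0 */

/* Count mfmsr instructions in [start, end) regardless of target register,
 * and report the RT of the first match through *first_rt. */
static size_t scan_mfmsr(const uint32_t *start, const uint32_t *end,
			 int *first_rt)
{
	const uint32_t *p;
	size_t hits = 0;

	for (p = start; p < end; p++) {
		uint32_t inst = *p;

		if ((inst & ~MASK_RT) != INST_MFMSR)
			continue;
		if (hits == 0)
			*first_rt = (int)((inst & MASK_RT) >> 21);
		hits++;
	}
	return hits;
}
```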


Alex

^ permalink raw reply

* Re: [PATCH v3 2/2] powerpc: add support for new hcall H_BEST_ENERGY
From: Michael Neuling @ 2010-06-28  6:11 UTC (permalink / raw)
  To: svaidy; +Cc: Paul Mackerras, Anton Blanchard, linuxppc-dev
In-Reply-To: <20100628053252.GA12751@dirshya.in.ibm.com>



In message <20100628053252.GA12751@dirshya.in.ibm.com> you wrote:
> * Michael Neuling <mikey@neuling.org> [2010-06-28 11:44:31]:
> 
> > Vaidy,
> > 
> > > 	Create sysfs interface to export data from H_BEST_ENERGY hcall
> > > 	that can be used by administrative tools on supported pseries
> > > 	platforms for energy management	optimizations.
> > > 
> > > 	/sys/devices/system/cpu/pseries_(de)activate_hint_list and
> > > 	/sys/devices/system/cpu/cpuN/pseries_(de)activate_hint will provide
> > > 	hints for activation and deactivation of cpus respectively.
> > > 
> > > 	Added new driver module
> > > 		arch/powerpc/platforms/pseries/pseries_energy.c
> > > 	under new config option CONFIG_PSERIES_ENERGY
> > 
> > Can you provide some documentation on how to use these hints and what
> > format they are provided in through sysfs.  Looks like two separate interfaces
> > to the same thing (one a comma sep list and 1 per cpu, why do we need
> > both?).  What is the difference between activate and deactivate, without
> > me having to read PAPR :-) ??
> 
> Hi Mike,
> 
> Thanks for reviewing this patch series.  Sure, I can provide
> additional information.
> 
> These hints are abstract numbers given by the hypervisor based on
> the extended knowledge the hypervisor has regarding the current system
> topology and resource mappings.
> 
> The activate and the deactivate part is for the two distinct
> operations that we could do for energy savings.  When we have more
> capacity than required, we could deactivate a few cores to save energy.
> The choice of the core to deactivate will be based on
> /sys/devices/system/cpu/deactivate_hint_list.  The comma separated
> list of cpus (cores) will be the preferred choice.
> 
> Once the system has a few deactivated cores, based on workload demand we
> may have to activate them to meet the demand.  In that case the
> /sys/devices/system/cpu/activate_hint_list will be used to prefer the
> core in-order among the deactivated cores.
> 
> In simple terms, activate_hint_list will be null until we deactivate
> a few cores.  Then we could look at the corresponding list for
> activation or deactivation.

Can you put these details in the code and in the check-in comments.

> 
> Regarding your second point, there is a reason for both a list and
> per-cpu interface.  The list gives us a system wide list of cores in
> one shot for userspace to base their decision.  This will be the
> preferred interface for most cases.  On the other hand, per-cpu file
> /sys/devices/system/cpu/cpuN/pseries_(de)activate_hint provides more
> information since it exports the hint value as such.
> 
> The idea is that the list interface will be used to get a suggested
> list of cores to manage, while the per-cpu value can be used to
> further get fine grain information on a per-core basis from the
> hypervisor.  This allows Linux to have access to all information that
> the hypervisor has offered through this hcall interface.

OK, I didn't realise that they contained different info.  Just more
reasons that this interface needs better documentation :-)

Overall, I'm mostly happy with the interface.  It's pretty lightweight.

> > Other comments below.
> > 
> > > 
> > > Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> > > ---
> > >  arch/powerpc/include/asm/hvcall.h               |    3 
> > >  arch/powerpc/platforms/pseries/Kconfig          |   10 +
> > >  arch/powerpc/platforms/pseries/Makefile         |    1 
> > >  arch/powerpc/platforms/pseries/pseries_energy.c |  258 +++++++++++++++++++++++
> > >  4 files changed, 271 insertions(+), 1 deletions(-)
> > >  create mode 100644 arch/powerpc/platforms/pseries/pseries_energy.c
> > > 
> > > diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> > > index 5119b7d..34b66e0 100644
> > > --- a/arch/powerpc/include/asm/hvcall.h
> > > +++ b/arch/powerpc/include/asm/hvcall.h
> > > @@ -231,7 +231,8 @@
> > >  #define H_GET_EM_PARMS		0x2B8
> > >  #define H_SET_MPP		0x2D0
> > >  #define H_GET_MPP		0x2D4
> > > -#define MAX_HCALL_OPCODE	H_GET_MPP
> > > +#define H_BEST_ENERGY		0x2F4
> > > +#define MAX_HCALL_OPCODE	H_BEST_ENERGY
> > > 
> > >  #ifndef __ASSEMBLY__
> > > 
> > > diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> > > index c667f0f..b3dd108 100644
> > > --- a/arch/powerpc/platforms/pseries/Kconfig
> > > +++ b/arch/powerpc/platforms/pseries/Kconfig
> > > @@ -33,6 +33,16 @@ config PSERIES_MSI
> > >         depends on PCI_MSI && EEH
> > >         default y
> > > 
> > > +config PSERIES_ENERGY
> > 
> > Probably need a less generic name.  PSERIES_ENERGY_MANAGEMENT?
> > PSERIES_ENERGY_HOTPLUG_HINTS?
> 
> PSERIES_ENERGY_MANAGEMENT may be good but too long for a config
> option.
> 
> The idea is to collect all energy management functions in this module
> as new features are introduced in the pseries platform.  This hcall
> interface is the first to be included, but going forward I do not
> propose to have separate modules for other energy-management-related
> features.
> 
> The name is specific enough for the IBM pseries platform and its energy
> management functions and enablement.  A less generic name below this
> level would make it difficult to add other kinds of energy management
> functions in the future.

OK, I thought this might be the case but you never said.  Please say
something like "This adds CONFIG_PSERIES_ENERGY which will be used for
future power saving code" or some such.

> 
> > > +	tristate "pseries energy management capabilities driver"
> > > +	depends on PPC_PSERIES
> > > +	default y
> > > +	help
> > > +	  Provides interface to platform energy management capabilities
> > > +	  on supported PSERIES platforms.
> > > +	  Provides: /sys/devices/system/cpu/pseries_(de)activation_hint_list
> > > +	  and /sys/devices/system/cpu/cpuN/pseries_(de)activation_hint
> > > +
> > >  config SCANLOG
> > >  	tristate "Scanlog dump interface"
> > >  	depends on RTAS_PROC && PPC_PSERIES
> > > diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> > > index 3dbef30..32ae72e 100644
> > > --- a/arch/powerpc/platforms/pseries/Makefile
> > > +++ b/arch/powerpc/platforms/pseries/Makefile
> > > @@ -16,6 +16,7 @@ obj-$(CONFIG_EEH)	+= eeh.o eeh_cache.o eeh_driver.o eeh_event.o eeh_sysfs.o
> > >  obj-$(CONFIG_KEXEC)	+= kexec.o
> > >  obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
> > >  obj-$(CONFIG_PSERIES_MSI)	+= msi.o
> > > +obj-$(CONFIG_PSERIES_ENERGY)	+= pseries_energy.o
> > > 
> > >  obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
> > >  obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
> > > diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
> > > new file mode 100644
> > > index 0000000..9a936b1
> > > --- /dev/null
> > > +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> > > @@ -0,0 +1,258 @@
> > > +/*
> > > + * POWER platform energy management driver
> > > + * Copyright (C) 2010 IBM Corporation
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * version 2 as published by the Free Software Foundation.
> > > + *
> > > + * This pseries platform device driver provides access to
> > > + * platform energy management capabilities.
> > > + */
> > > +
> > > +#include <linux/module.h>
> > > +#include <linux/types.h>
> > > +#include <linux/errno.h>
> > > +#include <linux/init.h>
> > > +#include <linux/seq_file.h>
> > > +#include <linux/sysdev.h>
> > > +#include <linux/cpu.h>
> > > +#include <linux/of.h>
> > > +#include <asm/cputhreads.h>
> > > +#include <asm/page.h>
> > > +#include <asm/hvcall.h>
> > > +
> > > +
> > > +#define MODULE_VERS "1.0"
> > 
> > Argh, I hate module versions... but this one is less of an issue since
> > it doesn't seem to be being used anyway :-)
> > 
> > > +#define MODULE_NAME "pseries_energy"
> > 
> > Unused too.
> 
> Yes.  This keeps the module template complete.  No overhead as yet :)
> We will certainly need the MODULE_VERS in future if we add/change
> sysfs interfaces.

Argh, change sysfs interfaces?!?  We don't even have the first one and
now we are going to change it?  :-)

> > > +
> > > +/* Helper Routines to convert between drc_index to cpu numbers */
> > > +
> > > +static u32 cpu_to_drc_index(int cpu)
> > > +{
> > > +	struct device_node *dn = NULL;
> > > +	const int *indexes;
> > > +	int i;
> > > +	dn = of_find_node_by_path("/cpus");
> > > +	if (dn == NULL)
> > > +		goto err;
> > 
> > Hmm, I'm not sure this is really needed.  If you don't have /cpus you are
> > probably not going to boot.
> 
> Good suggestion.  I could add all these checks in module_init().  I was
> thinking that if any of the functions being called allocates memory and
> it fails, we need to abort.
> 
> I just reviewed it, and it looks like of_find_node_by_path() will not
> sleep or allocate any memory.  So if it succeeds once in module_init(),
> then it will never fail!
> 
> 
> > > +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> > > +	if (indexes == NULL)
> > > +		goto err;
> > 
> > These checks should probably be moved to module init rather than sysfs
> > read time.  If they fail, don't load the module and print a warning.
> > 
> > These HCALLs and device-tree entries aren't going to be dynamic.
> 
> Agreed.  The only cause of runtime failure is OOM.  If none of these
> allocate memory, moving these checks once at module_init() will be
> a good optimization.

Cool, thanks.  

I also noticed you are doing this per cpu, so it's got the potential to
really suck on our big machines.

> But still I am wondering if it is worth the risk.  These are not in
> hot paths and these are just quick null comparisons.  Also in most other
> call sites, we do check for return values.

I found a /proc file the other day that took 60sec to read on a big
machine.  The file wasn't a hot path, but 60sec was way too long.

Things get bad quickly on our big machines.  I'd prefer it in module
init.
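One possible shape for doing the lookup once, written as a userspace model rather than kernel code: the "ibm,drc-indexes" property is represented as a plain array whose first element is the entry count, as the patch's own comments describe. The helper names are hypothetical. Two details deliberately differ from the posted helpers: WARN_ON(i > indexes[0]) still lets i == indexes[0] through to an out-of-range read, and drc_index_to_cpu() leaves its loop with i == indexes[0] when nothing matches and converts that into a cpu number anyway. The sketch rejects both cases. Separately, the node returned by of_find_node_by_path() carries a reference that the posted helpers never release with of_node_put().

```c
#include <stdint.h>

/*
 * Userspace model of the "ibm,drc-indexes" property as the patch uses
 * it: indexes[0] is the number of entries n, and indexes[1..n] hold the
 * DRC index of core 0..n-1.  Helper names are hypothetical.
 */

/* Core number -> DRC index.  Returns 0 (the patch's error value) when
 * core is outside 0..n-1, instead of reading past the array. */
static uint32_t core_to_drc_index(const uint32_t *indexes, int core)
{
	if (core < 0 || (uint32_t)core >= indexes[0])
		return 0;
	return indexes[core + 1];
}

/* DRC index -> core number.  Returns -1 when no entry matches, instead
 * of treating the loop's end value as a valid core. */
static int drc_index_to_core(const uint32_t *indexes, uint32_t drc_index)
{
	uint32_t i;

	for (i = 0; i < indexes[0]; i++)
		if (indexes[i + 1] == drc_index)
			return (int)i;
	return -1;
}
```

In the driver, the property pointer would be fetched and validated once in pseries_energy_init(), so each sysfs read only pays for the array walk.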

> 
> > > +	/* Convert logical cpu number to core number */
> > > +	i = cpu_core_of_thread(cpu);
> > > +	/*
> > > +	 * The first element indexes[0] is the number of drc_indexes
> > > +	 * returned in the list.  Hence i+1 will get the drc_index
> > > +	 * corresponding to core number i.
> > > +	 */
> > > +	WARN_ON(i > indexes[0]);
> > > +	return indexes[i + 1];
> > > +err:
> > > +	printk(KERN_WARNING "cpu_to_drc_index(%d) failed", cpu);
> > > +	return 0;
> > > +}
> > > +
> > > +static int drc_index_to_cpu(u32 drc_index)
> > > +{
> > > +	struct device_node *dn = NULL;
> > > +	const int *indexes;
> > > +	int i, cpu;
> > > +	dn = of_find_node_by_path("/cpus");
> > > +	if (dn == NULL)
> > > +		goto err;
> > 
> > same here
> 
> agreed, comments mentioned above.
> 
> > > +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> > > +	if (indexes == NULL)
> > > +		goto err;
> > > +	/*
> > > +	 * First element in the array is the number of drc_indexes
> > > +	 * returned.  Search through the list to find the matching
> > > +	 * drc_index and get the core number
> > > +	 */
> > > +	for (i = 0; i < indexes[0]; i++) {
> > > +		if (indexes[i + 1] == drc_index)
> > > +			break;
> > > +	}
> > > +	/* Convert core number to logical cpu number */
> > > +	cpu = cpu_first_thread_of_core(i);
> > > +	return cpu;
> > > +err:
> > > +	printk(KERN_WARNING "drc_index_to_cpu(%d) failed", drc_index);
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * pseries hypervisor call H_BEST_ENERGY provides hints to OS on
> > > + * preferred logical cpus to activate or deactivate for optimized
> > > + * energy consumption.
> > > + */
> > > +
> > > +#define FLAGS_MODE1	0x004E200000080E01
> > > +#define FLAGS_MODE2	0x004E200000080401
> > > +#define FLAGS_ACTIVATE  0x100
> > > +
> > > +static ssize_t get_best_energy_list(char *page, int activate)
> > > +{
> > > +	int rc, cnt, i, cpu;
> > > +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> > > +	unsigned long flags = 0;
> > > +	u32 *buf_page;
> > > +	char *s = page;
> > > +
> > > +	buf_page = (u32 *) get_zeroed_page(GFP_KERNEL);
> > > +	if (!buf_page)
> > > +		return -ENOMEM;
> > > +
> > > +	flags = FLAGS_MODE1;
> > > +	if (activate)
> > > +		flags |= FLAGS_ACTIVATE;
> > > +
> > > +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags, 0, __pa(buf_page),
> > > +				0, 0, 0, 0, 0, 0);
> > > +	if (rc != H_SUCCESS) {
> > > +		free_page((unsigned long) buf_page);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	cnt = retbuf[0];
> > > +	for (i = 0; i < cnt; i++) {
> > > +		cpu = drc_index_to_cpu(buf_page[2*i+1]);
> > > +		if ((cpu_online(cpu) && !activate) ||
> > > +		    (!cpu_online(cpu) && activate))
> > > +			s += sprintf(s, "%d,", cpu);
> > > +	}
> > > +	if (s > page) { /* Something to show */
> > > +		s--; /* Suppress last comma */
> > > +		s += sprintf(s, "\n");
> > > +	}
> > > +
> > > +	free_page((unsigned long) buf_page);
> > > +	return s-page;
> > > +}
> > > +
> > > +static ssize_t get_best_energy_data(struct sys_device *dev,
> > > +					char *page, int activate)
> > > +{
> > > +	int rc;
> > > +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> > > +	unsigned long flags = 0;
> > > +
> > > +	flags = FLAGS_MODE2;
> > > +	if (activate)
> > > +		flags |= FLAGS_ACTIVATE;
> > > +
> > > +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags,
> > > +				cpu_to_drc_index(dev->id),
> > > +				0, 0, 0, 0, 0, 0, 0);
> > > +
> > > +	if (rc != H_SUCCESS)
> > > +		return -EINVAL;
> > > +
> > > +	return sprintf(page, "%lu\n", retbuf[1] >> 32);
> > > +}
> > > +
> > > +/* Wrapper functions */
> > > +
> > > +static ssize_t cpu_activate_hint_list_show(struct sysdev_class *class,
> > > +			struct sysdev_class_attribute *attr, char *page)
> > > +{
> > > +	return get_best_energy_list(page, 1);
> > > +}
> > > +
> > > +static ssize_t cpu_deactivate_hint_list_show(struct sysdev_class *class,
> > > +			struct sysdev_class_attribute *attr, char *page)
> > > +{
> > > +	return get_best_energy_list(page, 0);
> > > +}
> > > +
> > > +static ssize_t percpu_activate_hint_show(struct sys_device *dev,
> > > +			struct sysdev_attribute *attr, char *page)
> > > +{
> > > +	return get_best_energy_data(dev, page, 1);
> > > +}
> > > +
> > > +static ssize_t percpu_deactivate_hint_show(struct sys_device *dev,
> > > +			struct sysdev_attribute *attr, char *page)
> > > +{
> > > +	return get_best_energy_data(dev, page, 0);
> > > +}
> > > +
> > > +/*
> > > + * Create sysfs interface:
> > > + * /sys/devices/system/cpu/pseries_activate_hint_list
> > > + * /sys/devices/system/cpu/pseries_deactivate_hint_list
> > > + * 	Comma separated list of cpus to activate or deactivate
> > > + * /sys/devices/system/cpu/cpuN/pseries_activate_hint
> > > + * /sys/devices/system/cpu/cpuN/pseries_deactivate_hint
> > > + *	Per-cpu value of the hint
> > 
> > Do we really need both interfaces?  Seems like awk could generate one
> > from the other in userspace?
> 
> Yes, it is possible, but it will not scale.  Generating the list from a
> set of per-cpu values is possible, but building it that way would add
> too much overhead.  Keeping only the list interface and deleting the
> per-cpu ones would reduce the information available from the
> hypervisor.  Having both interfaces strikes a good balance between the
> amount of information exported and quick access to a consolidated view.

OK.

"pseries_activate_hint" doesn't say what it's hinting about.  Can we
change the name to indicate that it is a power-saving hint?  We might need
to add other hints later, such as performance hints.

> 
> > > + */
> > > +
> > > +struct sysdev_class_attribute attr_cpu_activate_hint_list =
> > > +		_SYSDEV_CLASS_ATTR(pseries_activate_hint_list, 0444,
> > > +		cpu_activate_hint_list_show, NULL);
> > > +
> > > +struct sysdev_class_attribute attr_cpu_deactivate_hint_list =
> > > +		_SYSDEV_CLASS_ATTR(pseries_deactivate_hint_list, 0444,
> > > +		cpu_deactivate_hint_list_show, NULL);
> > > +
> > > +struct sysdev_attribute attr_percpu_activate_hint =
> > > +		_SYSDEV_ATTR(pseries_activate_hint, 0444,
> > > +		percpu_activate_hint_show, NULL);
> > > +
> > > +struct sysdev_attribute attr_percpu_deactivate_hint =
> > > +		_SYSDEV_ATTR(pseries_deactivate_hint, 0444,
> > > +		percpu_deactivate_hint_show, NULL);
> > 
> > > +
> > > +static int __init pseries_energy_init(void)
> > > +{
> > > +	int cpu, err;
> > > +	struct sys_device *cpu_sys_dev;
> > > +
> > > +	/* Create the sysfs files */
> > > +	err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> > > +				&attr_cpu_activate_hint_list.attr);
> > > +	if (!err)
> > > +		err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> > > +				&attr_cpu_deactivate_hint_list.attr);
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		cpu_sys_dev = get_cpu_sysdev(cpu);
> > > +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> > > +				&attr_percpu_activate_hint.attr);
> > > +		if (err)
> > > +			break;
> > > +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> > > +				&attr_percpu_deactivate_hint.attr);
> > > +		if (err)
> > > +			break;
> > > +	}
> > > +	return err;
> > > +
> > > +}
> > > +
> > > +static void __exit pseries_energy_cleanup(void)
> > > +{
> > > +	int cpu;
> > > +	struct sys_device *cpu_sys_dev;
> > > +
> > > +	/* Remove the sysfs files */
> > > +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> > > +				&attr_cpu_activate_hint_list.attr);
> > > +
> > > +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> > > +				&attr_cpu_deactivate_hint_list.attr);
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		cpu_sys_dev = get_cpu_sysdev(cpu);
> > > +		sysfs_remove_file(&cpu_sys_dev->kobj,
> > > +				&attr_percpu_activate_hint.attr);
> > > +		sysfs_remove_file(&cpu_sys_dev->kobj,
> > > +				&attr_percpu_deactivate_hint.attr);
> > > +	}
> > > +}
> > > +
> > > +module_init(pseries_energy_init);
> > > +module_exit(pseries_energy_cleanup);
> > > +MODULE_DESCRIPTION("Driver for pseries platform energy management");
> > 
> > Needs a less generic description. 
> 
> Explained above.
> 
> Thanks,
> Vaidy
> 


* Re: [PATCH v3 2/2] powerpc: add support for new hcall H_BEST_ENERGY
From: Vaidyanathan Srinivasan @ 2010-06-28  5:33 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Paul Mackerras, Anton Blanchard, linuxppc-dev
In-Reply-To: <25049.1277689471@neuling.org>

* Michael Neuling <mikey@neuling.org> [2010-06-28 11:44:31]:

> Vaidy,
> 
> > 	Create sysfs interface to export data from H_BEST_ENERGY hcall
> > 	that can be used by administrative tools on supported pseries
> > 	platforms for energy management	optimizations.
> > 
> > 	/sys/device/system/cpu/pseries_(de)activate_hint_list and
> > 	/sys/device/system/cpu/cpuN/pseries_(de)activate_hint will provide
> > 	hints for activation and deactivation of cpus respectively.
> > 
> > 	Added new driver module
> > 		arch/powerpc/platforms/pseries/pseries_energy.c
> > 	under new config option CONFIG_PSERIES_ENERGY
> 
> Can you provide some documentation on how to use these hints and in what
> format they are provided through sysfs?  It looks like there are two
> separate interfaces to the same thing (one a comma-separated list and one
> per cpu); why do we need both?  What is the difference between activate
> and deactivate, without me having to read PAPR :-) ??

Hi Mike,

Thanks for reviewing this patch series.  Sure, I can provide
additional information.

These hints are abstract numbers given by the hypervisor based on
the extended knowledge the hypervisor has regarding the current system
topology and resource mappings.

The activate and deactivate parts correspond to the two distinct
operations that we could do for energy savings.  When we have more
capacity than required, we could deactivate a few cores to save energy.
The choice of cores to deactivate will be based on
/sys/devices/system/cpu/pseries_deactivate_hint_list.  The cpus (cores)
in this comma-separated list are the preferred choice.

Once the system has a few deactivated cores, we may have to activate
them again to meet workload demand.  In that case
/sys/devices/system/cpu/pseries_activate_hint_list will be used to pick
the preferred cores, in order, among the deactivated ones.

In simple terms, pseries_activate_hint_list will be empty until we
deactivate a few cores.  After that, we can look at the corresponding
list for activation or deactivation.
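For illustration only (this helper is not part of the patch), a minimal userspace sketch of consuming a list file. It assumes the format that get_best_energy_list() produces: decimal cpu numbers separated by commas and terminated by a newline, or an empty file when there is nothing to suggest. parse_hint_list() is a hypothetical name.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a hint list such as "8,12,4\n" (the format
 * get_best_energy_list() emits) into out[].  An empty string, or one
 * holding only "\n", means the hypervisor suggested nothing.
 * Returns the number of cpus stored, or -1 on malformed input or if
 * more than max entries are present.
 */
static int parse_hint_list(const char *s, int *out, int max)
{
	int n = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		long cpu = strtol(s, &end, 10);

		/* Reject non-numbers, out-of-range values and overflow of out[]. */
		if (end == s || cpu < 0 || cpu > INT_MAX || n >= max)
			return -1;
		out[n++] = (int)cpu;

		/* Expect a comma separator or the end of the list. */
		s = end;
		if (*s == ',')
			s++;
		else if (*s != '\n' && *s != '\0')
			return -1;
	}
	return n;
}
```

A tool would read the whole file (for example with fgets()) into a buffer and pass it in; the sketch keeps the cpus in the order in which they appear in the file.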

Regarding your second point, there is a reason for having both a list
and a per-cpu interface.  The list gives userspace a system-wide list of
cores in one shot on which to base its decision.  This will be the
preferred interface for most cases.  On the other hand, the per-cpu files
/sys/devices/system/cpu/cpuN/pseries_(de)activate_hint provide more
information since they export the raw hint value.

The idea is that the list interface will be used to get a suggested
list of cores to manage, while the per-cpu value can be used to
obtain fine-grained information on a per-core basis from the
hypervisor.  This gives Linux access to all the information that
the hypervisor has offered through this hcall interface.
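A similar hypothetical sketch for the per-cpu files: it builds the path of cpuN's hint file using the attribute names from this patch, and parses the single decimal value that get_best_energy_data() prints (the upper 32 bits of retbuf[1], formatted with "%lu\n"). hint_path() and parse_hint_value() are illustrative names; what a given value means is defined by the hypervisor and is not interpreted here.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build the path of cpuN's per-cpu hint file using
 * the attribute names from this patch.  Returns 0, or -1 if buf is too
 * small to hold the whole path.
 */
static int hint_path(char *buf, size_t len, int cpu, int activate)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/cpu/cpu%d/pseries_%s_hint",
			 cpu, activate ? "activate" : "deactivate");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: parse the single decimal value the file holds
 * (printed by the driver with "%lu\n").  Returns 0, or -1 if the text is
 * not a plain unsigned decimal number.
 */
static int parse_hint_value(const char *s, unsigned long *val)
{
	char *end;

	if (*s == '-')
		return -1;
	*val = strtoul(s, &end, 10);
	if (end == s || (*end != '\n' && *end != '\0'))
		return -1;
	return 0;
}
```

Reading one file per cpu this way costs one hcall per cpu, which is why the list file remains the cheaper way to get the overall picture.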

> Other comments below.
> 
> > 
> > Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/hvcall.h               |    3 
> >  arch/powerpc/platforms/pseries/Kconfig          |   10 +
> >  arch/powerpc/platforms/pseries/Makefile         |    1 
> >  arch/powerpc/platforms/pseries/pseries_energy.c |  258 +++++++++++++++++++++++
> >  4 files changed, 271 insertions(+), 1 deletions(-)
> >  create mode 100644 arch/powerpc/platforms/pseries/pseries_energy.c
> > 
> > diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> > index 5119b7d..34b66e0 100644
> > --- a/arch/powerpc/include/asm/hvcall.h
> > +++ b/arch/powerpc/include/asm/hvcall.h
> > @@ -231,7 +231,8 @@
> >  #define H_GET_EM_PARMS		0x2B8
> >  #define H_SET_MPP		0x2D0
> >  #define H_GET_MPP		0x2D4
> > -#define MAX_HCALL_OPCODE	H_GET_MPP
> > +#define H_BEST_ENERGY		0x2F4
> > +#define MAX_HCALL_OPCODE	H_BEST_ENERGY
> > 
> >  #ifndef __ASSEMBLY__
> > 
> > diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> > index c667f0f..b3dd108 100644
> > --- a/arch/powerpc/platforms/pseries/Kconfig
> > +++ b/arch/powerpc/platforms/pseries/Kconfig
> > @@ -33,6 +33,16 @@ config PSERIES_MSI
> >         depends on PCI_MSI && EEH
> >         default y
> > 
> > +config PSERIES_ENERGY
> 
> Probably need a less generic name.  PSERIES_ENERGY_MANAGEMENT?
> PSERIES_ENERGY_HOTPLUG_HINTS?

PSERIES_ENERGY_MANAGEMENT may be good but too long for a config
option.

The idea is to collect all energy management functions in this module
as new features are introduced in the pseries platform.  This hcall
interface is the first to be included, but going forward I do not
propose to have separate modules for other energy-management-related
features.

The name is specific enough for the IBM pseries platform and its energy
management functions and enablement.  A less generic name below this
level would make it difficult to add other kinds of energy management
functions in the future.

> > +	tristate "pseries energy management capabilities driver"
> > +	depends on PPC_PSERIES
> > +	default y
> > +	help
> > +	  Provides interface to platform energy management capabilities
> > +	  on supported PSERIES platforms.
> > +	  Provides: /sys/devices/system/cpu/pseries_(de)activation_hint_list
> > +	  and /sys/devices/system/cpu/cpuN/pseries_(de)activation_hint
> > +
> >  config SCANLOG
> >  	tristate "Scanlog dump interface"
> >  	depends on RTAS_PROC && PPC_PSERIES
> > diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> > index 3dbef30..32ae72e 100644
> > --- a/arch/powerpc/platforms/pseries/Makefile
> > +++ b/arch/powerpc/platforms/pseries/Makefile
> > @@ -16,6 +16,7 @@ obj-$(CONFIG_EEH)	+= eeh.o eeh_cache.o eeh_driver.o eeh_event.o eeh_sysfs.o
> >  obj-$(CONFIG_KEXEC)	+= kexec.o
> >  obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
> >  obj-$(CONFIG_PSERIES_MSI)	+= msi.o
> > +obj-$(CONFIG_PSERIES_ENERGY)	+= pseries_energy.o
> > 
> >  obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
> >  obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
> > diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
> > new file mode 100644
> > index 0000000..9a936b1
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> > @@ -0,0 +1,258 @@
> > +/*
> > + * POWER platform energy management driver
> > + * Copyright (C) 2010 IBM Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * version 2 as published by the Free Software Foundation.
> > + *
> > + * This pseries platform device driver provides access to
> > + * platform energy management capabilities.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/errno.h>
> > +#include <linux/init.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/sysdev.h>
> > +#include <linux/cpu.h>
> > +#include <linux/of.h>
> > +#include <asm/cputhreads.h>
> > +#include <asm/page.h>
> > +#include <asm/hvcall.h>
> > +
> > +
> > +#define MODULE_VERS "1.0"
> 
> Argh, I hate module versions... but this one is less of an issue since
> it doesn't seem to be being used anyway :-)
> 
> > +#define MODULE_NAME "pseries_energy"
> 
> Unused too.

Yes.  This keeps the module template complete.  No overhead as yet :)
We will certainly need the MODULE_VERS in future if we add/change
sysfs interfaces.

> > +
> > +/* Helper Routines to convert between drc_index to cpu numbers */
> > +
> > +static u32 cpu_to_drc_index(int cpu)
> > +{
> > +	struct device_node *dn = NULL;
> > +	const int *indexes;
> > +	int i;
> > +	dn = of_find_node_by_path("/cpus");
> > +	if (dn == NULL)
> > +		goto err;
> 
> Hmm, I'm not sure this is really needed.  If you don't have /cpus you are
> probably not going to boot.

Good suggestion.  I could add all these checks in module_init().  I was
thinking that if any of the functions being called allocates memory and
it fails, we need to abort.

I just reviewed it, and it looks like of_find_node_by_path() will not
sleep or allocate any memory.  So if it succeeds once in module_init(),
then it will never fail!

> > +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> > +	if (indexes == NULL)
> > +		goto err;
> 
> These checks should probably be moved to module init rather than sysfs
> read time.  If they fail, don't load the module and print a warning.
> 
> These HCALLs and device-tree entries aren't going to be dynamic.

Agreed.  The only cause of runtime failure is OOM.  If none of these
allocate memory, moving these checks once at module_init() will be
a good optimization.

But still I am wondering if it is worth the risk.  These are not in
hot paths and these are just quick null comparisons.  Also in most other
call sites, we do check for return values.

> > +	/* Convert logical cpu number to core number */
> > +	i = cpu_core_of_thread(cpu);
> > +	/*
> > +	 * The first element indexes[0] is the number of drc_indexes
> > +	 * returned in the list.  Hence i+1 will get the drc_index
> > +	 * corresponding to core number i.
> > +	 */
> > +	WARN_ON(i > indexes[0]);
> > +	return indexes[i + 1];
> > +err:
> > +	printk(KERN_WARNING "cpu_to_drc_index(%d) failed", cpu);
> > +	return 0;
> > +}
> > +
> > +static int drc_index_to_cpu(u32 drc_index)
> > +{
> > +	struct device_node *dn = NULL;
> > +	const int *indexes;
> > +	int i, cpu;
> > +	dn = of_find_node_by_path("/cpus");
> > +	if (dn == NULL)
> > +		goto err;
> 
> same here

agreed, comments mentioned above.

> > +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> > +	if (indexes == NULL)
> > +		goto err;
> > +	/*
> > +	 * First element in the array is the number of drc_indexes
> > +	 * returned.  Search through the list to find the matching
> > +	 * drc_index and get the core number
> > +	 */
> > +	for (i = 0; i < indexes[0]; i++) {
> > +		if (indexes[i + 1] == drc_index)
> > +			break;
> > +	}
> > +	/* Convert core number to logical cpu number */
> > +	cpu = cpu_first_thread_of_core(i);
> > +	return cpu;
> > +err:
> > +	printk(KERN_WARNING "drc_index_to_cpu(%d) failed", drc_index);
> > +	return 0;
> > +}
> > +
> > +/*
> > + * pseries hypervisor call H_BEST_ENERGY provides hints to OS on
> > + * preferred logical cpus to activate or deactivate for optimized
> > + * energy consumption.
> > + */
> > +
> > +#define FLAGS_MODE1	0x004E200000080E01
> > +#define FLAGS_MODE2	0x004E200000080401
> > +#define FLAGS_ACTIVATE  0x100
> > +
> > +static ssize_t get_best_energy_list(char *page, int activate)
> > +{
> > +	int rc, cnt, i, cpu;
> > +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> > +	unsigned long flags = 0;
> > +	u32 *buf_page;
> > +	char *s = page;
> > +
> > +	buf_page = (u32 *) get_zeroed_page(GFP_KERNEL);
> > +	if (!buf_page)
> > +		return -ENOMEM;
> > +
> > +	flags = FLAGS_MODE1;
> > +	if (activate)
> > +		flags |= FLAGS_ACTIVATE;
> > +
> > +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags, 0, __pa(buf_page),
> > +				0, 0, 0, 0, 0, 0);
> > +	if (rc != H_SUCCESS) {
> > +		free_page((unsigned long) buf_page);
> > +		return -EINVAL;
> > +	}
> > +
> > +	cnt = retbuf[0];
> > +	for (i = 0; i < cnt; i++) {
> > +		cpu = drc_index_to_cpu(buf_page[2*i+1]);
> > +		if ((cpu_online(cpu) && !activate) ||
> > +		    (!cpu_online(cpu) && activate))
> > +			s += sprintf(s, "%d,", cpu);
> > +	}
> > +	if (s > page) { /* Something to show */
> > +		s--; /* Suppress last comma */
> > +		s += sprintf(s, "\n");
> > +	}
> > +
> > +	free_page((unsigned long) buf_page);
> > +	return s-page;
> > +}
> > +
> > +static ssize_t get_best_energy_data(struct sys_device *dev,
> > +					char *page, int activate)
> > +{
> > +	int rc;
> > +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> > +	unsigned long flags = 0;
> > +
> > +	flags = FLAGS_MODE2;
> > +	if (activate)
> > +		flags |= FLAGS_ACTIVATE;
> > +
> > +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags,
> > +				cpu_to_drc_index(dev->id),
> > +				0, 0, 0, 0, 0, 0, 0);
> > +
> > +	if (rc != H_SUCCESS)
> > +		return -EINVAL;
> > +
> > +	return sprintf(page, "%lu\n", retbuf[1] >> 32);
> > +}
> > +
> > +/* Wrapper functions */
> > +
> > +static ssize_t cpu_activate_hint_list_show(struct sysdev_class *class,
> > +			struct sysdev_class_attribute *attr, char *page)
> > +{
> > +	return get_best_energy_list(page, 1);
> > +}
> > +
> > +static ssize_t cpu_deactivate_hint_list_show(struct sysdev_class *class,
> > +			struct sysdev_class_attribute *attr, char *page)
> > +{
> > +	return get_best_energy_list(page, 0);
> > +}
> > +
> > +static ssize_t percpu_activate_hint_show(struct sys_device *dev,
> > +			struct sysdev_attribute *attr, char *page)
> > +{
> > +	return get_best_energy_data(dev, page, 1);
> > +}
> > +
> > +static ssize_t percpu_deactivate_hint_show(struct sys_device *dev,
> > +			struct sysdev_attribute *attr, char *page)
> > +{
> > +	return get_best_energy_data(dev, page, 0);
> > +}
> > +
> > +/*
> > + * Create sysfs interface:
> > + * /sys/devices/system/cpu/pseries_activate_hint_list
> > + * /sys/devices/system/cpu/pseries_deactivate_hint_list
> > + * 	Comma separated list of cpus to activate or deactivate
> > + * /sys/devices/system/cpu/cpuN/pseries_activate_hint
> > + * /sys/devices/system/cpu/cpuN/pseries_deactivate_hint
> > + *	Per-cpu value of the hint
> 
> Do we really need both interfaces?  Seems like awk could generate one
> from the other in userspace?

Yes, it is possible, but it will not scale.  Generating the list from a
set of per-cpu values is possible, but building it that way would add
too much overhead.  Keeping only the list interface and deleting the
per-cpu ones would reduce the information available from the
hypervisor.  Having both interfaces strikes a good balance between the
amount of information exported and quick access to a consolidated view.
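To make the "consolidated view" concrete, a hypothetical userspace sketch of the list output format: it reproduces the comma handling of get_best_energy_list() (no trailing comma, a final newline, and nothing at all for an empty list) but checks the buffer size on every write, whereas the posted code relies on the sysfs page being large enough. format_cpu_list() is an illustrative name, not a kernel function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format cpus[] as "a,b,c\n" into buf, mirroring
 * the output of get_best_energy_list() but bounded by len.  An empty
 * list produces an empty string, as in the patch.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_cpu_list(char *buf, size_t len, const int *cpus, int n)
{
	size_t used = 0;
	int i;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		/* Prefix every entry except the first with a comma. */
		int w = snprintf(buf + used, len - used, "%s%d",
				 i ? "," : "", cpus[i]);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	if (n > 0) {
		/* Room is needed for '\n' plus the terminating NUL. */
		if (used + 1 >= len)
			return -1;
		buf[used++] = '\n';
		buf[used] = '\0';
	}
	return (int)used;
}
```

Emitting the separator before each entry avoids the patch's "write a comma, then back up over the last one" step, which is one reasonable alternative rather than a required change.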

> > + */
> > +
> > +struct sysdev_class_attribute attr_cpu_activate_hint_list =
> > +		_SYSDEV_CLASS_ATTR(pseries_activate_hint_list, 0444,
> > +		cpu_activate_hint_list_show, NULL);
> > +
> > +struct sysdev_class_attribute attr_cpu_deactivate_hint_list =
> > +		_SYSDEV_CLASS_ATTR(pseries_deactivate_hint_list, 0444,
> > +		cpu_deactivate_hint_list_show, NULL);
> > +
> > +struct sysdev_attribute attr_percpu_activate_hint =
> > +		_SYSDEV_ATTR(pseries_activate_hint, 0444,
> > +		percpu_activate_hint_show, NULL);
> > +
> > +struct sysdev_attribute attr_percpu_deactivate_hint =
> > +		_SYSDEV_ATTR(pseries_deactivate_hint, 0444,
> > +		percpu_deactivate_hint_show, NULL);
> 
> > +
> > +static int __init pseries_energy_init(void)
> > +{
> > +	int cpu, err;
> > +	struct sys_device *cpu_sys_dev;
> > +
> > +	/* Create the sysfs files */
> > +	err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> > +				&attr_cpu_activate_hint_list.attr);
> > +	if (!err)
> > +		err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> > +				&attr_cpu_deactivate_hint_list.attr);
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		cpu_sys_dev = get_cpu_sysdev(cpu);
> > +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> > +				&attr_percpu_activate_hint.attr);
> > +		if (err)
> > +			break;
> > +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> > +				&attr_percpu_deactivate_hint.attr);
> > +		if (err)
> > +			break;
> > +	}
> > +	return err;
> > +
> > +}
> > +
> > +static void __exit pseries_energy_cleanup(void)
> > +{
> > +	int cpu;
> > +	struct sys_device *cpu_sys_dev;
> > +
> > +	/* Remove the sysfs files */
> > +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> > +				&attr_cpu_activate_hint_list.attr);
> > +
> > +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> > +				&attr_cpu_deactivate_hint_list.attr);
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		cpu_sys_dev = get_cpu_sysdev(cpu);
> > +		sysfs_remove_file(&cpu_sys_dev->kobj,
> > +				&attr_percpu_activate_hint.attr);
> > +		sysfs_remove_file(&cpu_sys_dev->kobj,
> > +				&attr_percpu_deactivate_hint.attr);
> > +	}
> > +}
> > +
> > +module_init(pseries_energy_init);
> > +module_exit(pseries_energy_cleanup);
> > +MODULE_DESCRIPTION("Driver for pseries platform energy management");
> 
> Needs a less generic description. 

Explained above.

Thanks,
Vaidy


* Re: [PATCH v3 2/2] powerpc: add support for new hcall H_BEST_ENERGY
From: Michael Neuling @ 2010-06-28  1:44 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan; +Cc: Paul Mackerras, Anton Blanchard, linuxppc-dev
In-Reply-To: <20100623060415.4957.24478.stgit@drishya.in.ibm.com>

Vaidy,

> 	Create sysfs interface to export data from H_BEST_ENERGY hcall
> 	that can be used by administrative tools on supported pseries
> 	platforms for energy management	optimizations.
> 
> 	/sys/device/system/cpu/pseries_(de)activate_hint_list and
> 	/sys/device/system/cpu/cpuN/pseries_(de)activate_hint will provide
> 	hints for activation and deactivation of cpus respectively.
> 
> 	Added new driver module
> 		arch/powerpc/platforms/pseries/pseries_energy.c
> 	under new config option CONFIG_PSERIES_ENERGY

Can you provide some documentation on how to use these hints and in what
format they are provided in sysfs?  It looks like there are two separate
interfaces to the same thing (one a comma-separated list and one per-cpu);
why do we need both?  What is the difference between activate and
deactivate, without me having to read PAPR :-) ?
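Until that documentation exists, here is a rough userspace sketch of how an administrative tool might consume one of the list files. The format (decimal cpu numbers separated by commas, a trailing newline, and an empty file when there is nothing to show) is inferred from get_best_energy_list() in the patch, not from any written spec. parse_hint_list() is a hypothetical helper, and opening and reading the sysfs file is left out.

```c
#include <stdlib.h>

/*
 * Parse hint-list text such as "4,8,12\n" into an array of cpu numbers.
 * Returns the number of cpus stored, or -1 on a malformed entry or
 * when the output array is full.
 */
static int parse_hint_list(const char *text, int *cpus, int max)
{
	int n = 0;
	const char *p = text;

	while (*p != '\0' && *p != '\n') {
		char *end;
		long cpu = strtol(p, &end, 10);

		if (end == p || cpu < 0 || n == max)
			return -1;	/* not a number, negative, or no room */
		cpus[n++] = (int)cpu;
		p = end;
		if (*p == ',')
			p++;		/* step over the separator */
		else if (*p != '\0' && *p != '\n')
			return -1;	/* unexpected character */
	}
	return n;
}
```

For example, parse_hint_list("4,8,12\n", cpus, 8) stores 4, 8 and 12 and returns 3, while an empty file yields 0.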

Other comments below.

> 
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/hvcall.h               |    3 
>  arch/powerpc/platforms/pseries/Kconfig          |   10 +
>  arch/powerpc/platforms/pseries/Makefile         |    1 
>  arch/powerpc/platforms/pseries/pseries_energy.c |  258 +++++++++++++++++++++++
>  4 files changed, 271 insertions(+), 1 deletions(-)
>  create mode 100644 arch/powerpc/platforms/pseries/pseries_energy.c
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 5119b7d..34b66e0 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -231,7 +231,8 @@
>  #define H_GET_EM_PARMS		0x2B8
>  #define H_SET_MPP		0x2D0
>  #define H_GET_MPP		0x2D4
> -#define MAX_HCALL_OPCODE	H_GET_MPP
> +#define H_BEST_ENERGY		0x2F4
> +#define MAX_HCALL_OPCODE	H_BEST_ENERGY
> 
>  #ifndef __ASSEMBLY__
> 
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index c667f0f..b3dd108 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -33,6 +33,16 @@ config PSERIES_MSI
>         depends on PCI_MSI && EEH
>         default y
> 
> +config PSERIES_ENERGY

This probably needs a less generic name.  PSERIES_ENERGY_MANAGEMENT?
PSERIES_ENERGY_HOTPLUG_HINTS?

> +	tristate "pseries energy management capabilities driver"
> +	depends on PPC_PSERIES
> +	default y
> +	help
> +	  Provides interface to platform energy management capabilities
> +	  on supported PSERIES platforms.
> +	  Provides: /sys/devices/system/cpu/pseries_(de)activation_hint_list
> +	  and /sys/devices/system/cpu/cpuN/pseries_(de)activation_hint
> +
>  config SCANLOG
>  	tristate "Scanlog dump interface"
>  	depends on RTAS_PROC && PPC_PSERIES
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index 3dbef30..32ae72e 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_EEH)	+= eeh.o eeh_cache.o eeh_driver.o eeh_event.o eeh_sysfs.o
>  obj-$(CONFIG_KEXEC)	+= kexec.o
>  obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
>  obj-$(CONFIG_PSERIES_MSI)	+= msi.o
> +obj-$(CONFIG_PSERIES_ENERGY)	+= pseries_energy.o
> 
>  obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
>  obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
> new file mode 100644
> index 0000000..9a936b1
> --- /dev/null
> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> @@ -0,0 +1,258 @@
> +/*
> + * POWER platform energy management driver
> + * Copyright (C) 2010 IBM Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + * This pseries platform device driver provides access to
> + * platform energy management capabilities.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/seq_file.h>
> +#include <linux/sysdev.h>
> +#include <linux/cpu.h>
> +#include <linux/of.h>
> +#include <asm/cputhreads.h>
> +#include <asm/page.h>
> +#include <asm/hvcall.h>
> +
> +
> +#define MODULE_VERS "1.0"

Argh, I hate module versions... but this one is less of an issue since
it doesn't seem to be used anyway :-)

> +#define MODULE_NAME "pseries_energy"

Unused too.

> +
> +/* Helper Routines to convert between drc_index to cpu numbers */
> +
> +static u32 cpu_to_drc_index(int cpu)
> +{
> +	struct device_node *dn = NULL;
> +	const int *indexes;
> +	int i;
> +	dn = of_find_node_by_path("/cpus");
> +	if (dn == NULL)
> +		goto err;

Hmm, I'm not sure this is really needed.  If you don't have /cpus, you
are probably not going to boot.

> +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> +	if (indexes == NULL)
> +		goto err;

These checks should probably be moved to module init rather than sysfs
read time.  If they fail, print a warning and don't load the module.

These hcalls and device-tree entries aren't going to be dynamic.
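A userspace sketch of that split, assuming the "ibm,drc-indexes" layout the patch describes (element 0 is the entry count, elements 1..count are per-core DRC indexes). The property is validated once at init, and the lookup bounds-checks the core number. Valid cores are 0..count-1, so the check needs to be core >= count; the patch's WARN_ON(i > indexes[0]) still lets i == count through and then reads one entry past the end. struct drc_table and both helpers are hypothetical names. In the kernel, the node returned by of_find_node_by_path() also carries a reference that should be dropped with of_node_put().

```c
#include <stddef.h>
#include <stdint.h>

/* Cached copy of the property, validated once at init time. */
struct drc_table {
	const uint32_t *indexes;	/* indexes[0] = number of entries */
};

/* Returns 0 on success, -1 if the property is missing or empty. */
static int drc_table_init(struct drc_table *t, const uint32_t *prop)
{
	if (prop == NULL || prop[0] == 0)
		return -1;		/* the module would refuse to load here */
	t->indexes = prop;
	return 0;
}

/* Map a core number to its DRC index; -1 if the core is out of range. */
static int core_to_drc_index(const struct drc_table *t, int core,
			     uint32_t *out)
{
	if (core < 0 || (uint32_t)core >= t->indexes[0])
		return -1;
	*out = t->indexes[core + 1];	/* skip the leading count */
	return 0;
}
```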

> +	/* Convert logical cpu number to core number */
> +	i = cpu_core_of_thread(cpu);
> +	/*
> +	 * The first element indexes[0] is the number of drc_indexes
> +	 * returned in the list.  Hence i+1 will get the drc_index
> +	 * corresponding to core number i.
> +	 */
> +	WARN_ON(i > indexes[0]);
> +	return indexes[i + 1];
> +err:
> +	printk(KERN_WARNING "cpu_to_drc_index(%d) failed", cpu);
> +	return 0;
> +}
> +
> +static int drc_index_to_cpu(u32 drc_index)
> +{
> +	struct device_node *dn = NULL;
> +	const int *indexes;
> +	int i, cpu;
> +	dn = of_find_node_by_path("/cpus");
> +	if (dn == NULL)
> +		goto err;

same here

> +	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> +	if (indexes == NULL)
> +		goto err;
> +	/*
> +	 * First element in the array is the number of drc_indexes
> +	 * returned.  Search through the list to find the matching
> +	 * drc_index and get the core number
> +	 */
> +	for (i = 0; i < indexes[0]; i++) {
> +		if (indexes[i + 1] == drc_index)
> +			break;
> +	}
> +	/* Convert core number to logical cpu number */
> +	cpu = cpu_first_thread_of_core(i);
> +	return cpu;
> +err:
> +	printk(KERN_WARNING "drc_index_to_cpu(%d) failed", drc_index);
> +	return 0;
> +}
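In drc_index_to_cpu() above, a DRC index that matches nothing leaves i equal to the entry count, and that out-of-range core is still converted to a cpu. The error path returns 0, which is also a valid cpu number. A small userspace model that reports a miss as -1 instead is below. THREADS_PER_CORE is a hypothetical fixed SMT width standing in for cpu_first_thread_of_core(), and callers such as get_best_energy_list() would then need to skip negative results.

```c
#include <stdint.h>

/* Hypothetical SMT width; the kernel uses cpu_first_thread_of_core(). */
#define THREADS_PER_CORE 4

/* Return the core whose DRC index matches, or -1 if none does. */
static int drc_index_to_core(const uint32_t *indexes, uint32_t drc_index)
{
	uint32_t i;

	for (i = 0; i < indexes[0]; i++) {
		if (indexes[i + 1] == drc_index)
			return (int)i;
	}
	return -1;	/* no match: do not fall through with i == count */
}

/* First logical cpu of the matching core, or -1 on a miss. */
static int drc_index_to_first_cpu(const uint32_t *indexes, uint32_t drc_index)
{
	int core = drc_index_to_core(indexes, drc_index);

	return core < 0 ? -1 : core * THREADS_PER_CORE;
}
```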
> +
> +/*
> + * pseries hypervisor call H_BEST_ENERGY provides hints to OS on
> + * preferred logical cpus to activate or deactivate for optimized
> + * energy consumption.
> + */
> +
> +#define FLAGS_MODE1	0x004E200000080E01
> +#define FLAGS_MODE2	0x004E200000080401
> +#define FLAGS_ACTIVATE  0x100
> +
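For readers following the flag handling, a userspace sketch of how the patch composes the H_BEST_ENERGY flags word. The list query uses FLAGS_MODE1, the per-cpu query uses FLAGS_MODE2, and the activate bit is ORed in for "which cpus to bring online". The constants are copied from the patch; what the individual bits mean is defined by PAPR and is not spelled out here. best_energy_flags() is a hypothetical helper.

```c
#include <stdint.h>

/* Values copied from the patch. */
#define FLAGS_MODE1	0x004E200000080E01ULL
#define FLAGS_MODE2	0x004E200000080401ULL
#define FLAGS_ACTIVATE	0x100ULL

/* per_cpu selects MODE2 (single cpu) over MODE1 (system-wide list). */
static uint64_t best_energy_flags(int per_cpu, int activate)
{
	uint64_t flags = per_cpu ? FLAGS_MODE2 : FLAGS_MODE1;

	if (activate)
		flags |= FLAGS_ACTIVATE;
	return flags;
}
```

So the activate-list query passes 0x004E200000080F01 (0xE01 | 0x100 = 0xF01), and the per-cpu activate query passes 0x004E200000080501.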
> +static ssize_t get_best_energy_list(char *page, int activate)
> +{
> +	int rc, cnt, i, cpu;
> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> +	unsigned long flags = 0;
> +	u32 *buf_page;
> +	char *s = page;
> +
> +	buf_page = (u32 *) get_zeroed_page(GFP_KERNEL);
> +	if (!buf_page)
> +		return -ENOMEM;
> +
> +	flags = FLAGS_MODE1;
> +	if (activate)
> +		flags |= FLAGS_ACTIVATE;
> +
> +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags, 0, __pa(buf_page),
> +				0, 0, 0, 0, 0, 0);
> +	if (rc != H_SUCCESS) {
> +		free_page((unsigned long) buf_page);
> +		return -EINVAL;
> +	}
> +
> +	cnt = retbuf[0];
> +	for (i = 0; i < cnt; i++) {
> +		cpu = drc_index_to_cpu(buf_page[2*i+1]);
> +		if ((cpu_online(cpu) && !activate) ||
> +		    (!cpu_online(cpu) && activate))
> +			s += sprintf(s, "%d,", cpu);
> +	}
> +	if (s > page) { /* Something to show */
> +		s--; /* Suppress last comma */
> +		s += sprintf(s, "\n");
> +	}
> +
> +	free_page((unsigned long) buf_page);
> +	return s-page;
> +}
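The formatting loop in get_best_energy_list() can be modeled in userspace as below. A cpu is printed when its online state differs from the request (offline cpus for activate, online cpus for deactivate), and the trailing comma is overwritten with a newline. The sketch assumes the candidates have already been converted from the buf_page[2*i+1] DRC indexes into plain cpu numbers with 0/1 online flags. format_hint_list() is a hypothetical name, and the buffer is assumed large enough, as the PAGE_SIZE sysfs buffer is in the kernel.

```c
#include <stdio.h>

/* online[i] must be 0 or 1; activate is 0 or 1. Returns bytes written. */
static int format_hint_list(char *page, const int *cpus, const int *online,
			    int cnt, int activate)
{
	char *s = page;
	int i;

	for (i = 0; i < cnt; i++) {
		/* same test as the patch: (online && !activate) || (!online && activate) */
		if (online[i] != activate)
			s += sprintf(s, "%d,", cpus[i]);
	}
	if (s > page) {		/* something to show */
		s--;		/* back up over the last comma */
		s += sprintf(s, "\n");
	}
	*s = '\0';		/* also terminates the empty case */
	return (int)(s - page);
}
```

With cpus {0, 4, 8, 12} and online flags {1, 0, 1, 0}, the activate list is "4,12\n" and the deactivate list is "0,8\n".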
> +
> +static ssize_t get_best_energy_data(struct sys_device *dev,
> +					char *page, int activate)
> +{
> +	int rc;
> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> +	unsigned long flags = 0;
> +
> +	flags = FLAGS_MODE2;
> +	if (activate)
> +		flags |= FLAGS_ACTIVATE;
> +
> +	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags,
> +				cpu_to_drc_index(dev->id),
> +				0, 0, 0, 0, 0, 0, 0);
> +
> +	if (rc != H_SUCCESS)
> +		return -EINVAL;
> +
> +	return sprintf(page, "%lu\n", retbuf[1] >> 32);
> +}
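The per-cpu file reports only the upper 32 bits of the second return register (retbuf[1] >> 32). On pseries, unsigned long is 64 bits, so the shift is well defined. The sketch below uses a fixed-width type so that it behaves the same on any host. hint_from_retbuf() is a hypothetical helper.

```c
#include <stdint.h>

/* Upper half of the 64-bit return value, as printed by the sysfs file. */
static uint32_t hint_from_retbuf(uint64_t ret1)
{
	return (uint32_t)(ret1 >> 32);
}
```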
> +
> +/* Wrapper functions */
> +
> +static ssize_t cpu_activate_hint_list_show(struct sysdev_class *class,
> +			struct sysdev_class_attribute *attr, char *page)
> +{
> +	return get_best_energy_list(page, 1);
> +}
> +
> +static ssize_t cpu_deactivate_hint_list_show(struct sysdev_class *class,
> +			struct sysdev_class_attribute *attr, char *page)
> +{
> +	return get_best_energy_list(page, 0);
> +}
> +
> +static ssize_t percpu_activate_hint_show(struct sys_device *dev,
> +			struct sysdev_attribute *attr, char *page)
> +{
> +	return get_best_energy_data(dev, page, 1);
> +}
> +
> +static ssize_t percpu_deactivate_hint_show(struct sys_device *dev,
> +			struct sysdev_attribute *attr, char *page)
> +{
> +	return get_best_energy_data(dev, page, 0);
> +}
> +
> +/*
> + * Create sysfs interface:
> + * /sys/devices/system/cpu/pseries_activate_hint_list
> + * /sys/devices/system/cpu/pseries_deactivate_hint_list
> + * 	Comma separated list of cpus to activate or deactivate
> + * /sys/devices/system/cpu/cpuN/pseries_activate_hint
> + * /sys/devices/system/cpu/cpuN/pseries_deactivate_hint
> + *	Per-cpu value of the hint

Do we really need both interfaces?  It seems like awk could generate
one from the other in userspace.

> + */
> +
> +struct sysdev_class_attribute attr_cpu_activate_hint_list =
> +		_SYSDEV_CLASS_ATTR(pseries_activate_hint_list, 0444,
> +		cpu_activate_hint_list_show, NULL);
> +
> +struct sysdev_class_attribute attr_cpu_deactivate_hint_list =
> +		_SYSDEV_CLASS_ATTR(pseries_deactivate_hint_list, 0444,
> +		cpu_deactivate_hint_list_show, NULL);
> +
> +struct sysdev_attribute attr_percpu_activate_hint =
> +		_SYSDEV_ATTR(pseries_activate_hint, 0444,
> +		percpu_activate_hint_show, NULL);
> +
> +struct sysdev_attribute attr_percpu_deactivate_hint =
> +		_SYSDEV_ATTR(pseries_deactivate_hint, 0444,
> +		percpu_deactivate_hint_show, NULL);

> +
> +static int __init pseries_energy_init(void)
> +{
> +	int cpu, err;
> +	struct sys_device *cpu_sys_dev;
> +
> +	/* Create the sysfs files */
> +	err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> +				&attr_cpu_activate_hint_list.attr);
> +	if (!err)
> +		err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> +				&attr_cpu_deactivate_hint_list.attr);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_sys_dev = get_cpu_sysdev(cpu);
> +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> +				&attr_percpu_activate_hint.attr);
> +		if (err)
> +			break;
> +		err = sysfs_create_file(&cpu_sys_dev->kobj,
> +				&attr_percpu_deactivate_hint.attr);
> +		if (err)
> +			break;
> +	}
> +	return err;
> +
> +}
> +
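As written, pseries_energy_init() keeps going after a failure to create a class-level file: the per-cpu loop overwrites err, so init can return 0 even though that file is missing. It also leaves already-created files behind when it bails out. A common kernel idiom is goto-based unwinding. Below is a userspace model of that idiom, with hypothetical create_file()/remove_file() stand-ins for sysfs_create_file()/sysfs_remove_file(). Slots 0 and 1 stand for the two list files, and later slots stand for the per-cpu files.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_MODEL	4			/* hypothetical cpu count */
#define NR_SLOTS	(2 + 2 * NR_CPUS_MODEL)

static bool created[NR_SLOTS];
static int fail_at = -1;			/* slot that fails, or -1 */

/* Hypothetical stand-in for sysfs_create_file(). */
static int create_file(int slot)
{
	if (slot == fail_at)
		return -ENOMEM;
	created[slot] = true;
	return 0;
}

/* Hypothetical stand-in for sysfs_remove_file(). */
static void remove_file(int slot)
{
	created[slot] = false;
}

/* Create every file; on the first error, remove the earlier ones in reverse. */
static int energy_init_model(void)
{
	int slot, err;

	for (slot = 0; slot < NR_SLOTS; slot++) {
		err = create_file(slot);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	while (--slot >= 0)
		remove_file(slot);
	return err;
}
```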
> +static void __exit pseries_energy_cleanup(void)
> +{
> +	int cpu;
> +	struct sys_device *cpu_sys_dev;
> +
> +	/* Remove the sysfs files */
> +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> +				&attr_cpu_activate_hint_list.attr);
> +
> +	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
> +				&attr_cpu_deactivate_hint_list.attr);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_sys_dev = get_cpu_sysdev(cpu);
> +		sysfs_remove_file(&cpu_sys_dev->kobj,
> +				&attr_percpu_activate_hint.attr);
> +		sysfs_remove_file(&cpu_sys_dev->kobj,
> +				&attr_percpu_deactivate_hint.attr);
> +	}
> +}
> +
> +module_init(pseries_energy_init);
> +module_exit(pseries_energy_cleanup);
> +MODULE_DESCRIPTION("Driver for pseries platform energy management");

Needs a less generic description. 

> +MODULE_AUTHOR("Vaidyanathan Srinivasan");
> +MODULE_LICENSE("GPL");
> 


* section .data..init_task
From: Sean MacLennan @ 2010-06-28  4:59 UTC
  To: linuxppc-dev

Anybody else seeing these messages?

ppc_4xxFP-ld: .tmp_vmlinux1: section .data..init_task lma 0xc0374000 overlaps previous sections
ppc_4xxFP-ld: .tmp_vmlinux2: section .data..init_task lma 0xc03a2000 overlaps previous sections
ppc_4xxFP-ld: vmlinux: section .data..init_task lma 0xc03a2000 overlaps previous sections

Or does anybody know what they mean? They started showing up in 2.6.35.

Very easy to reproduce, so don't hesitate to ask for more info.

Cheers,
   Sean


* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Matt Evans @ 2010-06-28  4:39 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-19-git-send-email-agraf@suse.de>

Howdy Alex!

Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code out.
> 
> This patch still only contains stubs. But at least it loops through the
> text section :).
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 59 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 2d8dd73..d873bc6 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -32,3 +32,62 @@
>  #define KVM_MAGIC_PAGE		(-4096L)
>  #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>  
> +static bool kvm_patching_worked = true;
> +
> +static void kvm_map_magic_page(void *data)
> +{
> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
> +		       KVM_MAGIC_PAGE,  /* Physical Address */
> +		       KVM_MAGIC_PAGE); /* Effective Address */
> +}
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}

Could you do something similar in module_finalize() to patch loaded modules' .text sections?
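The walk in kvm_use_magic_page() only needs a [start, end) range of instruction words, so in principle the same loop could be pointed at a module's text. A userspace sketch of that range walk is below. It masks off the RT field (bits 6-10 in PowerPC numbering, mask 0x03e00000) and counts mfmsr instructions. The mask and the mfmsr encoding are written out here for illustration rather than taken from the patch, and count_mfmsr() is hypothetical. A real patcher would rewrite the matching word and then flush the icache for it, as kvm_check_ins() does.

```c
#include <stddef.h>
#include <stdint.h>

#define MASK_RT		0x03e00000u	/* RT field of an X-form instruction */
#define INST_MFMSR	0x7c0000a6u	/* mfmsr with RT = 0 */

/* Count mfmsr instructions (any RT) in the word range [start, end). */
static size_t count_mfmsr(const uint32_t *start, const uint32_t *end)
{
	const uint32_t *p;
	size_t hits = 0;

	for (p = start; p < end; p++) {
		uint32_t inst_no_rt = *p & ~MASK_RT;

		if (inst_no_rt == INST_MFMSR)
			hits++;		/* a patcher would rewrite *p here */
	}
	return hits;
}
```

For example, in {mfmsr r3 (0x7c6000a6), nop (0x60000000), mfmsr r4 (0x7c8000a6), blr (0x4e800020)} the walk finds two matches.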

> +
> +static int __init kvm_guest_init(void)
> +{
> +	char *p;
> +
> +	if (!kvm_para_available())
> +		return 0;
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
> +		kvm_use_magic_page();
> +
> +	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
> +			 kvm_patching_worked ? "worked" : "failed");
> +
> +	return 0;
> +}
> +
> +postcore_initcall(kvm_guest_init);

Cheers,


Matt

