* Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall
From: Scott Wood @ 2012-10-01 19:11 UTC
To: Chunhe Lan
Cc: Wood Scott-B07421, Gala Kumar-B11780,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <506708BE.1090905@freescale.com>
On 09/29/2012 09:42:06 AM, Chunhe Lan wrote:
> On 09/28/2012 01:35 PM, Scott Wood wrote:
>> On 09/27/2012 05:33:26 PM, Kumar Gala wrote:
>>>
>>> On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:
>>>
>>> > On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
>>> >> On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
>>> >>> On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
>>> >>>> Original process of call:
>>> >>>> The mpc85xx_pci_err_probe function completes to been registered
>>> >>>> and enabled of EDAC PCI err driver at the latter time stage of
>>> >>>> kernel boot in the mpc85xx_edac.c.
>>> >>>> Current process of call:
>>> >>>> The mpc85xx_pci_err_probe function completes to been registered
>>> >>>> and enabled of EDAC PCI err driver at the first time stage of
>>> >>>> kernel boot in the fsl_pci.c.
>>> >>>> So in this case the following error messages appear in the boot log:
>>> >>>> PCI: Probing PCI hardware
>>> >>>> pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
>>> >>>> PCIE error(s) detected
>>> >>>> PCIE ERR_DR register: 0x00020000
>>> >>>> PCIE ERR_CAP_STAT register: 0x80000001
>>> >>>> PCIE ERR_CAP_R0 register: 0x00000800
>>> >>>> PCIE ERR_CAP_R1 register: 0x00000000
>>> >>>> PCIE ERR_CAP_R2 register: 0x00000000
>>> >>>> PCIE ERR_CAP_R3 register: 0x00000000
>>> >>>> Because the EDAC PCI err driver is registered and enabled earlier than
>>> >>>> original point of call. But at this point of time, PCI hardware is not
>>> >>>> probed and initialized, and it is in unknowable state.
>>> >>>> So, move enable function into mpc85xx_pci_err_en which is called at the
>>> >>>> middle time stage of kernel boot and after PCI hardware is probed and
>>> >>>> initialized by device_initcall in the fsl_pci.c.
>>> >>>> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
>>> >>>> ---
>>> >>>> arch/powerpc/sysdev/fsl_pci.c | 12 ++++++++++
>>> >>>> arch/powerpc/sysdev/fsl_pci.h | 5 ++++
>>> >>>> drivers/edac/mpc85xx_edac.c | 47 ++++++++++++++++++++++++++++------------
>>> >>>> 3 files changed, 50 insertions(+), 14 deletions(-)
>>> >>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> index 3d6f4d8..a591965 100644
>>> >>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
>>> >>>> return platform_driver_register(&fsl_pci_driver);
>>> >>>> }
>>> >>>> arch_initcall(fsl_pci_init);
>>> >>>> +
>>> >>>> +static int __init fsl_pci_err_en(void)
>>> >>>> +{
>>> >>>> + struct device_node *np;
>>> >>>> +
>>> >>>> + for_each_node_by_type(np, "pci")
>>> >>>> + if (of_match_node(pci_ids, np))
>>> >>>> + mpc85xx_pci_err_en(np);
>>> >>>> +
>>> >>>> + return 0;
>>> >>>> +}
>>> >>>> +device_initcall(fsl_pci_err_en);
>>> >>>
>>> >>> Why can't you call this from the normal PCIe controller init,
>>> >>> instead of searching for the node independently?
>>> >> Don't we have this now with mpc85xx_pci_err_probe() ??
>>> >
>>> > What do you mean by "this"?
>>>
>>> I'm saying don't we replace fsl_pci_err_en() with
>>> mpc85xx_pci_err_probe()...
>>>
>>> I need to look at this more, but not clear why mpc85xx_pci_err_en()
>>> can just be part of mpc85xx_pci_err_probe()
>>
>> OK, I was confused -- I thought the point was to make it happen
>> earlier, not later. The changelog is not clear at all.
>>
>> Don't we want to be able to capture errors that happen during PCI
>> driver initialization, though?
> Yes.
> When the PCI controller probes a slot that has no device in it,
> invalid address errors occur.
> Then the edac driver prints many error messages. This is normal
> behavior, but it is ugly.
> So, move the enabling of the edac driver to later, and only detect
> the errors of the follow-up pci operations.
Is there any way to identify whether the error is the result of such a
probe? If nothing else, you could identify whether a probe is taking
place -- better than not having any error detection during driver init.
-Scott
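One way to read the second suggestion above is a flag that marks when a bus probe is in progress, which the error handler consults before printing anything. The following is a minimal user-space sketch of that idea only. `pci_probe_in_progress`, `report_pcie_error()`, `pci_probe_begin()`, `pci_probe_end()`, and the two counters are hypothetical names chosen for illustration; they are not functions from mpc85xx_edac.c or fsl_pci.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag: set while the controller is scanning its slots. */
static bool pci_probe_in_progress;

/* Counters so the behavior can be checked without parsing log output. */
static unsigned int errors_reported;
static unsigned int errors_during_probe;

/*
 * Hypothetical error hook. Errors seen while a probe is running are
 * counted but not printed, since probing an empty slot is expected to
 * fault; everything else is reported as usual.
 */
static void report_pcie_error(uint32_t err_dr)
{
	assert(err_dr != 0); /* only meant to be called with an error bit set */

	if (pci_probe_in_progress) {
		errors_during_probe++;
		return;
	}

	errors_reported++;
	printf("PCIE error(s) detected\n");
	printf("PCIE ERR_DR register: 0x%08x\n", (unsigned int)err_dr);
}

/* Hypothetical helpers that bracket the bus scan. */
static void pci_probe_begin(void) { pci_probe_in_progress = true; }
static void pci_probe_end(void)   { pci_probe_in_progress = false; }
```

With error detection enabled early, a caller would wrap the bus scan in `pci_probe_begin()` and `pci_probe_end()`, so that faults from empty slots are counted quietly while errors outside the scan still reach the log.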
* Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Ni zhan Chen @ 2012-10-01 23:45 UTC
To: Yasuaki Ishimatsu
Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
x86, linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
kosaki.motohiro, rientjes, sparclinux, cl, linuxppc-dev, akpm,
liuj97
In-Reply-To: <50691FBA.5080107@jp.fujitsu.com>
On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:
> Hi Chen,
>
> 2012/09/29 17:19, Ni zhan Chen wrote:
>> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> This patch series aims to support physical memory hot-remove.
>>>
>>> The patches can free/remove the following things:
>>>
>>> - acpi_memory_info : [RFC PATCH 4/19]
>>> - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>> - iomem_resource : [RFC PATCH 9/19]
>>> - mem_section and related sysfs files : [RFC PATCH 10-11, 13-16/19]
>>> - page table of removed memory : [RFC PATCH 12/19]
>>> - node and related sysfs files : [RFC PATCH 18-19/19]
>>>
>>> If you find lack of function for physical memory hot-remove, please let me
>>> know.
>>>
>>> How to test this patchset?
>>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>>> ACPI_HOTPLUG_MEMORY must be selected.
>>> 2. load the module acpi_memhotplug
>>
>> Hi Yasuaki,
>>
>> where is the acpi_memhotplug module?
>
> If you build acpi_memhotplug as a module, it is created under the
> /lib/modules/<kernel-version>/driver/acpi/ directory. It depends
> on config ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in
> function, so you don't need to care about it.
> Thanks,
> Yasuaki Ishimatsu
Hi Yasuaki,
I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE, and
ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX
under the directory /sys/bus/acpi/devices/.
[root@localhost ~]# ls /sys/bus/acpi/devices/
device:00 device:07 device:0e device:15 device:1c device:23 device:2a LNXCPU:00 LNXCPU:07 PNP0501:00 PNP0C02:00 PNP0C0F:02 PNP0C14:01
device:01 device:08 device:0f device:16 device:1d device:24 device:2b LNXCPU:01 LNXPWRBN:00 PNP0800:00 PNP0C02:01 PNP0C0F:03 PNP0C31:00
device:02 device:09 device:10 device:17 device:1e device:25 device:2c LNXCPU:02 LNXSYSTM:00 PNP0A08:00 PNP0C02:02 PNP0C0F:04
device:03 device:0a device:11 device:18 device:1f device:26 device:2d LNXCPU:03 PNP0000:00 PNP0B00:00 PNP0C04:00 PNP0C0F:05
device:04 device:0b device:12 device:19 device:20 device:27 device:2e LNXCPU:04 PNP0100:00 PNP0C01:00 PNP0C0C:00 PNP0C0F:06
device:05 device:0c device:13 device:1a device:21 device:28 device:2f LNXCPU:05 PNP0103:00 PNP0C01:01 PNP0C0F:00 PNP0C0F:07
device:06 device:0d device:14 device:1b device:22 device:29 INT3F0D:00 LNXCPU:06 PNP0200:00 PNP0C01:02 PNP0C0F:01 PNP0C14:00
Then what am I missing? Thanks.
* Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Yasuaki Ishimatsu @ 2012-10-02 0:02 UTC
To: Ni zhan Chen
Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
x86, linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
kosaki.motohiro, rientjes, sparclinux, cl, linuxppc-dev, akpm,
liuj97
In-Reply-To: <506A2B33.80603@gmail.com>
Hi Chen,
2012/10/02 8:45, Ni zhan Chen wrote:
> On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:
>> Hi Chen,
>>
>> 2012/09/29 17:19, Ni zhan Chen wrote:
>>> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>
>>>> This patch series aims to support physical memory hot-remove.
>>>>
>>>> The patches can free/remove the following things:
>>>>
>>>> - acpi_memory_info : [RFC PATCH 4/19]
>>>> - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>>> - iomem_resource : [RFC PATCH 9/19]
>>>> - mem_section and related sysfs files : [RFC PATCH 10-11, 13-16/19]
>>>> - page table of removed memory : [RFC PATCH 12/19]
>>>> - node and related sysfs files : [RFC PATCH 18-19/19]
>>>>
>>>> If you find lack of function for physical memory hot-remove, please let me
>>>> know.
>>>>
>>>> How to test this patchset?
>>>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>>>> ACPI_HOTPLUG_MEMORY must be selected.
>>>> 2. load the module acpi_memhotplug
>>>
>>> Hi Yasuaki,
>>>
>>> where is the acpi_memhotplug module?
>>
>> If you build acpi_memhotplug as a module, it is created under the
>> /lib/modules/<kernel-version>/driver/acpi/ directory. It depends
>> on config ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in
>> function, so you don't need to care about it.
>> Thanks,
>> Yasuaki Ishimatsu
>
> Hi Yasuaki,
>
> I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE, and ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX under the directory /sys/bus/acpi/devices/.
>
> [root@localhost ~]# ls /sys/bus/acpi/devices/
> device:00 device:07 device:0e device:15 device:1c device:23 device:2a LNXCPU:00 LNXCPU:07 PNP0501:00 PNP0C02:00 PNP0C0F:02 PNP0C14:01
> device:01 device:08 device:0f device:16 device:1d device:24 device:2b LNXCPU:01 LNXPWRBN:00 PNP0800:00 PNP0C02:01 PNP0C0F:03 PNP0C31:00
> device:02 device:09 device:10 device:17 device:1e device:25 device:2c LNXCPU:02 LNXSYSTM:00 PNP0A08:00 PNP0C02:02 PNP0C0F:04
> device:03 device:0a device:11 device:18 device:1f device:26 device:2d LNXCPU:03 PNP0000:00 PNP0B00:00 PNP0C04:00 PNP0C0F:05
> device:04 device:0b device:12 device:19 device:20 device:27 device:2e LNXCPU:04 PNP0100:00 PNP0C01:00 PNP0C0C:00 PNP0C0F:06
> device:05 device:0c device:13 device:1a device:21 device:28 device:2f LNXCPU:05 PNP0103:00 PNP0C01:01 PNP0C0F:00 PNP0C0F:07
> device:06 device:0d device:14 device:1b device:22 device:29 INT3F0D:00 LNXCPU:06 PNP0200:00 PNP0C01:02 PNP0C0F:01 PNP0C14:00
>
> Then what am I missing? Thanks.
It depends on the hardware. It seems that your system does not support
memory hotplug. If you use KVM, you can try memory hotplug on a KVM
guest by applying Vasilis' patch-set.
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
Thanks,
Yasuaki Ishimatsu
>
>>
>>>
>>>> 3. hotplug the memory device(it depends on your hardware)
>>>> You will see the memory device under the directory /sys/bus/acpi/devices/.
>>>> Its name is PNP0C80:XX.
>>>> 4. online/offline pages provided by this memory device
>>>> You can write online/offline to /sys/devices/system/memory/memoryX/state to
>>>> online/offline pages provided by this memory device
>>>> 5. hotremove the memory device
>>>> You can hotremove the memory device by the hardware, or writing 1 to
>>>> /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>>>
>>>> Note: if the memory provided by the memory device is used by the kernel, it
>>>> can't be offlined. It is not a bug.
>>>>
>>>> Known problems:
>>>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>>> For example: there is a memory device on node 1. The address range
>>>> is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
>>>> and memory11 under the directory /sys/devices/system/memory/.
>>>> If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
>>>> when we online pages. When we online memory8, the memory stored page cgroup
>>>> is not provided by this memory device. But when we online memory9, the memory
>>>> stored page cgroup may be provided by memory8. So we can't offline memory8
>>>> now. We should offline the memory in the reversed order.
>>>> When the memory device is hotremoved, we will auto offline memory provided
>>>> by this memory device. But we don't know which memory is onlined first, so
>>>> offlining memory may fail. In such case, you should offline the memory by
>>>> hand before hotremoving the memory device.
>>>> 2. hotremoving memory device may cause kernel panicked
>>>> This bug will be fixed by Liu Jiang's patch:
>>>> https://lkml.org/lkml/2012/7/3/1
>>>>
>>>> change log of v9:
>>>> [RFC PATCH v9 8/21]
>>>> * add a lock to protect the list map_entries
>>>> * add an indicator to firmware_map_entry to remember whether the memory
>>>> is allocated from bootmem
>>>> [RFC PATCH v9 10/21]
>>>> * change the macro to inline function
>>>> [RFC PATCH v9 19/21]
>>>> * don't offline the node if the cpu on the node is onlined
>>>> [RFC PATCH v9 21/21]
>>>> * create new patch: auto offline page_cgroup when onlining memory block
>>>> failed
>>>>
>>>> change log of v8:
>>>> [RFC PATCH v8 17/20]
>>>> * Fix problems when one node's range include the other nodes
>>>> [RFC PATCH v8 18/20]
>>>> * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
>>>> is not defined.
>>>> [RFC PATCH v8 19/20]
>>>> * don't offline node when some memory sections are not removed
>>>> [RFC PATCH v8 20/20]
>>>> * create new patch: clear hwpoisoned flag when onlining pages
>>>>
>>>> change log of v7:
>>>> [RFC PATCH v7 4/19]
>>>> * do not continue if acpi_memory_device_remove_memory() fails.
>>>> [RFC PATCH v7 15/19]
>>>> * handle usemap in register_page_bootmem_info_section() too.
>>>>
>>>> change log of v6:
>>>> [RFC PATCH v6 12/19]
>>>> * fix building error on other archtitectures than x86
>>>>
>>>> [RFC PATCH v6 15-16/19]
>>>> * fix building error on other archtitectures than x86
>>>>
>>>> change log of v5:
>>>> * merge the patchset to clear page table and the patchset to hot remove
>>>> memory(from ishimatsu) to one big patchset.
>>>>
>>>> [RFC PATCH v5 1/19]
>>>> * rename remove_memory() to offline_memory()/offline_pages()
>>>>
>>>> [RFC PATCH v5 2/19]
>>>> * new patch: implement offline_memory(). This function offlines pages,
>>>> update memory block's state, and notify the userspace that the memory
>>>> block's state is changed.
>>>>
>>>> [RFC PATCH v5 4/19]
>>>> * offline and remove memory in acpi_memory_disable_device() too.
>>>>
>>>> [RFC PATCH v5 17/19]
>>>> * new patch: add a new function __remove_zone() to revert the things done
>>>> in the function __add_zone().
>>>>
>>>> [RFC PATCH v5 18/19]
>>>> * flush work befor reseting node device.
>>>>
>>>> change log of v4:
>>>> * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
>>>> from the patch series, since the patch is a bugfix. It is being disccussed
>>>> on other thread. But for testing the patch series, the patch is needed.
>>>> So I added the patch as [PATCH 0/13].
>>>>
>>>> [RFC PATCH v4 2/13]
>>>> * check memory is online or not at remove_memory()
>>>> * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
>>>> getting node id
>>>> [RFC PATCH v4 3/13]
>>>> * create new patch : check memory is online or not at online_pages()
>>>>
>>>> [RFC PATCH v4 4/13]
>>>> * add __ref section to remove_memory()
>>>> * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()
>>>>
>>>> [RFC PATCH v4 11/13]
>>>> * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD
>>>>
>>>> change log of v3:
>>>> * rebase to 3.5.0-rc6
>>>>
>>>> [RFC PATCH v2 2/13]
>>>> * remove extra kobject_put()
>>>>
>>>> * The patch was commented by Wen. Wen's comment is
>>>> "acpi_memory_device_remove() should ignore a return value of
>>>> remove_memory() since caller does not care the return value".
>>>> But I did not change it since I think caller should care the
>>>> return value. And I am trying to fix it as follow:
>>>>
>>>> https://lkml.org/lkml/2012/7/5/624
>>>>
>>>> [RFC PATCH v2 4/13]
>>>> * remove a firmware_memmap_entry allocated by kzmalloc()
>>>>
>>>> change log of v2:
>>>> [RFC PATCH v2 2/13]
>>>> * check whether memory block is offline or not before calling offline_memory()
>>>> * check whether section is valid or not in is_memblk_offline()
>>>> * call kobject_put() for each memory_block in is_memblk_offline()
>>>>
>>>> [RFC PATCH v2 3/13]
>>>> * unify the end argument of firmware_map_add_early/hotplug
>>>>
>>>> [RFC PATCH v2 4/13]
>>>> * add release_firmware_map_entry() for freeing firmware_map_entry
>>>>
>>>> [RFC PATCH v2 6/13]
>>>> * add release_memory_block() for freeing memory_block
>>>>
>>>> [RFC PATCH v2 11/13]
>>>> * fix wrong arguments of free_pages()
>>>>
>>>>
>>>> Wen Congyang (8):
>>>> memory-hotplug: implement offline_memory()
>>>> memory-hotplug: store the node id in acpi_memory_device
>>>> memory-hotplug: export the function acpi_bus_remove()
>>>> memory-hotplug: call acpi_bus_remove() to remove memory device
>>>> memory-hotplug: introduce new function arch_remove_memory()
>>>> memory-hotplug: remove sysfs file of node
>>>> memory-hotplug: clear hwpoisoned flag when onlining pages
>>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>>> failed
>>>>
>>>> Yasuaki Ishimatsu (13):
>>>> memory-hotplug: rename remove_memory() to
>>>> offline_memory()/offline_pages()
>>>> memory-hotplug: offline and remove memory when removing the memory
>>>> device
>>>> memory-hotplug: check whether memory is present or not
>>>> memory-hotplug: remove /sys/firmware/memmap/X sysfs
>>>> memory-hotplug: does not release memory region in PAGES_PER_SECTION
>>>> chunks
>>>> memory-hotplug: add memory_block_release
>>>> memory-hotplug: remove_memory calls __remove_pages
>>>> memory-hotplug: check page type in get_page_bootmem
>>>> memory-hotplug: move register_page_bootmem_info_node and
>>>> put_page_bootmem for sparse-vmemmap
>>>> memory-hotplug: implement register_page_bootmem_info_section of
>>>> sparse-vmemmap
>>>> memory-hotplug: free memmap of sparse-vmemmap
>>>> memory_hotplug: clear zone when the memory is removed
>>>> memory-hotplug: add node_device_release
>>>>
>>>> arch/ia64/mm/discontig.c | 14 +
>>>> arch/ia64/mm/init.c | 16 +
>>>> arch/powerpc/mm/init_64.c | 14 +
>>>> arch/powerpc/mm/mem.c | 14 +
>>>> arch/powerpc/platforms/pseries/hotplug-memory.c | 16 +-
>>>> arch/s390/mm/init.c | 12 +
>>>> arch/s390/mm/vmem.c | 14 +
>>>> arch/sh/mm/init.c | 15 +
>>>> arch/sparc/mm/init_64.c | 14 +
>>>> arch/tile/mm/init.c | 8 +
>>>> arch/x86/include/asm/pgtable_types.h | 1 +
>>>> arch/x86/mm/init_32.c | 10 +
>>>> arch/x86/mm/init_64.c | 331 ++++++++++++++++++
>>>> arch/x86/mm/pageattr.c | 47 ++--
>>>> drivers/acpi/acpi_memhotplug.c | 54 +++-
>>>> drivers/acpi/scan.c | 3 +-
>>>> drivers/base/memory.c | 88 ++++-
>>>> drivers/base/node.c | 11 +
>>>> drivers/firmware/memmap.c | 98 +++++-
>>>> include/acpi/acpi_bus.h | 1 +
>>>> include/linux/firmware-map.h | 6 +
>>>> include/linux/memory.h | 5 +
>>>> include/linux/memory_hotplug.h | 25 +-
>>>> include/linux/mm.h | 5 +-
>>>> include/linux/mmzone.h | 19 +
>>>> mm/memory_hotplug.c | 424 +++++++++++++++++++++--
>>>> mm/page_cgroup.c | 3 +
>>>> mm/sparse.c | 5 +-
>>>> 28 files changed, 1181 insertions(+), 92 deletions(-)
>>>>
* Re: [RFC v9 PATCH 06/21] memory-hotplug: export the function acpi_bus_remove()
From: Ni zhan Chen @ 2012-10-02 0:34 UTC
To: wency
Cc: linux-s390, linux-ia64, len.brown, linux-acpi, linux-sh, x86,
linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
linuxppc-dev, akpm, liuj97
In-Reply-To: <1346837155-534-7-git-send-email-wency@cn.fujitsu.com>
On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
>
> The function acpi_bus_remove() can remove a acpi device from acpi device.
IIUC, s/acpi device/acpi bus
>
> When a acpi device is removed, we need to call this function to remove
> the acpi device from acpi bus. So export this function.
>
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> drivers/acpi/scan.c | 3 ++-
> include/acpi/acpi_bus.h | 1 +
> 2 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index d1ecca2..1cefc34 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1224,7 +1224,7 @@ static int acpi_device_set_context(struct acpi_device *device)
> return -ENODEV;
> }
>
> -static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> +int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> {
> if (!dev)
> return -EINVAL;
> @@ -1246,6 +1246,7 @@ static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
>
> return 0;
> }
> +EXPORT_SYMBOL(acpi_bus_remove);
>
> static int acpi_add_single_object(struct acpi_device **child,
> acpi_handle handle, int type,
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index bde976e..2ccf109 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -360,6 +360,7 @@ bool acpi_bus_power_manageable(acpi_handle handle);
> bool acpi_bus_can_wakeup(acpi_handle handle);
> int acpi_power_resource_register_device(struct device *dev, acpi_handle handle);
> void acpi_power_resource_unregister_device(struct device *dev, acpi_handle handle);
> +int acpi_bus_remove(struct acpi_device *dev, int rmdevice);
> #ifdef CONFIG_ACPI_PROC_EVENT
> int acpi_bus_generate_proc_event(struct acpi_device *device, u8 type, int data);
> int acpi_bus_generate_proc_event4(const char *class, const char *bid, u8 type, int data);
* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Benjamin Herrenschmidt @ 2012-10-02 0:58 UTC
To: Alexander Graf
Cc: linux-nfs, Jan Kara, Linus Torvalds, LKML List, J. Bruce Fields,
anton, skinsbursky, bfields, linuxppc-dev
In-Reply-To: <2A52FC96-148C-4F7A-9950-E152E0C6698D@suse.de>
On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared as at first:
Ok, so I suspect the problem is the pushing down of the locks, which
breaks with iommu backends that have a separate flush callback. In
that case, the flush moves out of the allocator lock.
We do still call flush before we return, but I suspect it becomes
racy. Somebody needs to give it a closer look; I'm hoping Anton or
Nish will later today.
Cheers,
Ben.
>
> b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
> commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
> Author: Anton Blanchard <anton@samba.org>
> Date: Thu Jun 7 18:14:48 2012 +0000
>
> powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance
>
> At the moment all queues in a multiqueue adapter will serialise
> against the IOMMU table lock. This is proving to be a big issue,
> especially with 10Gbit ethernet.
>
> This patch creates 4 pools and tries to spread the load across
> them. If the table is under 1GB in size we revert back to the
> original behaviour of 1 pool and 1 largealloc pool.
>
> We create a hash to map CPUs to pools. Since we prefer interrupts to
> be affinitised to primary CPUs, without some form of hashing we are
> very likely to end up using the same pool. As an example, POWER7
> has 4 way SMT and with 4 pools all primary threads will map to the
> same pool.
>
> The largealloc pool is reduced from 1/2 to 1/4 of the space to
> partially offset the overhead of breaking the table up into pools.
>
> Some performance numbers were obtained with a Chelsio T3 adapter on
> two POWER7 boxes, running a 100 session TCP round robin test.
>
> Performance improved 69% with this patch applied.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>
> :040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M arch
>
>
> Alex
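For reference, the CPU-to-pool hashing described in the quoted commit message can be sketched in userspace C. This is a minimal sketch, not the kernel code: pool_naive(), pool_hashed(), pools_used() and the multiplier constant are names and values assumed here for illustration. It shows why taking the low CPU bits sends every primary thread of a 4-way SMT machine to one pool, while a multiplicative hash spreads them out.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_HASHBITS 2                      /* 4 pools, as in the commit message */
#define NR_POOLS      (1u << POOL_HASHBITS)
#define SMT_WAYS      4                      /* POWER7: 4 hardware threads per core */

/* Naive mapping: the low CPU bits pick the pool. */
static unsigned int pool_naive(unsigned int cpu)
{
	return cpu & (NR_POOLS - 1);
}

/*
 * Multiplicative hash keeping the top POOL_HASHBITS bits of the product.
 * The multiplier is an assumption chosen for illustration.
 */
static unsigned int pool_hashed(unsigned int cpu)
{
	uint32_t h = (uint32_t)cpu * UINT32_C(0x9e370001);

	return h >> (32 - POOL_HASHBITS);
}

/* Count the distinct pools hit by the primary threads of the first `cores` cores. */
static unsigned int pools_used(unsigned int (*map)(unsigned int), unsigned int cores)
{
	unsigned int seen = 0, count = 0, core;

	assert(cores > 0);
	for (core = 0; core < cores; core++)
		seen |= 1u << map(core * SMT_WAYS);
	for (; seen; seen >>= 1)
		count += seen & 1u;
	return count;
}
```

With four cores, the primary threads are CPUs 0, 4, 8 and 12. The naive mapping puts all of them in pool 0, whereas the hashed mapping above lands them in more than one pool.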
^ permalink raw reply
* [PATCH] powerpc: fix VMX fix for memcpy case
From: Nishanth Aravamudan @ 2012-10-02 0:59 UTC (permalink / raw)
To: Anton Blanchard; +Cc: paulus, linuxppc-dev
[urgh, sorry Anton, Ben & Paul, inadvertently hit send before adding
linuxppc-dev to the cc!]
Hi Anton,
In 2fae7cdb60240e2e2d9b378afbf6d9fcce8a3890 ("powerpc: Fix VMX in
interrupt check in POWER7 copy loops"), I think you inadvertently
introduced a regression for memcpy on POWER7 machines. copyuser and
memcpy diverge slightly in their use of cr1 (copyuser doesn't use it,
but memcpy does) and you end up clobbering that register with your fix.
That results in (taken from an FC18 kernel):
[ 18.824604] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052f40
[ 18.824618] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[ 18.824623] SMP NR_CPUS=1024 NUMA pSeries
[ 18.824633] Modules linked in: tg3(+) be2net(+) cxgb4(+) ipr(+) sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua squashfs cramfs
[ 18.824705] NIP: c000000000052f40 LR: c00000000020b874 CTR: 0000000000000512
[ 18.824709] REGS: c000001f1fef7790 TRAP: 0f20 Not tainted (3.6.0-0.rc6.git0.2.fc18.ppc64)
[ 18.824713] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4802802e XER: 20000010
[ 18.824726] SOFTE: 0
[ 18.824728] CFAR: 0000000000000f20
[ 18.824731] TASK = c000000fa7128400[0] 'swapper/24' THREAD: c000000fa7480000 CPU: 24
GPR00: 00000000ffffffc0 c000001f1fef7a10 c00000000164edc0 c000000f9b9a8120
GPR04: c000000f9b9a8124 0000000000001438 0000000000000060 03ffffff064657ee
GPR08: 0000000080000000 0000000000000010 0000000000000020 0000000000000030
GPR12: 0000000028028022 c00000000ff25400 0000000000000001 0000000000000000
GPR16: 0000000000000000 7fffffffffffffff c0000000016b2180 c00000000156a500
GPR20: c000000f968c7a90 c0000000131c31d8 c000001f1fef4000 c000000001561d00
GPR24: 000000000000000a 0000000000000000 0000000000000001 0000000000000012
GPR28: c000000fa5c04f80 00000000000008bc c0000000015c0a28 000000000000022e
[ 18.824792] NIP [c000000000052f40] .memcpy_power7+0x5a0/0x7c4
[ 18.824797] LR [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[ 18.824800] Call Trace:
[ 18.824803] [c000001f1fef7a10] [c000000000052c14] .memcpy_power7+0x274/0x7c4 (unreliable)
[ 18.824809] [c000001f1fef7b10] [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[ 18.824813] [c000001f1fef7bb0] [c00000000020ba88] .free_percpu+0xb8/0x1b0
[ 18.824819] [c000001f1fef7c50] [c00000000043d144] .throtl_pd_exit+0x94/0xd0
[ 18.824824] [c000001f1fef7cf0] [c00000000043acf8] .blkg_free+0x88/0xe0
[ 18.824829] [c000001f1fef7d90] [c00000000018c048] .rcu_process_callbacks+0x2e8/0x8a0
[ 18.824835] [c000001f1fef7e90] [c0000000000a8ce8] .__do_softirq+0x158/0x4d0
[ 18.824840] [c000001f1fef7f90] [c000000000025ecc] .call_do_softirq+0x14/0x24
[ 18.824845] [c000000fa7483650] [c000000000010e80] .do_softirq+0x160/0x1a0
[ 18.824850] [c000000fa74836f0] [c0000000000a94a4] .irq_exit+0xf4/0x120
[ 18.824854] [c000000fa7483780] [c000000000020c44] .timer_interrupt+0x154/0x4d0
[ 18.824859] [c000000fa7483830] [c000000000003be0] decrementer_common+0x160/0x180
[ 18.824866] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4
[ 18.824866] LR = .check_and_cede_processor+0x48/0x80
[ 18.824871] [c000000fa7483b20] [c00000000007f018] .check_and_cede_processor+0x18/0x80 (unreliable)
[ 18.824877] [c000000fa7483b90] [c00000000007f104] .dedicated_cede_loop+0x84/0x150
[ 18.824883] [c000000fa7483c50] [c0000000006bc030] .cpuidle_enter+0x30/0x50
[ 18.824887] [c000000fa7483cc0] [c0000000006bc9f4] .cpuidle_idle_call+0x104/0x720
[ 18.824892] [c000000fa7483d80] [c000000000070af8] .pSeries_idle+0x18/0x40
[ 18.824897] [c000000fa7483df0] [c000000000019084] .cpu_idle+0x1a4/0x380
[ 18.824902] [c000000fa7483ec0] [c0000000008a4c18] .start_secondary+0x520/0x528
[ 18.824907] [c000000fa7483f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14
[ 18.824911] Instruction dump:
[ 18.824914] 38840008 90030000 90e30004 38630008 7ca62850 7cc300d0 78c7e102 7cf01120
[ 18.824923] 78c60660 39200010 39400020 39600030 <7e00200c> 7c0020ce 38840010 409f001c
[ 18.824935] ---[ end trace 0bb95124affaaa45 ]---
[ 18.825046] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052d08
I believe the right fix is to make memcpy match copyuser and not use
cr1.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
I've not tested this fix yet, but I think it's logically correct.
Probably needs to go to 3.6-stable as well.
diff --git a/arch/powerpc/lib/memcpy_power7.S b/arch/powerpc/lib/memcpy_power7.S
index 7ba6c96..0663630 100644
--- a/arch/powerpc/lib/memcpy_power7.S
+++ b/arch/powerpc/lib/memcpy_power7.S
@@ -239,8 +239,8 @@ _GLOBAL(memcpy_power7)
ori r9,r9,1 /* stream=1 */
srdi r7,r5,7 /* length in cachelines, capped at 0x3FF */
- cmpldi cr1,r7,0x3FF
- ble cr1,1f
+ cmpldi r7,0x3FF
+ ble 1f
li r7,0x3FF
1: lis r0,0x0E00 /* depth=7 */
sldi r7,r7,7
^ permalink raw reply related
* Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
From: Yasuaki Ishimatsu @ 2012-10-02 1:18 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
x86, Ni zhan Chen, linux-kernel, cmetcalf, linux-mm, paulus,
minchan.kim, rientjes, sparclinux, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <CAHGf_=oJ_Jmjqcdr4cPJghf7PX+vfmZe=CV2sdQQhS5agzG15w@mail.gmail.com>
Hi Kosaki-san,
2012/09/29 7:15, KOSAKI Motohiro wrote:
> On Thu, Sep 27, 2012 at 11:50 PM, Yasuaki Ishimatsu
> <isimatu.yasuaki@jp.fujitsu.com> wrote:
>> Hi Chen,
>>
>>
>> 2012/09/28 11:22, Ni zhan Chen wrote:
>>>
>>> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>>>
>>>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>>
>>>> remove_memory() only try to offline pages. It is called in two cases:
>>>> 1. hot remove a memory device
>>>> 2. echo offline >/sys/devices/system/memory/memoryXX/state
>>>>
>>>> In the 1st case, we should also change memory block's state, and notify
>>>> the userspace that the memory block's state is changed after offlining
>>>> pages.
>>>>
>>>> So rename remove_memory() to offline_memory()/offline_pages(). And in
>>>> the 1st case, offline_memory() will be used. The function
>>>> offline_memory()
>>>> is not implemented. In the 2nd case, offline_pages() will be used.
>>>
>>>
>>> But this time there is not a function associated with add_memory.
>>
>>
>> To associate with add_memory() later, we renamed it.
>
> Then, you introduced bisect breakage. It is definitely unacceptable.
What does "bisect breakage" mean?
Thanks,
Yasuaki Ishimatsu
>
> NAK.
>
^ permalink raw reply
* Re: [RFC v9 PATCH 16/21] memory-hotplug: free memmap of sparse-vmemmap
From: Ni zhan Chen @ 2012-10-02 4:21 UTC (permalink / raw)
To: isimatu.yasuaki
Cc: linux-s390, linux-ia64, Wen Congyang, len.brown, linux-acpi,
linux-sh, x86, linux-kernel, cmetcalf, linux-mm, paulus,
minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
linuxppc-dev, akpm, liuj97
In-Reply-To: <1346837155-534-17-git-send-email-wency@cn.fujitsu.com>
On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> Not all pages of the virtual mapping for removed memory can be freed, since
> some pages used as PGD/PUD cover not only the removed memory but also other
> memory. So the patch checks whether a page can be freed or not.
>
> How to check whether a page can be freed or not?
> 1. When removing memory, the page structs of the removed memory are filled
> with 0xFD.
> 2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
> entry can be cleared. In this case, the page used as PT/PMD can be freed.
>
> Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is integrated
> into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is deleted.
>
> Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
> ppc, s390, and sparc.
>
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
> arch/ia64/mm/discontig.c | 8 +++
> arch/powerpc/mm/init_64.c | 8 +++
> arch/s390/mm/vmem.c | 8 +++
> arch/sparc/mm/init_64.c | 8 +++
> arch/x86/mm/init_64.c | 119 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/mm.h | 2 +
> mm/memory_hotplug.c | 17 +------
> mm/sparse.c | 5 +-
> 8 files changed, 158 insertions(+), 17 deletions(-)
>
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 33943db..0d23b69 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
> return vmemmap_populate_basepages(start_page, size, node);
> }
>
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
> {
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 3690c44..835a2b3 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page,
> return 0;
> }
>
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
> {
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index eda55cd..4b42b0b 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -227,6 +227,14 @@ out:
> return ret;
> }
>
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
> {
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index add1cc7..1384826 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_last(void)
> }
> }
>
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
> {
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 0075592..4e8f8a4 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
> return 0;
> }
>
> +#define PAGE_INUSE 0xFD
> +
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> + struct page **pp, int *page_size)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> + void *page_addr;
> + unsigned long next;
> +
> + *pp = NULL;
> +
> + pgd = pgd_offset_k(addr);
> + if (pgd_none(*pgd))
> + return pgd_addr_end(addr, end);
> +
> + pud = pud_offset(pgd, addr);
> + if (pud_none(*pud))
> + return pud_addr_end(addr, end);
> +
> + if (!cpu_has_pse) {
> + next = (addr + PAGE_SIZE) & PAGE_MASK;
> + pmd = pmd_offset(pud, addr);
> + if (pmd_none(*pmd))
> + return next;
> +
> + pte = pte_offset_kernel(pmd, addr);
> + if (pte_none(*pte))
> + return next;
> +
> + *page_size = PAGE_SIZE;
> + *pp = pte_page(*pte);
> + } else {
> + next = pmd_addr_end(addr, end);
> +
> + pmd = pmd_offset(pud, addr);
> + if (pmd_none(*pmd))
> + return next;
> +
> + *page_size = PMD_SIZE;
> + *pp = pmd_page(*pmd);
> + }
> +
> + /*
> + * Removed page structs are filled with 0xFD.
> + */
> + memset((void *)addr, PAGE_INUSE, next - addr);
> +
> + page_addr = page_address(*pp);
> +
> + /*
> + * Check the page is filled with 0xFD or not.
> + * memchr_inv() returns the address. In this case, we cannot
> + * clear PTE/PUD entry, since the page is used by other.
> + * So we cannot also free the page.
> + *
> + * memchr_inv() returns NULL. In this case, we can clear
> + * PTE/PUD entry, since the page is not used by other.
> + * So we can also free the page.
> + */
> + if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
> + *pp = NULL;
> + return next;
> + }
> +
Hi Yasuaki,
Why call memchr_inv() to check after the memset()? At that point, won't
the page always be filled with 0xFD?
> + if (!cpu_has_pse)
> + pte_clear(&init_mm, addr, pte);
> + else
> + pmd_clear(pmd);
> +
> + return next;
> +}
> +
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> + unsigned long addr = (unsigned long)memmap;
> + unsigned long end = (unsigned long)(memmap + nr_pages);
> + unsigned long next;
> + struct page *page;
> + int page_size;
> +
> + for (; addr < end; addr = next) {
> + page = NULL;
> + page_size = 0;
> + next = find_and_clear_pte_page(addr, end, &page, &page_size);
> + if (!page)
> + continue;
> +
> + free_pages((unsigned long)page_address(page),
> + get_order(page_size));
> + __flush_tlb_one(addr);
> + }
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> + unsigned long addr = (unsigned long)memmap;
> + unsigned long end = (unsigned long)(memmap + nr_pages);
> + unsigned long next;
> + struct page *page;
> + int page_size;
> + unsigned long magic;
> +
> + for (; addr < end; addr = next) {
> + page = NULL;
> + page_size = 0;
> + next = find_and_clear_pte_page(addr, end, &page, &page_size);
> + if (!page)
> + continue;
> +
> + magic = (unsigned long) page->lru.next;
> + if (magic == SECTION_INFO)
> + put_page_bootmem(page);
> + flush_tlb_kernel_range(addr, end);
> + }
> +}
> +
> void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
> {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c607913..fb0d1fc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1620,6 +1620,8 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
> void vmemmap_populate_print_last(void);
> void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
> unsigned long size);
> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>
> enum mf_flags {
> MF_COUNT_INCREASED = 1 << 0,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 647a7f2..c54922c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -308,19 +308,6 @@ static int __meminit __add_section(int nid, struct zone *zone,
> return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
> }
>
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int __remove_section(struct zone *zone, struct mem_section *ms)
> -{
> - int ret = -EINVAL;
> -
> - if (!valid_section(ms))
> - return ret;
> -
> - ret = unregister_memory_section(ms);
> -
> - return ret;
> -}
> -#else
> static int __remove_section(struct zone *zone, struct mem_section *ms)
> {
> unsigned long flags;
> @@ -337,9 +324,9 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
> pgdat_resize_lock(pgdat, &flags);
> sparse_remove_one_section(zone, ms);
> pgdat_resize_unlock(pgdat, &flags);
> - return 0;
> +
> + return ret;
> }
> -#endif
>
> /*
> * Reasonably generic function for adding memory. It is
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fac95f2..ab9d755 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -613,12 +613,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
> /* This will make the necessary allocations eventually. */
> return sparse_mem_map_populate(pnum, nid);
> }
> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
> {
> - return; /* XXX: Not implemented yet */
> + vmemmap_kfree(page, nr_pages);
> }
> static void free_map_bootmem(struct page *page, unsigned long nr_pages)
> {
> + vmemmap_free_bootmem(page, nr_pages);
> }
> #else
> static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
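On the memset()/memchr_inv() ordering questioned above, one possible reading (not confirmed by the author in this thread) is that the memset() covers only the range being removed (next - addr), while the memchr_inv() scans the whole backing page (*page_size), so the check can still fail when other sections share that page. A minimal userspace sketch of that reading follows; first_mismatch() and mark_removed() are hypothetical names, and first_mismatch() is only a stand-in for the kernel's memchr_inv().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD   /* fill byte used by the quoted patch */

/*
 * Userspace stand-in for the kernel's memchr_inv(): return the first byte
 * that differs from c, or NULL if all n bytes equal c.
 */
static const void *first_mismatch(const void *start, int c, size_t n)
{
	const unsigned char *p = start;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/*
 * Mark [off, off + len) of a backing page as removed, then report whether
 * the whole page is now unused (1) or still partly in use (0).
 */
static int mark_removed(unsigned char *page, size_t page_size,
			size_t off, size_t len)
{
	assert(off <= page_size && len <= page_size - off);
	memset(page + off, PAGE_INUSE, len);
	return first_mismatch(page, PAGE_INUSE, page_size) == NULL;
}
```

Under this reading, removing only half of the page structs held in a backing page leaves the page in use, and the page becomes freeable only once the remaining half has been marked as well.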
^ permalink raw reply
* Re: [PATCH 3/6] arch/powerpc/kvm/e500_tlb.c: fix error return code
From: Alexander Graf @ 2012-10-02 11:04 UTC (permalink / raw)
To: Julia Lawall
Cc: kvm, Marcelo Tosatti, kernel-janitors, linux-kernel, kvm-ppc,
Paul Mackerras, Avi Kivity, Julia Lawall, linuxppc-dev
In-Reply-To: <1344160356-387-4-git-send-email-Julia.Lawall@lip6.fr>
On 05.08.2012, at 11:52, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
>
> Convert a 0 error return code to a negative one, as returned elsewhere in the
> function.
>
> A new label is also added to avoid freeing things that are known to not yet
> be allocated.
>
> A simplified version of the semantic match that finds the first problem is as
> follows: (http://coccinelle.lip6.fr/)
>
> // <smpl>
> @@
> identifier ret;
> expression e,e1,e2,e3,e4,x;
> @@
>
> (
> if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
> |
> ret = 0
> )
> ... when != ret = e1
> *x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
> ... when != x = e2
> when != ret = e3
> *if (x == NULL || ...)
> {
> ... when != ret = e4
> * return ret;
> }
> // </smpl>
>
> Signed-off-by: Julia Lawall <julia@diku.dk>
Thanks, applied to kvm-ppc-next.
Alex
^ permalink raw reply
* Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem
From: Ni zhan Chen @ 2012-10-02 12:24 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: linux-s390, linux-ia64, Wen Congyang, len.brown, linux-acpi,
linux-sh, x86, linux-kernel, cmetcalf, linux-mm, paulus,
minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
linuxppc-dev, akpm, liuj97
In-Reply-To: <506907E5.2080609@jp.fujitsu.com>
On 10/01/2012 11:03 AM, Yasuaki Ishimatsu wrote:
> Hi Chen,
>
> 2012/09/29 11:15, Ni zhan Chen wrote:
>> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> The function get_page_bootmem() may be called more than one time to
>>> the same
>>> page. There is no need to set page's type, private if the function
>>> is not
>>> the first time called to the page.
>>>
>>> Note: the patch is just optimization and does not fix any problem.
>>
>> Hi Yasuaki,
>>
>> this patch is reasonable to me. I have another question associated to
>> get_page_bootmem(), the question is from another fujitsu guy's patch
>> changelog [commit : 04753278769f3], the changelog said that:
>>
>> 1) When the memmap of removing section is allocated on other
>> section by bootmem, it should/can be free.
>> 2) When the memmap of removing section is allocated on the
>> same section, it shouldn't be freed. Because the section has to be
>> logical memory offlined already and all pages must be isolated
>> against
>> page allocater. If it is freed, page allocator may use it which
>> will
>> be removed physically soon.
>>
>> but I don't see his patch guarantee 2), it means that his patch
>> doesn't guarantee the memmap of removing section which is allocated
>> on other section by bootmem doesn't be freed. Hopefully get your
>> explaination in details, thanks in advance. :-)
>
> In my understanding, the patch does not guarantee it.
> Please see [commit : 0c0a4a517a31e]. free_map_bootmem() in the commit
> guarantees it.
Thanks Yasuaki, I have already seen the commit you mentioned. But about
point 2) in the changelog of the commit I pointed out: why does it say "If
it is freed, page allocator may use it which will be removed physically
soon"? Does it mean a use-after-free? AFAIK, the isolated pages will be
freed if no users use them, so why not free the associated memmap?
>
> Thanks,
> Yasuaki Ishimatsu
>
>>
>>>
>>> CC: David Rientjes <rientjes@google.com>
>>> CC: Jiang Liu <liuj97@gmail.com>
>>> CC: Len Brown <len.brown@intel.com>
>>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> CC: Paul Mackerras <paulus@samba.org>
>>> CC: Christoph Lameter <cl@linux.com>
>>> Cc: Minchan Kim <minchan.kim@gmail.com>
>>> CC: Andrew Morton <akpm@linux-foundation.org>
>>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>> CC: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>> ---
>>> mm/memory_hotplug.c | 15 +++++++++++----
>>> 1 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index d736df3..26a5012 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct
>>> resource *res)
>>> static void get_page_bootmem(unsigned long info, struct page *page,
>>> unsigned long type)
>>> {
>>> - page->lru.next = (struct list_head *) type;
>>> - SetPagePrivate(page);
>>> - set_page_private(page, info);
>>> - atomic_inc(&page->_count);
>>> + unsigned long page_type;
>>> +
>>> + page_type = (unsigned long)page->lru.next;
>>> + if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>>> + page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>>> + page->lru.next = (struct list_head *)type;
>>> + SetPagePrivate(page);
>>> + set_page_private(page, info);
>>> + atomic_inc(&page->_count);
>>> + } else
>>> + atomic_inc(&page->_count);
>>> }
>>> /* reference to __meminit __free_pages_bootmem is valid
>>
>
>
>
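For reference, the behaviour of the quoted get_page_bootmem() change can be sketched in userspace C. This is a minimal sketch under stated assumptions: struct fake_page, the MIN/MAX type values and get_page_bootmem_sketch() are hypothetical stand-ins, not the kernel definitions. The point it illustrates is that type and info are recorded only on the first call, while every call still takes a reference.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define MIN_BOOTMEM_TYPE 12UL
#define MAX_BOOTMEM_TYPE 15UL

struct fake_page {
	unsigned long type;     /* plays the role of page->lru.next */
	unsigned long private;  /* plays the role of page_private(page) */
	int count;              /* plays the role of page->_count */
};

static void get_page_bootmem_sketch(unsigned long info, struct fake_page *page,
				    unsigned long type)
{
	assert(type >= MIN_BOOTMEM_TYPE && type <= MAX_BOOTMEM_TYPE);

	/* Record type and info only on the first call for this page. */
	if (page->type < MIN_BOOTMEM_TYPE || page->type > MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->private = info;
	}
	/* Every call takes a reference. */
	page->count++;
}
```

Calling the sketch twice on the same page leaves the first type and info in place and raises the count to 2.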
^ permalink raw reply
* Re: [RFC v9 PATCH 06/21] memory-hotplug: export the function acpi_bus_remove()
From: KOSAKI Motohiro @ 2012-10-02 17:28 UTC (permalink / raw)
To: Ni zhan Chen
Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
x86, linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
minchan.kim, rientjes, sparclinux, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <506A36A1.6030709@gmail.com>
On Mon, Oct 1, 2012 at 8:34 PM, Ni zhan Chen <nizhan.chen@gmail.com> wrote:
> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> The function acpi_bus_remove() can remove a acpi device from acpi device.
>
> IIUC, s/acpi device/acpi bus
IIUC, acpi_bus_remove() means "remove the device from a bus".
^ permalink raw reply
* Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
From: KOSAKI Motohiro @ 2012-10-02 17:29 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
x86, Ni zhan Chen, linux-kernel, cmetcalf, linux-mm, paulus,
minchan.kim, rientjes, sparclinux, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <506A4100.7070305@jp.fujitsu.com>
>> Then, you introduced bisect breakage. It is definitely unacceptable.
>
> What is "bisect breakage" meaning?
Think about what happens when only patch [1/21] is applied.
^ permalink raw reply
* [PATCH 0/5] Move some OF functionality from pseries to generic OF code
From: Nathan Fontenot @ 2012-10-02 18:11 UTC (permalink / raw)
To: devicetree-discuss, cbe-oss-dev, LKML, linuxppc-dev
This set of patches moves some OF code that has been living
in the pseries tree over to the generic OF code base. The
functionality being migrated over is something that, I believe,
should live in the generic code base. The specific functionality
being migrated to generic OF code is:
o Updating the device tree in /proc when adding/removing a node.
o Adding a notification chain for adding/removing nodes and
properties of the device tree.
o Re-naming the base OF code prom_* routines to of_* to better go
with the naming used for OF code.
-Nathan
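As a rough illustration of the notification-chain idea listed above, here is a minimal userspace sketch in C. It is not the kernel notifier API and not the code from this series: the event codes, struct notifier_sketch, chain_register(), chain_notify() and count_added() are all hypothetical names. Subscribers register a callback and are told when a node is added or removed.

```c
#include <assert.h>
#include <stddef.h>

enum of_event_sketch { NODE_ADDED, NODE_REMOVED };   /* hypothetical event codes */

struct notifier_sketch {
	int (*call)(struct notifier_sketch *self, enum of_event_sketch ev,
		    const char *path);
	struct notifier_sketch *next;
};

static struct notifier_sketch *chain_head;

/* Add a subscriber to the front of the chain. */
static void chain_register(struct notifier_sketch *nb)
{
	assert(nb && nb->call);
	nb->next = chain_head;
	chain_head = nb;
}

/* Call every registered callback; return how many were invoked. */
static int chain_notify(enum of_event_sketch ev, const char *path)
{
	struct notifier_sketch *nb;
	int called = 0;

	for (nb = chain_head; nb; nb = nb->next) {
		nb->call(nb, ev, path);
		called++;
	}
	return called;
}

static int nodes_seen;

/* Example subscriber: counts node additions and ignores removals. */
static int count_added(struct notifier_sketch *self, enum of_event_sketch ev,
		       const char *path)
{
	(void)self;
	(void)path;
	if (ev == NODE_ADDED)
		nodes_seen++;
	return 0;
}
```

Code that adds or removes a device tree node would call chain_notify() once per change, so interested subsystems do not need to be hard-wired into the update path.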
^ permalink raw reply
* Re: Build regressions/improvements in v3.6
From: Geert Uytterhoeven @ 2012-10-02 18:38 UTC (permalink / raw)
To: linux-kernel; +Cc: Chris Zankel, linuxppc-dev
In-Reply-To: <1349202763-1601-1-git-send-email-geert@linux-m68k.org>
On Tue, Oct 2, 2012 at 8:32 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> JFYI, when comparing v3.6 to v3.6-rc7[3], the summaries are:
> - build errors: +4/-1
+ arch/powerpc/platforms/512x/mpc512x_shared.c: error:
'FSL_DIU_PORT_DVI' undeclared (first use in this function): => 189:9
+ arch/powerpc/platforms/512x/mpc512x_shared.c: error: parameter 1
('port') has incomplete type: => 187:54, 83:56, 88:57, 69:56
+ arch/powerpc/platforms/512x/mpc512x_shared.c: error: return type
is an incomplete type: => 187:1
powerpc-randconfig
+ drivers/net/ethernet/realtek/r8169.c: error: expected identifier
before numeric constant: => 451:2
xtensa-allmodconfig (hmm, this is not a new one, it fell through the
cracks in -rc7 because of log line interleaving).
Ugh, arch/xtensa/include/asm/regs.h defines way too generic symbols,
like "MISC" (which causes the above breakage), and even a few
2-letter symbols, which surprisingly don't cause conflicts...
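A minimal sketch of how such a generic macro name breaks an unrelated enum, assuming a macro value and register offset picked purely for illustration (the NIC_MISC name and enum nic_registers_sketch are hypothetical, not the r8169 definitions):

```c
#include <assert.h>

/* Stand-in for an arch header exporting a very generic name (value is hypothetical). */
#define MISC 244

/*
 * With the macro visible, "enum { MISC = 0xf0 };" is preprocessed into
 * "enum { 244 = 0xf0 };", which the compiler rejects with an error of the
 * "expected identifier before numeric constant" kind.
 */

/* Prefixing the register name (names here are hypothetical) avoids the clash. */
enum nic_registers_sketch {
	NIC_MISC = 0xf0,
};

static int misc_register_offset(void)
{
	assert(MISC == 244);   /* the arch macro is untouched */
	return NIC_MISC;
}
```

The same clash can be avoided from the other side by giving the arch header's symbols a prefix, which keeps driver code free to use short register names.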
> [1] http://kisskb.ellerman.id.au/kisskb/head/5469/ (all 117 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/5445/ (all 117 configs)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Nishanth Aravamudan @ 2012-10-02 21:43 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-nfs, Jan Kara, linuxppc-dev, Alexander Graf, LKML List,
J. Bruce Fields, anton, skinsbursky, bfields, Linus Torvalds
In-Reply-To: <1349139509.3847.2.camel@pasglop>
Hi Ben,
On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> > Phew. Here we go :). It looks to be more of a PPC specific problem
> > than it appeared as at first:
>
> Ok, so I suspect the problem is the pushing down of the locks which
> breaks with iommu backends that have a separate flush callback. In
> that case, the flush moves out of the allocator lock.
>
> Now we do call flush before we return, still, but it becomes racy
> I suspect, but somebody needs to give it a closer look. I'm hoping
> Anton or Nish will later today.
Started looking into this. If your suspicion were accurate, wouldn't the
bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
iommu_free")?
Alex, the error is reproducible, right? Does it go away by reverting
that commit against mainline? Just trying to narrow down my focus.
Thanks,
Nish
^ permalink raw reply
* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Alexander Graf @ 2012-10-02 21:47 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: linux-nfs, Jan Kara, linuxppc-dev, LKML List, J. Bruce Fields,
anton, skinsbursky, bfields, Linus Torvalds
In-Reply-To: <20121002214327.GA29218@linux.vnet.ibm.com>
On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:
> Hi Ben,
>
> On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
>> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
>>> Phew. Here we go :). It looks to be more of a PPC specific problem
>>> than it appeared as at first:
>>
>> Ok, so I suspect the problem is the pushing down of the locks which
>> breaks with iommu backends that have a separate flush callback. In
>> that case, the flush moves out of the allocator lock.
>>
>> Now we do call flush before we return, still, but it becomes racy
>> I suspect, but somebody needs to give it a closer look. I'm hoping
>> Anton or Nish will later today.
>
> Started looking into this. If your suspicion were accurate, wouldn't the
> bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
> ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
> iommu_free")?
>
> Alex, the error is reproducible, right?
Yes. I'm having a hard time figuring out whether the reason my U4-based G5
Mac crashes and fails reading data is the same, since I don't have a
serial connection there, but I assume so.
> Does it go away by reverting
> that commit against mainline? Just trying to narrow down my focus.
The patch doesn't revert that easily. Mind providing a revert patch so
I can try?
Alex
^ permalink raw reply
* RE: [LKML] Re: Build regressions/improvements in v3.6
From: Marc Gauthier @ 2012-10-02 21:33 UTC (permalink / raw)
To: Geert Uytterhoeven, linux-kernel@vger.kernel.org
Cc: Chris Zankel, linuxppc-dev@lists.ozlabs.org, Max Filippov
In-Reply-To: <CAMuHMdXcMVkcqTJdH2zdo9=a2efeeQmPGUr9GZqRMuRrAuMBvw@mail.gmail.com>
Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> xtensa-allmodconfig (hmm, this is not a new one, it felt through the
> cracks in -rc7 because of log line interleaving).
>
> Ugh, arch/xtensa/include/asm/regs.h defines way to generic symbols,
> like "MISC" (which causes the above breakage), and even a few
> 2-letter symbols, which surprisingly don't cause conflicts...
This particular one is already fixed in the xtensa linux-next tree
at git://github.com/czankel/xtensa-linux.git#for_next
(commit 36c74c2a16678e9ee6f08ef89eeebfdd1a96693d).
The short names in that header are for historical reasons.
It is indeed due for a cleanup.
-Marc
^ permalink raw reply
* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Nishanth Aravamudan @ 2012-10-02 22:17 UTC (permalink / raw)
To: Alexander Graf
Cc: linux-nfs, Jan Kara, linuxppc-dev, LKML List, J. Bruce Fields,
anton, skinsbursky, bfields, Linus Torvalds
In-Reply-To: <9257E705-4EF9-4347-945C-B4A7582C427F@suse.de>
On 02.10.2012 [23:47:39 +0200], Alexander Graf wrote:
>
> On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:
>
> > Hi Ben,
> >
> > On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
> >> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> >>> Phew. Here we go :). It looks to be more of a PPC specific problem
> >>> than it appeared as at first:
> >>
> >> Ok, so I suspect the problem is the pushing down of the locks which
> >> breaks with iommu backends that have a separate flush callback. In
> >> that case, the flush moves out of the allocator lock.
> >>
> >> Now we do call flush before we return, still, but it becomes racy
> >> I suspect, but somebody needs to give it a closer look. I'm hoping
> >> Anton or Nish will later today.
> >
> > Started looking into this. If your suspicion were accurate, wouldn't the
> > bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
> > ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
> > iommu_free")?
> >
> > Alex, the error is reproducible, right?
>
> Yes. I'm having a hard time figuring out whether my U4-based G5 Mac
> crashes and fails to read data for the same reason, since I don't have
> a serial connection there, but I assume so.
Ok, great, thanks. Yeah, if it were a lock-based race, I would have
expected the lock pushdown in the above commit (or in one of the others
in Anton's series) to be the real source. But that's just my first
sniff at what Ben was suggesting. I'm still reading and understanding
the code.
> > Does it go away by reverting
> > that commit against mainline? Just trying to narrow down my focus.
>
> The patch doesn't revert that easily. Would you mind providing a
> revert patch so I can try?
The following at least builds on defconfig here:
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cbfe678..957a83f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -53,16 +53,6 @@ static __inline__ __attribute_const__ int get_iommu_order(unsigned long size)
*/
#define IOMAP_MAX_ORDER 13
-#define IOMMU_POOL_HASHBITS 2
-#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS)
-
-struct iommu_pool {
- unsigned long start;
- unsigned long end;
- unsigned long hint;
- spinlock_t lock;
-} ____cacheline_aligned_in_smp;
-
struct iommu_table {
unsigned long it_busno; /* Bus number this table belongs to */
unsigned long it_size; /* Size of iommu table in entries */
@@ -71,10 +61,10 @@ struct iommu_table {
unsigned long it_index; /* which iommu table this is */
unsigned long it_type; /* type: PCI or Virtual Bus */
unsigned long it_blocksize; /* Entries in each block (cacheline) */
- unsigned long poolsize;
- unsigned long nr_pools;
- struct iommu_pool large_pool;
- struct iommu_pool pools[IOMMU_NR_POOLS];
+ unsigned long it_hint; /* Hint for next alloc */
+ unsigned long it_largehint; /* Hint for large allocs */
+ unsigned long it_halfpoint; /* Breaking point for small/large allocs */
+ spinlock_t it_lock; /* Protects it_map */
unsigned long *it_map; /* A simple allocation bitmap for now */
};
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ff5a6ce..9a31f3c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -62,26 +62,6 @@ static int __init setup_iommu(char *str)
__setup("iommu=", setup_iommu);
-static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);
-
-/*
- * We precalculate the hash to avoid doing it on every allocation.
- *
- * The hash is important to spread CPUs across all the pools. For example,
- * on a POWER7 with 4 way SMT we want interrupts on the primary threads and
- * with 4 pools all primary threads would map to the same pool.
- */
-static int __init setup_iommu_pool_hash(void)
-{
- unsigned int i;
-
- for_each_possible_cpu(i)
- per_cpu(iommu_pool_hash, i) = hash_32(i, IOMMU_POOL_HASHBITS);
-
- return 0;
-}
-subsys_initcall(setup_iommu_pool_hash);
-
#ifdef CONFIG_FAIL_IOMMU
static DECLARE_FAULT_ATTR(fail_iommu);
@@ -184,8 +164,6 @@ static unsigned long iommu_range_alloc(struct device *dev,
unsigned long align_mask;
unsigned long boundary_size;
unsigned long flags;
- unsigned int pool_nr;
- struct iommu_pool *pool;
align_mask = 0xffffffffffffffffl >> (64 - align_order);
@@ -201,46 +179,38 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (should_fail_iommu(dev))
return DMA_ERROR_CODE;
- /*
- * We don't need to disable preemption here because any CPU can
- * safely use any IOMMU pool.
- */
- pool_nr = __raw_get_cpu_var(iommu_pool_hash) & (tbl->nr_pools - 1);
-
- if (largealloc)
- pool = &(tbl->large_pool);
- else
- pool = &(tbl->pools[pool_nr]);
-
- spin_lock_irqsave(&(pool->lock), flags);
+ spin_lock_irqsave(&(tbl->it_lock), flags);
-again:
- if ((pass == 0) && handle && *handle)
+ if (handle && *handle)
start = *handle;
else
- start = pool->hint;
+ start = largealloc ? tbl->it_largehint : tbl->it_hint;
- limit = pool->end;
+ /* Use only half of the table for small allocs (15 pages or less) */
+ limit = largealloc ? tbl->it_size : tbl->it_halfpoint;
+
+ if (largealloc && start < tbl->it_halfpoint)
+ start = tbl->it_halfpoint;
/* The case below can happen if we have a small segment appended
* to a large, or when the previous alloc was at the very end of
* the available space. If so, go back to the initial start.
*/
if (start >= limit)
- start = pool->start;
+ start = largealloc ? tbl->it_largehint : tbl->it_hint;
+
+ again:
if (limit + tbl->it_offset > mask) {
limit = mask - tbl->it_offset + 1;
/* If we're constrained on address range, first try
* at the masked hint to avoid O(n) search complexity,
- * but on second pass, start at 0 in pool 0.
+ * but on second pass, start at 0.
*/
- if ((start & mask) >= limit || pass > 0) {
- pool = &(tbl->pools[0]);
- start = pool->start;
- } else {
+ if ((start & mask) >= limit || pass > 0)
+ start = 0;
+ else
start &= mask;
- }
}
if (dev)
@@ -254,25 +224,17 @@ again:
tbl->it_offset, boundary_size >> IOMMU_PAGE_SHIFT,
align_mask);
if (n == -1) {
- if (likely(pass == 0)) {
- /* First try the pool from the start */
- pool->hint = pool->start;
- pass++;
- goto again;
-
- } else if (pass <= tbl->nr_pools) {
- /* Now try scanning all the other pools */
- spin_unlock(&(pool->lock));
- pool_nr = (pool_nr + 1) & (tbl->nr_pools - 1);
- pool = &tbl->pools[pool_nr];
- spin_lock(&(pool->lock));
- pool->hint = pool->start;
+ if (likely(pass < 2)) {
+ /* First failure, just rescan the half of the table.
+ * Second failure, rescan the other half of the table.
+ */
+ start = (largealloc ^ pass) ? tbl->it_halfpoint : 0;
+ limit = pass ? tbl->it_size : limit;
pass++;
goto again;
-
} else {
- /* Give up */
- spin_unlock_irqrestore(&(pool->lock), flags);
+ /* Third failure, give up */
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
return DMA_ERROR_CODE;
}
}
@@ -282,10 +244,10 @@ again:
/* Bump the hint to a new block for small allocs. */
if (largealloc) {
/* Don't bump to new block to avoid fragmentation */
- pool->hint = end;
+ tbl->it_largehint = end;
} else {
/* Overflow will be taken care of at the next allocation */
- pool->hint = (end + tbl->it_blocksize - 1) &
+ tbl->it_hint = (end + tbl->it_blocksize - 1) &
~(tbl->it_blocksize - 1);
}
@@ -293,8 +255,7 @@ again:
if (handle)
*handle = end;
- spin_unlock_irqrestore(&(pool->lock), flags);
-
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
return n;
}
@@ -369,45 +330,23 @@ static bool iommu_free_check(struct iommu_table *tbl, dma_addr_t dma_addr,
return true;
}
-static struct iommu_pool *get_pool(struct iommu_table *tbl,
- unsigned long entry)
-{
- struct iommu_pool *p;
- unsigned long largepool_start = tbl->large_pool.start;
-
- /* The large pool is the last pool at the top of the table */
- if (entry >= largepool_start) {
- p = &tbl->large_pool;
- } else {
- unsigned int pool_nr = entry / tbl->poolsize;
-
- BUG_ON(pool_nr > tbl->nr_pools);
- p = &tbl->pools[pool_nr];
- }
-
- return p;
-}
-
static void __iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
unsigned int npages)
{
unsigned long entry, free_entry;
unsigned long flags;
- struct iommu_pool *pool;
entry = dma_addr >> IOMMU_PAGE_SHIFT;
free_entry = entry - tbl->it_offset;
- pool = get_pool(tbl, free_entry);
-
if (!iommu_free_check(tbl, dma_addr, npages))
return;
ppc_md.tce_free(tbl, entry, npages);
- spin_lock_irqsave(&(pool->lock), flags);
+ spin_lock_irqsave(&(tbl->it_lock), flags);
bitmap_clear(tbl->it_map, free_entry, npages);
- spin_unlock_irqrestore(&(pool->lock), flags);
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
}
static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
@@ -649,8 +588,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
unsigned long sz;
static int welcomed = 0;
struct page *page;
- unsigned int i;
- struct iommu_pool *p;
+
+ /* Set aside 1/4 of the table for large allocations. */
+ tbl->it_halfpoint = tbl->it_size * 3 / 4;
/* number of bytes needed for the bitmap */
sz = (tbl->it_size + 7) >> 3;
@@ -669,28 +609,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
- /* We only split the IOMMU table if we have 1GB or more of space */
- if ((tbl->it_size << IOMMU_PAGE_SHIFT) >= (1UL * 1024 * 1024 * 1024))
- tbl->nr_pools = IOMMU_NR_POOLS;
- else
- tbl->nr_pools = 1;
-
- /* We reserve the top 1/4 of the table for large allocations */
- tbl->poolsize = (tbl->it_size * 3 / 4) / tbl->nr_pools;
-
- for (i = 0; i < tbl->nr_pools; i++) {
- p = &tbl->pools[i];
- spin_lock_init(&(p->lock));
- p->start = tbl->poolsize * i;
- p->hint = p->start;
- p->end = p->start + tbl->poolsize;
- }
-
- p = &tbl->large_pool;
- spin_lock_init(&(p->lock));
- p->start = tbl->poolsize * i;
- p->hint = p->start;
- p->end = tbl->it_size;
+ tbl->it_hint = 0;
+ tbl->it_largehint = tbl->it_halfpoint;
+ spin_lock_init(&tbl->it_lock);
iommu_table_clear(tbl);
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index dca2136..b673200 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -518,6 +518,7 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
__set_bit(0, window->table.it_map);
tce_build_cell(&window->table, window->table.it_offset, 1,
(unsigned long)iommu->pad_page, DMA_TO_DEVICE, NULL);
+ window->table.it_hint = window->table.it_blocksize;
return window;
}
* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Alexander Graf @ 2012-10-02 22:31 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: linux-nfs, Jan Kara, linuxppc-dev, LKML List, J. Bruce Fields,
anton, skinsbursky, bfields, Linus Torvalds
In-Reply-To: <20121002221736.GB29218@linux.vnet.ibm.com>
On 03.10.2012, at 00:17, Nishanth Aravamudan wrote:
> On 02.10.2012 [23:47:39 +0200], Alexander Graf wrote:
>>
>> On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:
>>
>>> Hi Ben,
>>>
>>> On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
>>>> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
>>>>> Phew. Here we go :). It looks to be more of a PPC specific problem
>>>>> than it appeared as at first:
>>>>
>>>> Ok, so I suspect the problem is the pushing down of the locks which
>>>> breaks with iommu backends that have a separate flush callback. In
>>>> that case, the flush moves out of the allocator lock.
>>>>
>>>> Now we do call flush before we return, still, but it becomes racy
>>>> I suspect, but somebody needs to give it a closer look. I'm hoping
>>>> Anton or Nish will later today.
>>>
>>> Started looking into this. If your suspicion were accurate, wouldn't the
>>> bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
>>> ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
>>> iommu_free")?
>>>
>>> Alex, the error is reproducible, right?
>>
>> Yes. I'm having a hard time figuring out whether my U4-based G5 Mac
>> crashes and fails to read data for the same reason, since I don't have
>> a serial connection there, but I assume so.
>
> Ok, great, thanks. Yeah, if it were a lock-based race, I would have
> expected the lock pushdown in the above commit (or in one of the others
> in Anton's series) to be the real source. But that's just my first
> sniff at what Ben was suggesting. I'm still reading and understanding
> the code.
>
>>> Does it go away by reverting
>>> that commit against mainline? Just trying to narrow down my focus.
>>
>> The patch doesn't revert that easily. Would you mind providing a
>> revert patch so I can try?
>
> The following at least builds on defconfig here:
Yes. With that patch applied, things work for me again.
Alex
* [PATCH] powerpc/47x: Use the new ppc-opcode infrastructure.
From: Tony Breeds @ 2012-10-03 1:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Josh Boyer; +Cc: LinuxPPC-dev
Don't use 47x-only #defines for TLBIVAX or ICBT; supply and use helpers
in ppc-opcode.h instead.
This fixes a compile breakage.
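As a sanity check on the encoding, here is a small userspace sketch. PPC_INST_ICBT and the CT field come from the patch below; the RA/RB helpers are assumptions that reuse the shifts of the ICBT() macro being removed, and the double-underscore names are dropped because they are reserved outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the patch below. */
#define PPC_INST_ICBT	0x7c00002cu
#define PPC_CT(t)	(((uint32_t)(t) & 0x0f) << 21)

/*
 * Assumed field helpers: 5-bit register numbers at bit 16 (RA) and
 * bit 11 (RB), matching the shifts in the removed ICBT() macro.
 */
#define PPC_RA(a)	(((uint32_t)(a) & 0x1f) << 16)
#define PPC_RB(b)	(((uint32_t)(b) & 0x1f) << 11)

/* Build the instruction word for "icbt ct,ra,rb". */
static inline uint32_t encode_icbt(unsigned int ct, unsigned int ra,
				   unsigned int rb)
{
	assert(ra < 32 && rb < 32);	/* register numbers, not names */
	return PPC_INST_ICBT | PPC_CT(ct) | PPC_RA(ra) | PPC_RB(rb);
}
```

With ct=0, ra=6, rb=7 this gives the same word as the old open-coded macro, so the conversion should not change the generated code.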
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
---
arch/powerpc/include/asm/ppc-opcode.h | 4 ++++
arch/powerpc/mm/tlb_nohash_low.S | 15 ++++-----------
2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 4c25319..b5ed49c 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -86,6 +86,7 @@
#define PPC_INST_DCBA_MASK 0xfc0007fe
#define PPC_INST_DCBAL 0x7c2005ec
#define PPC_INST_DCBZL 0x7c2007ec
+#define PPC_INST_ICBT 0x7c00002c
#define PPC_INST_ISEL 0x7c00001e
#define PPC_INST_ISEL_MASK 0xfc00003e
#define PPC_INST_LDARX 0x7c0000a8
@@ -197,6 +198,7 @@
#define __PPC_MB(s) (((s) & 0x1f) << 6)
#define __PPC_ME(s) (((s) & 0x1f) << 1)
#define __PPC_BI(s) (((s) & 0x1f) << 16)
+#define __PPC_CT(t) (((t) & 0x0f) << 21)
/*
* Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a
@@ -259,6 +261,8 @@
__PPC_RS(t) | __PPC_RA0(a) | __PPC_RB(b))
#define PPC_SLBFEE_DOT(t, b) stringify_in_c(.long PPC_INST_SLBFEE | \
__PPC_RT(t) | __PPC_RB(b))
+#define PPC_ICBT(c,a,b) stringify_in_c(.long PPC_INST_ICBT | \
+ __PPC_CT(c) | __PPC_RA0(a) | __PPC_RB(b))
/* PASemi instructions */
#define LBZCIX(t,a,b) stringify_in_c(.long PPC_INST_LBZCIX | \
__PPC_RT(t) | __PPC_RA(a) | __PPC_RB(b))
diff --git a/arch/powerpc/mm/tlb_nohash_low.S b/arch/powerpc/mm/tlb_nohash_low.S
index fab919f..626ad08 100644
--- a/arch/powerpc/mm/tlb_nohash_low.S
+++ b/arch/powerpc/mm/tlb_nohash_low.S
@@ -191,12 +191,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
#ifdef CONFIG_PPC_47x
/*
- * 47x variant of icbt
- */
-# define ICBT(CT,RA,RB) \
- .long 0x7c00002c | ((CT) << 21) | ((RA) << 16) | ((RB) << 11)
-
-/*
* _tlbivax_bcast is only on 47x. We don't bother doing a runtime
* check though, it will blow up soon enough if we mistakenly try
* to use it on a 440.
@@ -208,8 +202,7 @@ _GLOBAL(_tlbivax_bcast)
wrteei 0
mtspr SPRN_MMUCR,r5
isync
-/* tlbivax 0,r3 - use .long to avoid binutils deps */
- .long 0x7c000624 | (r3 << 11)
+ PPC_TLBIVAX(0, R3)
isync
eieio
tlbsync
@@ -227,11 +220,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_476_DD2)
bl 2f
2: mflr r6
li r7,32
- ICBT(0,r6,r7) /* touch next cache line */
+ PPC_ICBT(0,R6,R7) /* touch next cache line */
add r6,r6,r7
- ICBT(0,r6,r7) /* touch next cache line */
+ PPC_ICBT(0,R6,R7) /* touch next cache line */
add r6,r6,r7
- ICBT(0,r6,r7) /* touch next cache line */
+ PPC_ICBT(0,R6,R7) /* touch next cache line */
sync
nop
nop
--
1.7.7.6
* [PATCH] powerpc: Add asm/debug.h to get powerpc_debugfs_root
From: Tony Breeds @ 2012-10-03 1:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Josh Boyer; +Cc: LinuxPPC-dev
Since the "Disintegrate asm/system.h for PowerPC" commit
(ae3a197e3d0bfe3f4bf1693723e82dc018c096f3), the build of prom.c has been
failing when DEBUG is #defined.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
---
arch/powerpc/kernel/prom.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 8dfd43f..3bd6c84 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -57,6 +57,7 @@
#include <mm/mmu_decl.h>
#include <asm/udbg.h>
+#include <asm/debug.h>
#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
--
1.7.7.6
* [PATCH 1/5] Add /proc device tree updating to of node add/remove
From: Nathan Fontenot @ 2012-10-03 2:55 UTC (permalink / raw)
To: devicetree-discuss, cbe-oss-dev, LKML, linuxppc-dev
In-Reply-To: <506B2E63.5090900@linux.vnet.ibm.com>
When adding or removing a device tree node we should also update
the device tree in /proc/device-tree. This action is already done in the
generic OF code for adding/removing properties of a node. This patch adds
this functionality for nodes.
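The shape of the change can be sketched in userspace as follows. Everything here is a simplified model: MODEL_PROC_VIEW stands in for CONFIG_PROC_DEVICETREE, and the model_* names and the two flags are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build switch standing in for CONFIG_PROC_DEVICETREE. */
#ifndef MODEL_PROC_VIEW
#define MODEL_PROC_VIEW 1
#endif

struct model_node {
	bool attached;		/* linked into the tree */
	bool proc_visible;	/* mirrored in the /proc-like view */
};

#if MODEL_PROC_VIEW
static void model_add_proc_entry(struct model_node *n)
{
	n->proc_visible = true;
}

static void model_remove_proc_entry(struct model_node *n)
{
	n->proc_visible = false;
}
#else
/* No-op stubs keep the callers free of #ifdefs. */
static void model_add_proc_entry(struct model_node *n) { (void)n; }
static void model_remove_proc_entry(struct model_node *n) { (void)n; }
#endif

/* Generic attach: update the tree, then mirror the node into the view. */
static void model_attach_node(struct model_node *n)
{
	assert(n != NULL);
	n->attached = true;
	model_add_proc_entry(n);
}

/* Generic detach: a node that is already detached is left alone. */
static void model_detach_node(struct model_node *n)
{
	assert(n != NULL);
	if (!n->attached)
		return;
	n->attached = false;
	model_remove_proc_entry(n);
}
```

Putting the view update inside the generic attach/detach helpers means platform code such as dlpar.c and reconfig.c no longer needs its own copy.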
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/dlpar.c | 24 -------------
arch/powerpc/platforms/pseries/reconfig.c | 47 -------------------------
drivers/of/base.c | 55 +++++++++++++++++++++++++++---
3 files changed, 51 insertions(+), 75 deletions(-)
Index: dt-next/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:40:51.000000000 -0500
@@ -13,7 +13,6 @@
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/notifier.h>
-#include <linux/proc_fs.h>
#include <linux/spinlock.h>
#include <linux/cpu.h>
#include <linux/slab.h>
@@ -255,9 +254,6 @@
int dlpar_attach_node(struct device_node *dn)
{
-#ifdef CONFIG_PROC_DEVICETREE
- struct proc_dir_entry *ent;
-#endif
int rc;
of_node_set_flag(dn, OF_DYNAMIC);
@@ -274,32 +270,12 @@
}
of_attach_node(dn);
-
-#ifdef CONFIG_PROC_DEVICETREE
- ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
- if (ent)
- proc_device_tree_add_node(dn, ent);
-#endif
-
of_node_put(dn->parent);
return 0;
}
int dlpar_detach_node(struct device_node *dn)
{
-#ifdef CONFIG_PROC_DEVICETREE
- struct device_node *parent = dn->parent;
- struct property *prop = dn->properties;
-
- while (prop) {
- remove_proc_entry(prop->name, dn->pde);
- prop = prop->next;
- }
-
- if (dn->pde)
- remove_proc_entry(dn->pde->name, parent->pde);
-#endif
-
pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, dn);
of_detach_node(dn);
of_node_put(dn); /* Must decrement the refcount */
Index: dt-next/arch/powerpc/platforms/pseries/reconfig.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:40:51.000000000 -0500
@@ -23,48 +23,6 @@
#include <asm/pSeries_reconfig.h>
#include <asm/mmu.h>
-
-
-/*
- * Routines for "runtime" addition and removal of device tree nodes.
- */
-#ifdef CONFIG_PROC_DEVICETREE
-/*
- * Add a node to /proc/device-tree.
- */
-static void add_node_proc_entries(struct device_node *np)
-{
- struct proc_dir_entry *ent;
-
- ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde);
- if (ent)
- proc_device_tree_add_node(np, ent);
-}
-
-static void remove_node_proc_entries(struct device_node *np)
-{
- struct property *pp = np->properties;
- struct device_node *parent = np->parent;
-
- while (pp) {
- remove_proc_entry(pp->name, np->pde);
- pp = pp->next;
- }
- if (np->pde)
- remove_proc_entry(np->pde->name, parent->pde);
-}
-#else /* !CONFIG_PROC_DEVICETREE */
-static void add_node_proc_entries(struct device_node *np)
-{
- return;
-}
-
-static void remove_node_proc_entries(struct device_node *np)
-{
- return;
-}
-#endif /* CONFIG_PROC_DEVICETREE */
-
/**
* derive_parent - basically like dirname(1)
* @path: the full_name of a node to be added to the tree
@@ -149,9 +107,6 @@
}
of_attach_node(np);
-
- add_node_proc_entries(np);
-
of_node_put(np->parent);
return 0;
@@ -179,8 +134,6 @@
return -EBUSY;
}
- remove_node_proc_entries(np);
-
pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, np);
of_detach_node(np);
Index: dt-next/drivers/of/base.c
===================================================================
--- dt-next.orig/drivers/of/base.c 2012-10-02 08:30:47.000000000 -0500
+++ dt-next/drivers/of/base.c 2012-10-02 08:40:51.000000000 -0500
@@ -1103,6 +1103,22 @@
* device tree nodes.
*/
+#ifdef CONFIG_PROC_DEVICETREE
+static void of_add_proc_dt_entry(struct device_node *dn)
+{
+ struct proc_dir_entry *ent;
+
+ ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
+ if (ent)
+ proc_device_tree_add_node(dn, ent);
+}
+#else
+static void of_add_proc_dt_entry(struct device_node *dn)
+{
+ return;
+}
+#endif
+
/**
* of_attach_node - Plug a device node into the tree and global list.
*/
@@ -1116,7 +1132,30 @@
np->parent->child = np;
allnodes = np;
write_unlock_irqrestore(&devtree_lock, flags);
+
+ of_add_proc_dt_entry(np);
+}
+
+#ifdef CONFIG_PROC_DEVICETREE
+static void of_remove_proc_dt_entry(struct device_node *dn)
+{
+ struct device_node *parent = dn->parent;
+ struct property *prop = dn->properties;
+
+ while (prop) {
+ remove_proc_entry(prop->name, dn->pde);
+ prop = prop->next;
+ }
+
+ if (dn->pde)
+ remove_proc_entry(dn->pde->name, parent->pde);
+}
+#else
+static void of_remove_proc_dt_entry(struct device_node *dn)
+{
+ return;
}
+#endif
/**
* of_detach_node - "Unplug" a node from the device tree.
@@ -1131,9 +1170,17 @@
write_lock_irqsave(&devtree_lock, flags);
+ if (of_node_check_flag(np, OF_DETACHED)) {
+ /* someone already detached it */
+ write_unlock_irqrestore(&devtree_lock, flags);
+ return;
+ }
+
parent = np->parent;
- if (!parent)
- goto out_unlock;
+ if (!parent) {
+ write_unlock_irqrestore(&devtree_lock, flags);
+ return;
+ }
if (allnodes == np)
allnodes = np->allnext;
@@ -1158,9 +1205,9 @@
}
of_node_set_flag(np, OF_DETACHED);
-
-out_unlock:
write_unlock_irqrestore(&devtree_lock, flags);
+
+ of_remove_proc_dt_entry(np);
}
#endif /* defined(CONFIG_OF_DYNAMIC) */
* [PATCH 2/5] Move of_drconf_cell struct definition to asm/prom.h
From: Nathan Fontenot @ 2012-10-03 2:56 UTC (permalink / raw)
To: devicetree-discuss, cbe-oss-dev, LKML, linuxppc-dev
In-Reply-To: <506B2E63.5090900@linux.vnet.ibm.com>
This patch moves the definition of the of_drconf_cell struct to asm/prom.h
to make it available for all powerpc/pseries code.
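For readers who have not met the struct, here is a userspace mirror of it with a flag check. The struct fields and flag values are copied from the patch below; lmb_is_usable() is a hypothetical helper, and the sketch ignores how the cells are read out of the device tree property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the struct, with u64/u32 as fixed-width types. */
struct of_drconf_cell {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t reserved;
	uint32_t aa_index;
	uint32_t flags;
};

#define DRCONF_MEM_ASSIGNED	0x00000008
#define DRCONF_MEM_AI_INVALID	0x00000040
#define DRCONF_MEM_RESERVED	0x00000080

/* One 64-bit field plus four 32-bit fields, with no padding. */
static_assert(sizeof(struct of_drconf_cell) == 24, "LMB entry is 24 bytes");

/*
 * Hypothetical helper: treat an LMB as usable when it is assigned to
 * the partition and not marked reserved.
 */
static inline bool lmb_is_usable(const struct of_drconf_cell *lmb)
{
	return (lmb->flags & DRCONF_MEM_ASSIGNED) &&
	       !(lmb->flags & DRCONF_MEM_RESERVED);
}
```

Having the definition in a shared header lets memory hotplug code make this kind of check without duplicating the layout from numa.c.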
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/prom.h | 16 ++++++++++++++++
arch/powerpc/mm/numa.c | 12 ------------
2 files changed, 16 insertions(+), 12 deletions(-)
Index: dt-next/arch/powerpc/mm/numa.c
===================================================================
--- dt-next.orig/arch/powerpc/mm/numa.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/mm/numa.c 2012-10-02 08:41:42.000000000 -0500
@@ -397,18 +397,6 @@
return result;
}
-struct of_drconf_cell {
- u64 base_addr;
- u32 drc_index;
- u32 reserved;
- u32 aa_index;
- u32 flags;
-};
-
-#define DRCONF_MEM_ASSIGNED 0x00000008
-#define DRCONF_MEM_AI_INVALID 0x00000040
-#define DRCONF_MEM_RESERVED 0x00000080
-
/*
* Read the next memblock list entry from the ibm,dynamic-memory property
* and return the information in the provided of_drconf_cell structure.
Index: dt-next/arch/powerpc/include/asm/prom.h
===================================================================
--- dt-next.orig/arch/powerpc/include/asm/prom.h 2011-11-17 09:12:07.000000000 -0600
+++ dt-next/arch/powerpc/include/asm/prom.h 2012-10-02 08:41:42.000000000 -0500
@@ -58,6 +58,22 @@
extern void of_instantiate_rtc(void);
+/* The of_drconf_cell struct defines the layout of the LMB array
+ * specified in the device tree property
+ * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
+ */
+struct of_drconf_cell {
+ u64 base_addr;
+ u32 drc_index;
+ u32 reserved;
+ u32 aa_index;
+ u32 flags;
+};
+
+#define DRCONF_MEM_ASSIGNED 0x00000008
+#define DRCONF_MEM_AI_INVALID 0x00000040
+#define DRCONF_MEM_RESERVED 0x00000080
+
/* These includes are put at the bottom because they may contain things
* that are overridden by this file. Ideally they shouldn't be included
* by this file, but there are a bunch of .c files that currently depend
* [PATCH 3/5] Add of node/property notification chain for adds and removes
From: Nathan Fontenot @ 2012-10-03 2:57 UTC (permalink / raw)
To: devicetree-discuss, cbe-oss-dev, LKML, linuxppc-dev
In-Reply-To: <506B2E63.5090900@linux.vnet.ibm.com>
This patch moves the notification chain for updates to the device tree
from the powerpc/pseries code to the base OF code. This makes this
functionality available to all architectures.
Additionally, the notification chain is updated to allow notifications
for property add/remove/update. To make this work, a pointer to a new
struct (of_prop_reconfig) is passed to the routines in the notification
chain. The of_prop_reconfig struct contains a pointer to the node
containing the property and a pointer to the property itself. In the
case of property updates, the property pointer refers to the new
property.
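The notify-then-modify flow can be modelled in userspace roughly like this. Only struct of_prop_reconfig and its two fields come from the patch; the model_* names, the action value and the fixed-size chain are assumptions, and the kernel's NOTIFY_* return codes are reduced to plain errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel types; only the fields used here. */
struct device_node { const char *full_name; };
struct property { const char *name; };

/* Payload handed to listeners: the owning node plus the property. */
struct of_prop_reconfig {
	struct device_node *dn;
	struct property *prop;
};

/* Hypothetical action code; the real ones are in include/linux/of.h. */
enum { MODEL_UPDATE_PROPERTY = 1 };

typedef int (*model_notifier_fn)(unsigned long action, void *arg);

#define MODEL_MAX_NOTIFIERS 8
static model_notifier_fn model_chain[MODEL_MAX_NOTIFIERS];
static size_t model_chain_len;

static int model_notifier_register(model_notifier_fn fn)
{
	if (model_chain_len == MODEL_MAX_NOTIFIERS)
		return -ENOMEM;
	model_chain[model_chain_len++] = fn;
	return 0;
}

/* Call listeners in order; the first non-zero result stops the walk. */
static int model_reconfig_notify(unsigned long action, void *arg)
{
	for (size_t i = 0; i < model_chain_len; i++) {
		int rc = model_chain[i](action, arg);

		if (rc)
			return rc;
	}
	return 0;
}

/* Notify first, modify second: a listener error leaves *slot as it was. */
static int model_update_property(struct device_node *np,
				 struct property **slot,
				 struct property *newprop)
{
	struct of_prop_reconfig pr = { .dn = np, .prop = newprop };
	int rc;

	assert(np && slot && newprop);
	rc = model_reconfig_notify(MODEL_UPDATE_PROPERTY, &pr);
	if (rc)
		return rc;
	*slot = newprop;
	return 0;
}

/* Example listener: refuses updates to a property named "locked". */
static int model_veto_locked(unsigned long action, void *arg)
{
	const struct of_prop_reconfig *pr = arg;

	if (action == MODEL_UPDATE_PROPERTY &&
	    strcmp(pr->prop->name, "locked") == 0)
		return -EPERM;
	return 0;
}
```

Because the chain runs before the tree is touched, a listener that returns an error effectively vetoes the change, which is why of_attach_node() and of_detach_node() now return int in this patch.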
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pSeries_reconfig.h | 32 ----------
arch/powerpc/kernel/prom.c | 6 -
arch/powerpc/platforms/pseries/dlpar.c | 14 ++--
arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 +-
arch/powerpc/platforms/pseries/hotplug-memory.c | 60 +++++++++++++------
arch/powerpc/platforms/pseries/iommu.c | 6 -
arch/powerpc/platforms/pseries/reconfig.c | 65 ---------------------
arch/powerpc/platforms/pseries/setup.c | 6 -
drivers/of/base.c | 74 ++++++++++++++++++++++--
include/linux/of.h | 20 +++++-
10 files changed, 154 insertions(+), 137 deletions(-)
Index: dt-next/arch/powerpc/platforms/pseries/reconfig.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:40:51.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:45:12.000000000 -0500
@@ -16,11 +16,11 @@
#include <linux/notifier.h>
#include <linux/proc_fs.h>
#include <linux/slab.h>
+#include <linux/of.h>
#include <asm/prom.h>
#include <asm/machdep.h>
#include <asm/uaccess.h>
-#include <asm/pSeries_reconfig.h>
#include <asm/mmu.h>
/**
@@ -55,28 +55,6 @@
return parent;
}
-static BLOCKING_NOTIFIER_HEAD(pSeries_reconfig_chain);
-
-int pSeries_reconfig_notifier_register(struct notifier_block *nb)
-{
- return blocking_notifier_chain_register(&pSeries_reconfig_chain, nb);
-}
-EXPORT_SYMBOL_GPL(pSeries_reconfig_notifier_register);
-
-void pSeries_reconfig_notifier_unregister(struct notifier_block *nb)
-{
- blocking_notifier_chain_unregister(&pSeries_reconfig_chain, nb);
-}
-EXPORT_SYMBOL_GPL(pSeries_reconfig_notifier_unregister);
-
-int pSeries_reconfig_notify(unsigned long action, void *p)
-{
- int err = blocking_notifier_call_chain(&pSeries_reconfig_chain,
- action, p);
-
- return notifier_to_errno(err);
-}
-
static int pSeries_reconfig_add_node(const char *path, struct property *proplist)
{
struct device_node *np;
@@ -100,13 +78,12 @@
goto out_err;
}
- err = pSeries_reconfig_notify(PSERIES_RECONFIG_ADD, np);
+ err = of_attach_node(np);
if (err) {
printk(KERN_ERR "Failed to add device node %s\n", path);
goto out_err;
}
- of_attach_node(np);
of_node_put(np->parent);
return 0;
@@ -134,9 +111,7 @@
return -EBUSY;
}
- pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, np);
of_detach_node(np);
-
of_node_put(parent);
of_node_put(np); /* Must decrement the refcount */
return 0;
@@ -381,7 +356,6 @@
static int do_update_property(char *buf, size_t bufsize)
{
struct device_node *np;
- struct pSeries_reconfig_prop_update upd_value;
unsigned char *value;
char *name, *end, *next_prop;
int rc, length;
@@ -410,41 +384,8 @@
return -ENODEV;
}
- upd_value.node = np;
- upd_value.property = newprop;
- pSeries_reconfig_notify(PSERIES_UPDATE_PROPERTY, &upd_value);
-
rc = prom_update_property(np, newprop, oldprop);
- if (rc)
- return rc;
-
- /* For memory under the ibm,dynamic-reconfiguration-memory node
- * of the device tree, adding and removing memory is just an update
- * to the ibm,dynamic-memory property instead of adding/removing a
- * memory node in the device tree. For these cases we still need to
- * involve the notifier chain.
- */
- if (!strcmp(name, "ibm,dynamic-memory")) {
- int action;
-
- next_prop = parse_next_property(next_prop, end, &name,
- &length, &value);
- if (!next_prop)
- return -EINVAL;
-
- if (!strcmp(name, "add"))
- action = PSERIES_DRCONF_MEM_ADD;
- else
- action = PSERIES_DRCONF_MEM_REMOVE;
-
- rc = pSeries_reconfig_notify(action, value);
- if (rc) {
- prom_update_property(np, oldprop, newprop);
- return rc;
- }
- }
-
- return 0;
+ return rc;
}
/**
Index: dt-next/drivers/of/base.c
===================================================================
--- dt-next.orig/drivers/of/base.c 2012-10-02 08:40:51.000000000 -0500
+++ dt-next/drivers/of/base.c 2012-10-02 08:58:55.000000000 -0500
@@ -978,6 +978,24 @@
}
EXPORT_SYMBOL(of_parse_phandle_with_args);
+#if defined(CONFIG_OF_DYNAMIC)
+static int of_property_notify(int action, struct device_node *np,
+ struct property *prop)
+{
+ struct of_prop_reconfig pr;
+
+ pr.dn = np;
+ pr.prop = prop;
+ return of_reconfig_notify(action, &pr);
+}
+#else
+static int of_property_notify(int action, struct device_node *np,
+ struct property *prop)
+{
+ return 0;
+}
+#endif
+
/**
* prom_add_property - Add a property to a node
*/
@@ -985,6 +1003,11 @@
{
struct property **next;
unsigned long flags;
+ int rc;
+
+ rc = of_property_notify(OF_RECONFIG_ADD_PROPERTY, np, prop);
+ if (rc)
+ return rc;
prop->next = NULL;
write_lock_irqsave(&devtree_lock, flags);
@@ -1022,6 +1045,11 @@
struct property **next;
unsigned long flags;
int found = 0;
+ int rc;
+
+ rc = of_property_notify(OF_RECONFIG_REMOVE_PROPERTY, np, prop);
+ if (rc)
+ return rc;
write_lock_irqsave(&devtree_lock, flags);
next = &np->properties;
@@ -1064,7 +1092,11 @@
{
struct property **next;
unsigned long flags;
- int found = 0;
+ int rc, found = 0;
+
+ rc = of_property_notify(OF_RECONFIG_UPDATE_PROPERTY, np, newprop);
+ if (rc)
+ return rc;
write_lock_irqsave(&devtree_lock, flags);
next = &np->properties;
@@ -1103,6 +1135,26 @@
* device tree nodes.
*/
+static BLOCKING_NOTIFIER_HEAD(of_reconfig_chain);
+
+int of_reconfig_notifier_register(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&of_reconfig_chain, nb);
+}
+
+int of_reconfig_notifier_unregister(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&of_reconfig_chain, nb);
+}
+
+int of_reconfig_notify(unsigned long action, void *p)
+{
+ int rc;
+
+ rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
+ return notifier_to_errno(rc);
+}
+
#ifdef CONFIG_PROC_DEVICETREE
static void of_add_proc_dt_entry(struct device_node *dn)
{
@@ -1122,9 +1174,14 @@
/**
* of_attach_node - Plug a device node into the tree and global list.
*/
-void of_attach_node(struct device_node *np)
+int of_attach_node(struct device_node *np)
{
unsigned long flags;
+ int rc;
+
+ rc = of_reconfig_notify(OF_RECONFIG_ATTACH_NODE, np);
+ if (rc)
+ return rc;
write_lock_irqsave(&devtree_lock, flags);
np->sibling = np->parent->child;
@@ -1134,6 +1191,7 @@
write_unlock_irqrestore(&devtree_lock, flags);
of_add_proc_dt_entry(np);
+ return 0;
}
#ifdef CONFIG_PROC_DEVICETREE
@@ -1163,23 +1221,28 @@
* The caller must hold a reference to the node. The memory associated with
* the node is not freed until its refcount goes to zero.
*/
-void of_detach_node(struct device_node *np)
+int of_detach_node(struct device_node *np)
{
struct device_node *parent;
unsigned long flags;
+ int rc = 0;
+
+ rc = of_reconfig_notify(OF_RECONFIG_DETACH_NODE, np);
+ if (rc)
+ return rc;
write_lock_irqsave(&devtree_lock, flags);
if (of_node_check_flag(np, OF_DETACHED)) {
/* someone already detached it */
write_unlock_irqrestore(&devtree_lock, flags);
- return;
+ return rc;
}
parent = np->parent;
if (!parent) {
write_unlock_irqrestore(&devtree_lock, flags);
- return;
+ return rc;
}
if (allnodes == np)
@@ -1208,6 +1271,7 @@
write_unlock_irqrestore(&devtree_lock, flags);
of_remove_proc_dt_entry(np);
+ return rc;
}
#endif /* defined(CONFIG_OF_DYNAMIC) */
Index: dt-next/arch/powerpc/include/asm/pSeries_reconfig.h
===================================================================
--- dt-next.orig/arch/powerpc/include/asm/pSeries_reconfig.h 2012-10-02 08:30:21.000000000 -0500
+++ dt-next/arch/powerpc/include/asm/pSeries_reconfig.h 2012-10-02 08:43:40.000000000 -0500
@@ -2,43 +2,11 @@
#define _PPC64_PSERIES_RECONFIG_H
#ifdef __KERNEL__
-#include <linux/notifier.h>
-
-/*
- * Use this API if your code needs to know about OF device nodes being
- * added or removed on pSeries systems.
- */
-
-#define PSERIES_RECONFIG_ADD 0x0001
-#define PSERIES_RECONFIG_REMOVE 0x0002
-#define PSERIES_DRCONF_MEM_ADD 0x0003
-#define PSERIES_DRCONF_MEM_REMOVE 0x0004
-#define PSERIES_UPDATE_PROPERTY 0x0005
-
-/**
- * pSeries_reconfig_notify - Notifier value structure for OFDT property updates
- *
- * @node: Device tree node which owns the property being updated
- * @property: Updated property
- */
-struct pSeries_reconfig_prop_update {
- struct device_node *node;
- struct property *property;
-};
-
#ifdef CONFIG_PPC_PSERIES
-extern int pSeries_reconfig_notifier_register(struct notifier_block *);
-extern void pSeries_reconfig_notifier_unregister(struct notifier_block *);
-extern int pSeries_reconfig_notify(unsigned long action, void *p);
/* Not the best place to put this, will be fixed when we move some
* of the rtas suspend-me stuff to pseries */
extern void pSeries_coalesce_init(void);
#else /* !CONFIG_PPC_PSERIES */
-static inline int pSeries_reconfig_notifier_register(struct notifier_block *nb)
-{
- return 0;
-}
-static inline void pSeries_reconfig_notifier_unregister(struct notifier_block *nb) { }
static inline void pSeries_coalesce_init(void) { }
#endif /* CONFIG_PPC_PSERIES */
Index: dt-next/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/hotplug-cpu.c 2012-10-02 08:43:40.000000000 -0500
@@ -23,12 +23,12 @@
#include <linux/delay.h>
#include <linux/sched.h> /* for idle_task_exit */
#include <linux/cpu.h>
+#include <linux/of.h>
#include <asm/prom.h>
#include <asm/rtas.h>
#include <asm/firmware.h>
#include <asm/machdep.h>
#include <asm/vdso_datapage.h>
-#include <asm/pSeries_reconfig.h>
#include <asm/xics.h>
#include "plpar_wrappers.h"
#include "offline_states.h"
@@ -333,10 +333,10 @@
int err = 0;
switch (action) {
- case PSERIES_RECONFIG_ADD:
+ case OF_RECONFIG_ATTACH_NODE:
err = pseries_add_processor(node);
break;
- case PSERIES_RECONFIG_REMOVE:
+ case OF_RECONFIG_DETACH_NODE:
pseries_remove_processor(node);
break;
}
@@ -399,7 +399,7 @@
/* Processors can be added/removed only on LPAR */
if (firmware_has_feature(FW_FEATURE_LPAR)) {
- pSeries_reconfig_notifier_register(&pseries_smp_nb);
+ of_reconfig_notifier_register(&pseries_smp_nb);
cpu_maps_update_begin();
if (cede_offline_enabled && parse_cede_parameters() == 0) {
default_offline_state = CPU_STATE_INACTIVE;
Index: dt-next/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/hotplug-memory.c 2012-10-02 08:30:04.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/hotplug-memory.c 2012-10-02 08:43:40.000000000 -0500
@@ -16,7 +16,6 @@
#include <asm/firmware.h>
#include <asm/machdep.h>
-#include <asm/pSeries_reconfig.h>
#include <asm/sparsemem.h>
static unsigned long get_memblock_size(void)
@@ -181,42 +180,69 @@
return (ret < 0) ? -EINVAL : 0;
}
-static int pseries_drconf_memory(unsigned long *base, unsigned int action)
+static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
{
+ struct of_drconf_cell *new_drmem, *old_drmem;
unsigned long memblock_size;
- int rc;
+ u32 entries;
+ u32 *p;
+ int i, rc = -EINVAL;
memblock_size = get_memblock_size();
if (!memblock_size)
return -EINVAL;
- if (action == PSERIES_DRCONF_MEM_ADD) {
- rc = memblock_add(*base, memblock_size);
- rc = (rc < 0) ? -EINVAL : 0;
- } else if (action == PSERIES_DRCONF_MEM_REMOVE) {
- rc = pseries_remove_memblock(*base, memblock_size);
- } else {
- rc = -EINVAL;
+ p = (u32 *)of_get_property(pr->dn, "ibm,dynamic-memory", NULL);
+ if (!p)
+ return -EINVAL;
+
+ /* The first int of the property is the number of lmb's described
+ * by the property. This is followed by an array of of_drconf_cell
+ * entries. Get the number of entries and skip to the array of
+ * of_drconf_cell's.
+ */
+ entries = *p++;
+ old_drmem = (struct of_drconf_cell *)p;
+
+ p = (u32 *)pr->prop->value;
+ p++;
+ new_drmem = (struct of_drconf_cell *)p;
+
+ for (i = 0; i < entries; i++) {
+ if ((old_drmem[i].flags & DRCONF_MEM_ASSIGNED) &&
+ (!(new_drmem[i].flags & DRCONF_MEM_ASSIGNED))) {
+ rc = pseries_remove_memblock(old_drmem[i].base_addr,
+ memblock_size);
+ break;
+ } else if ((!(old_drmem[i].flags & DRCONF_MEM_ASSIGNED)) &&
+ (new_drmem[i].flags & DRCONF_MEM_ASSIGNED)) {
+ rc = memblock_add(old_drmem[i].base_addr,
+ memblock_size);
+ rc = (rc < 0) ? -EINVAL : 0;
+ break;
+ }
}
return rc;
}
static int pseries_memory_notifier(struct notifier_block *nb,
- unsigned long action, void *node)
+ unsigned long action, void *node)
{
+ struct of_prop_reconfig *pr;
int err = 0;
switch (action) {
- case PSERIES_RECONFIG_ADD:
+ case OF_RECONFIG_ATTACH_NODE:
err = pseries_add_memory(node);
break;
- case PSERIES_RECONFIG_REMOVE:
+ case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_memory(node);
break;
- case PSERIES_DRCONF_MEM_ADD:
- case PSERIES_DRCONF_MEM_REMOVE:
- err = pseries_drconf_memory(node, action);
+ case OF_RECONFIG_UPDATE_PROPERTY:
+ pr = (struct of_prop_reconfig *)node;
+ if (!strcmp(pr->prop->name, "ibm,dynamic-memory"))
+ err = pseries_update_drconf_memory(pr);
break;
}
return notifier_from_errno(err);
@@ -229,7 +255,7 @@
static int __init pseries_memory_hotplug_init(void)
{
if (firmware_has_feature(FW_FEATURE_LPAR))
- pSeries_reconfig_notifier_register(&pseries_mem_nb);
+ of_reconfig_notifier_register(&pseries_mem_nb);
return 0;
}
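
Note on the hunk above: pseries_update_drconf_memory() compares the old and
new "ibm,dynamic-memory" arrays entry by entry and acts on the first LMB whose
assigned flag changed. The comparison alone can be sketched in user space as
below. This is only an illustration under assumptions: the struct is a
simplified, hypothetical stand-in for struct of_drconf_cell, the flag value is
assumed, and the sketch_* names do not exist in the kernel. It reports the
change instead of calling memblock_add() or pseries_remove_memblock().

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DRCONF_MEM_ASSIGNED; the numeric value is an assumption. */
#define SKETCH_MEM_ASSIGNED 0x00000008u

/* Simplified, hypothetical stand-in for struct of_drconf_cell. */
struct sketch_drconf_cell {
	uint64_t base_addr;
	uint32_t flags;
};

enum sketch_change { SKETCH_NONE, SKETCH_ADD, SKETCH_REMOVE };

/*
 * Walk both arrays in step and report the first entry whose "assigned"
 * flag differs between the old and the new property value. On a change,
 * *base receives the base address of that LMB. Like the loop in the
 * patch, this stops at the first difference it finds.
 */
enum sketch_change sketch_find_change(const struct sketch_drconf_cell *old_cells,
				      const struct sketch_drconf_cell *new_cells,
				      uint32_t entries, uint64_t *base)
{
	uint32_t i;

	for (i = 0; i < entries; i++) {
		int was = (old_cells[i].flags & SKETCH_MEM_ASSIGNED) != 0;
		int is = (new_cells[i].flags & SKETCH_MEM_ASSIGNED) != 0;

		if (was == is)
			continue;

		*base = old_cells[i].base_addr;
		/* assigned -> unassigned is a removal, the reverse an add */
		return was ? SKETCH_REMOVE : SKETCH_ADD;
	}

	return SKETCH_NONE;
}
```

In the sketch, "no change found" is a distinct result; in the patch the
corresponding case leaves rc at its initial -EINVAL.
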
Index: dt-next/include/linux/of.h
===================================================================
--- dt-next.orig/include/linux/of.h 2012-10-02 08:31:08.000000000 -0500
+++ dt-next/include/linux/of.h 2012-10-02 08:50:22.000000000 -0500
@@ -21,6 +21,7 @@
#include <linux/kref.h>
#include <linux/mod_devicetable.h>
#include <linux/spinlock.h>
+#include <linux/notifier.h>
#include <asm/byteorder.h>
#include <asm/errno.h>
@@ -270,8 +271,23 @@
#if defined(CONFIG_OF_DYNAMIC)
/* For updating the device tree at runtime */
-extern void of_attach_node(struct device_node *);
-extern void of_detach_node(struct device_node *);
+#define OF_RECONFIG_ATTACH_NODE 0x0001
+#define OF_RECONFIG_DETACH_NODE 0x0002
+#define OF_RECONFIG_ADD_PROPERTY 0x0003
+#define OF_RECONFIG_REMOVE_PROPERTY 0x0004
+#define OF_RECONFIG_UPDATE_PROPERTY 0x0005
+
+struct of_prop_reconfig {
+ struct device_node *dn;
+ struct property *prop;
+};
+
+extern int of_reconfig_notifier_register(struct notifier_block *);
+extern int of_reconfig_notifier_unregister(struct notifier_block *);
+extern int of_reconfig_notify(unsigned long, void *);
+
+extern int of_attach_node(struct device_node *);
+extern int of_detach_node(struct device_node *);
#endif
#define of_match_ptr(_ptr) (_ptr)
Index: dt-next/arch/powerpc/kernel/prom.c
===================================================================
--- dt-next.orig/arch/powerpc/kernel/prom.c 2012-10-02 08:30:22.000000000 -0500
+++ dt-next/arch/powerpc/kernel/prom.c 2012-10-02 08:43:40.000000000 -0500
@@ -32,6 +32,7 @@
#include <linux/debugfs.h>
#include <linux/irq.h>
#include <linux/memblock.h>
+#include <linux/of.h>
#include <asm/prom.h>
#include <asm/rtas.h>
@@ -49,7 +50,6 @@
#include <asm/btext.h>
#include <asm/sections.h>
#include <asm/machdep.h>
-#include <asm/pSeries_reconfig.h>
#include <asm/pci-bridge.h>
#include <asm/kexec.h>
#include <asm/opal.h>
@@ -802,7 +802,7 @@
int err;
switch (action) {
- case PSERIES_RECONFIG_ADD:
+ case OF_RECONFIG_ATTACH_NODE:
err = of_finish_dynamic_node(node);
if (err < 0)
printk(KERN_ERR "finish_node returned %d\n", err);
@@ -821,7 +821,7 @@
static int __init prom_reconfig_setup(void)
{
- return pSeries_reconfig_notifier_register(&prom_reconfig_nb);
+ return of_reconfig_notifier_register(&prom_reconfig_nb);
}
__initcall(prom_reconfig_setup);
#endif
Index: dt-next/arch/powerpc/platforms/pseries/iommu.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/iommu.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/iommu.c 2012-10-02 08:43:40.000000000 -0500
@@ -35,6 +35,7 @@
#include <linux/dma-mapping.h>
#include <linux/crash_dump.h>
#include <linux/memory.h>
+#include <linux/of.h>
#include <asm/io.h>
#include <asm/prom.h>
#include <asm/rtas.h>
@@ -42,7 +43,6 @@
#include <asm/pci-bridge.h>
#include <asm/machdep.h>
#include <asm/abs_addr.h>
-#include <asm/pSeries_reconfig.h>
#include <asm/firmware.h>
#include <asm/tce.h>
#include <asm/ppc-pci.h>
@@ -1211,7 +1211,7 @@
struct direct_window *window;
switch (action) {
- case PSERIES_RECONFIG_REMOVE:
+ case OF_RECONFIG_DETACH_NODE:
if (pci && pci->iommu_table)
iommu_free_table(pci->iommu_table, np->full_name);
@@ -1274,7 +1274,7 @@
}
- pSeries_reconfig_notifier_register(&iommu_reconfig_nb);
+ of_reconfig_notifier_register(&iommu_reconfig_nb);
register_memory_notifier(&iommu_mem_nb);
set_pci_dma_ops(&dma_iommu_ops);
Index: dt-next/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/setup.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/setup.c 2012-10-02 08:43:40.000000000 -0500
@@ -40,6 +40,7 @@
#include <linux/seq_file.h>
#include <linux/root_dev.h>
#include <linux/cpuidle.h>
+#include <linux/of.h>
#include <asm/mmu.h>
#include <asm/processor.h>
@@ -63,7 +64,6 @@
#include <asm/smp.h>
#include <asm/firmware.h>
#include <asm/eeh.h>
-#include <asm/pSeries_reconfig.h>
#include "plpar_wrappers.h"
#include "pseries.h"
@@ -258,7 +258,7 @@
int err = NOTIFY_OK;
switch (action) {
- case PSERIES_RECONFIG_ADD:
+ case OF_RECONFIG_ATTACH_NODE:
pci = np->parent->data;
if (pci) {
update_dn_pci_info(np, pci->phb);
@@ -390,7 +390,7 @@
init_pci_config_tokens();
eeh_pseries_init();
find_and_init_phbs();
- pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb);
+ of_reconfig_notifier_register(&pci_dn_reconfig_nb);
eeh_init();
pSeries_nvram_init();
Index: dt-next/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:40:51.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:43:40.000000000 -0500
@@ -16,13 +16,13 @@
#include <linux/spinlock.h>
#include <linux/cpu.h>
#include <linux/slab.h>
+#include <linux/of.h>
#include "offline_states.h"
#include <asm/prom.h>
#include <asm/machdep.h>
#include <asm/uaccess.h>
#include <asm/rtas.h>
-#include <asm/pSeries_reconfig.h>
struct cc_workarea {
u32 drc_index;
@@ -262,24 +262,26 @@
if (!dn->parent)
return -ENOMEM;
- rc = pSeries_reconfig_notify(PSERIES_RECONFIG_ADD, dn);
+ rc = of_attach_node(dn);
if (rc) {
printk(KERN_ERR "Failed to add device node %s\n",
dn->full_name);
return rc;
}
- of_attach_node(dn);
of_node_put(dn->parent);
return 0;
}
int dlpar_detach_node(struct device_node *dn)
{
- pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, dn);
- of_detach_node(dn);
- of_node_put(dn); /* Must decrement the refcount */
+ int rc;
+ rc = of_detach_node(dn);
+ if (rc)
+ return rc;
+
+ of_node_put(dn); /* Must decrement the refcount */
return 0;
}
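
Note on the patch above: with this change of_attach_node() and
of_detach_node() run the reconfig notifier chain first and return the error
if any notifier objects, so the tree is left untouched when a notifier fails.
A user-space sketch of that ordering follows. It rests on assumptions: the
chain is a plain array instead of a blocking notifier chain, callbacks return
0 or a negative errno directly (the kernel encodes this with
notifier_from_errno()/notifier_to_errno()), and all sketch_* names are
hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_ATTACH_NODE	0x0001	/* mirrors OF_RECONFIG_ATTACH_NODE */
#define SKETCH_MAX_NOTIFIERS	8

typedef int (*sketch_notifier_fn)(unsigned long action, void *data);

/* Hypothetical, much simplified notifier chain: a fixed array. */
struct sketch_chain {
	sketch_notifier_fn fns[SKETCH_MAX_NOTIFIERS];
	size_t count;
};

/* Hypothetical stand-in for struct device_node. */
struct sketch_node {
	const char *name;
	int attached;
};

int sketch_register(struct sketch_chain *c, sketch_notifier_fn fn)
{
	if (c->count >= SKETCH_MAX_NOTIFIERS)
		return -ENOMEM;
	c->fns[c->count++] = fn;
	return 0;
}

/* Call every notifier in order; stop at the first one reporting an error. */
int sketch_notify(struct sketch_chain *c, unsigned long action, void *data)
{
	size_t i;

	for (i = 0; i < c->count; i++) {
		int rc = c->fns[i](action, data);

		if (rc)
			return rc;
	}
	return 0;
}

/* Notify first; only touch the "tree" when no notifier objected. */
int sketch_attach_node(struct sketch_chain *c, struct sketch_node *np)
{
	int rc = sketch_notify(c, SKETCH_ATTACH_NODE, np);

	if (rc)
		return rc;

	np->attached = 1;
	return 0;
}

/* Two example notifiers: one that accepts, one that vetoes. */
int sketch_accept(unsigned long action, void *data)
{
	(void)action;
	(void)data;
	return 0;
}

int sketch_reject(unsigned long action, void *data)
{
	(void)action;
	(void)data;
	return -EINVAL;
}
```

With only sketch_accept registered, sketch_attach_node() returns 0 and marks
the node attached; once sketch_reject is also registered it returns -EINVAL
and the node stays unattached, which is the behaviour dlpar_attach_node() now
relies on.
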
* [PATCH 4/5] Rename the drivers/of prom_* functions to of_*
From: Nathan Fontenot @ 2012-10-03 2:58 UTC (permalink / raw)
To: devicetree-discuss, cbe-oss-dev, LKML, linuxppc-dev
In-Reply-To: <506B2E63.5090900@linux.vnet.ibm.com>
Rename the prom_*_property routines of the generic OF code to of_*_property.
This brings them in line with the naming used by the rest of the OF code.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
arch/powerpc/kernel/machine_kexec.c | 12 ++++++------
arch/powerpc/kernel/machine_kexec_64.c | 8 ++++----
arch/powerpc/kernel/pci_32.c | 2 +-
arch/powerpc/platforms/ps3/os-area.c | 6 +++---
arch/powerpc/platforms/pseries/iommu.c | 4 ++--
arch/powerpc/platforms/pseries/mobility.c | 6 +++---
arch/powerpc/platforms/pseries/reconfig.c | 8 ++++----
drivers/macintosh/smu.c | 2 +-
drivers/of/base.c | 15 +++++++--------
include/linux/of.h | 9 ++++-----
10 files changed, 35 insertions(+), 37 deletions(-)
Index: dt-next/include/linux/of.h
===================================================================
--- dt-next.orig/include/linux/of.h 2012-10-02 08:50:22.000000000 -0500
+++ dt-next/include/linux/of.h 2012-10-02 09:07:23.000000000 -0500
@@ -263,11 +263,10 @@
extern int of_machine_is_compatible(const char *compat);
-extern int prom_add_property(struct device_node* np, struct property* prop);
-extern int prom_remove_property(struct device_node *np, struct property *prop);
-extern int prom_update_property(struct device_node *np,
- struct property *newprop,
- struct property *oldprop);
+extern int of_add_property(struct device_node *np, struct property *prop);
+extern int of_remove_property(struct device_node *np, struct property *prop);
+extern int of_update_property(struct device_node *np, struct property *newprop,
+ struct property *oldprop);
#if defined(CONFIG_OF_DYNAMIC)
/* For updating the device tree at runtime */
Index: dt-next/arch/powerpc/kernel/pci_32.c
===================================================================
--- dt-next.orig/arch/powerpc/kernel/pci_32.c 2012-10-02 08:30:22.000000000 -0500
+++ dt-next/arch/powerpc/kernel/pci_32.c 2012-10-02 09:01:10.000000000 -0500
@@ -208,7 +208,7 @@
of_prop->name = "pci-OF-bus-map";
of_prop->length = 256;
of_prop->value = &of_prop[1];
- prom_add_property(dn, of_prop);
+ of_add_property(dn, of_prop);
of_node_put(dn);
}
}
Index: dt-next/arch/powerpc/kernel/machine_kexec.c
===================================================================
--- dt-next.orig/arch/powerpc/kernel/machine_kexec.c 2012-10-02 08:30:22.000000000 -0500
+++ dt-next/arch/powerpc/kernel/machine_kexec.c 2012-10-02 09:01:10.000000000 -0500
@@ -212,16 +212,16 @@
* be sure what's in them, so remove them. */
prop = of_find_property(node, "linux,crashkernel-base", NULL);
if (prop)
- prom_remove_property(node, prop);
+ of_remove_property(node, prop);
prop = of_find_property(node, "linux,crashkernel-size", NULL);
if (prop)
- prom_remove_property(node, prop);
+ of_remove_property(node, prop);
if (crashk_res.start != 0) {
- prom_add_property(node, &crashk_base_prop);
+ of_add_property(node, &crashk_base_prop);
crashk_size = resource_size(&crashk_res);
- prom_add_property(node, &crashk_size_prop);
+ of_add_property(node, &crashk_size_prop);
}
}
@@ -237,11 +237,11 @@
/* remove any stale properties so ours can be found */
prop = of_find_property(node, kernel_end_prop.name, NULL);
if (prop)
- prom_remove_property(node, prop);
+ of_remove_property(node, prop);
/* information needed by userspace when using default_machine_kexec */
kernel_end = __pa(_end);
- prom_add_property(node, &kernel_end_prop);
+ of_add_property(node, &kernel_end_prop);
export_crashk_values(node);
Index: dt-next/arch/powerpc/kernel/machine_kexec_64.c
===================================================================
--- dt-next.orig/arch/powerpc/kernel/machine_kexec_64.c 2012-10-02 08:30:22.000000000 -0500
+++ dt-next/arch/powerpc/kernel/machine_kexec_64.c 2012-10-02 09:01:10.000000000 -0500
@@ -389,14 +389,14 @@
/* remove any stale propertys so ours can be found */
prop = of_find_property(node, htab_base_prop.name, NULL);
if (prop)
- prom_remove_property(node, prop);
+ of_remove_property(node, prop);
prop = of_find_property(node, htab_size_prop.name, NULL);
if (prop)
- prom_remove_property(node, prop);
+ of_remove_property(node, prop);
htab_base = __pa(htab_address);
- prom_add_property(node, &htab_base_prop);
- prom_add_property(node, &htab_size_prop);
+ of_add_property(node, &htab_base_prop);
+ of_add_property(node, &htab_size_prop);
of_node_put(node);
return 0;
Index: dt-next/arch/powerpc/platforms/pseries/reconfig.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:45:12.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 09:14:31.000000000 -0500
@@ -326,7 +326,7 @@
if (!prop)
return -ENOMEM;
- prom_add_property(np, prop);
+ of_add_property(np, prop);
return 0;
}
@@ -350,7 +350,7 @@
prop = of_find_property(np, buf, NULL);
- return prom_remove_property(np, prop);
+ return of_remove_property(np, prop);
}
static int do_update_property(char *buf, size_t bufsize)
@@ -380,11 +380,11 @@
oldprop = of_find_property(np, name,NULL);
if (!oldprop) {
if (strlen(name))
- return prom_add_property(np, newprop);
+ return of_add_property(np, newprop);
return -ENODEV;
}
- rc = prom_update_property(np, newprop, oldprop);
+ rc = of_update_property(np, newprop, oldprop);
return rc;
}
Index: dt-next/arch/powerpc/platforms/pseries/mobility.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/mobility.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/mobility.c 2012-10-02 09:03:54.000000000 -0500
@@ -119,9 +119,9 @@
if (!more) {
old_prop = of_find_property(dn, new_prop->name, NULL);
if (old_prop)
- prom_update_property(dn, new_prop, old_prop);
+ of_update_property(dn, new_prop, old_prop);
else
- prom_add_property(dn, new_prop);
+ of_add_property(dn, new_prop);
new_prop = NULL;
}
@@ -178,7 +178,7 @@
case 0x80000000:
prop = of_find_property(dn, prop_name, NULL);
- prom_remove_property(dn, prop);
+ of_remove_property(dn, prop);
prop = NULL;
break;
Index: dt-next/arch/powerpc/platforms/pseries/iommu.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/pseries/iommu.c 2012-10-02 08:43:40.000000000 -0500
+++ dt-next/arch/powerpc/platforms/pseries/iommu.c 2012-10-02 09:01:10.000000000 -0500
@@ -747,7 +747,7 @@
np->full_name, ret, ddw_avail[2], liobn);
delprop:
- ret = prom_remove_property(np, win64);
+ ret = of_remove_property(np, win64);
if (ret)
pr_warning("%s: failed to remove direct window property: %d\n",
np->full_name, ret);
@@ -991,7 +991,7 @@
goto out_free_window;
}
- ret = prom_add_property(pdn, win64);
+ ret = of_add_property(pdn, win64);
if (ret) {
dev_err(&dev->dev, "unable to add dma window property for %s: %d",
pdn->full_name, ret);
Index: dt-next/arch/powerpc/platforms/ps3/os-area.c
===================================================================
--- dt-next.orig/arch/powerpc/platforms/ps3/os-area.c 2012-10-02 08:30:23.000000000 -0500
+++ dt-next/arch/powerpc/platforms/ps3/os-area.c 2012-10-02 09:01:10.000000000 -0500
@@ -280,13 +280,13 @@
if (tmp) {
pr_debug("%s:%d found %s\n", __func__, __LINE__, prop->name);
- prom_remove_property(node, tmp);
+ of_remove_property(node, tmp);
}
- result = prom_add_property(node, prop);
+ result = of_add_property(node, prop);
if (result)
- pr_debug("%s:%d prom_set_property failed\n", __func__,
+ pr_debug("%s:%d of_add_property failed\n", __func__,
__LINE__);
}
Index: dt-next/drivers/macintosh/smu.c
===================================================================
--- dt-next.orig/drivers/macintosh/smu.c 2012-10-02 08:30:35.000000000 -0500
+++ dt-next/drivers/macintosh/smu.c 2012-10-02 09:01:10.000000000 -0500
@@ -998,7 +998,7 @@
"%02x !\n", id, hdr->id);
goto failure;
}
- if (prom_add_property(smu->of_node, prop)) {
+ if (of_add_property(smu->of_node, prop)) {
printk(KERN_DEBUG "SMU: Failed creating sdb-partition-%02x "
"property !\n", id);
goto failure;
Index: dt-next/drivers/of/base.c
===================================================================
--- dt-next.orig/drivers/of/base.c 2012-10-02 08:58:55.000000000 -0500
+++ dt-next/drivers/of/base.c 2012-10-02 09:05:37.000000000 -0500
@@ -997,9 +997,9 @@
#endif
/**
- * prom_add_property - Add a property to a node
+ * of_add_property - Add a property to a node
*/
-int prom_add_property(struct device_node *np, struct property *prop)
+int of_add_property(struct device_node *np, struct property *prop)
{
struct property **next;
unsigned long flags;
@@ -1033,14 +1033,14 @@
}
/**
- * prom_remove_property - Remove a property from a node.
+ * of_remove_property - Remove a property from a node.
*
* Note that we don't actually remove it, since we have given out
* who-knows-how-many pointers to the data using get-property.
* Instead we just move the property to the "dead properties"
* list, so it won't be found any more.
*/
-int prom_remove_property(struct device_node *np, struct property *prop)
+int of_remove_property(struct device_node *np, struct property *prop)
{
struct property **next;
unsigned long flags;
@@ -1079,16 +1079,15 @@
}
/*
- * prom_update_property - Update a property in a node.
+ * of_update_property - Update a property in a node.
*
* Note that we don't actually remove it, since we have given out
* who-knows-how-many pointers to the data using get-property.
* Instead we just move the property to the "dead properties" list,
* and add the new property to the property list
*/
-int prom_update_property(struct device_node *np,
- struct property *newprop,
- struct property *oldprop)
+int of_update_property(struct device_node *np, struct property *newprop,
+ struct property *oldprop)
{
struct property **next;
unsigned long flags;
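
Note on the comment above of_update_property(): the old property is not
freed, it is moved to the node's "dead properties" list and the new property
takes its place. A user-space sketch of that list manipulation is below. It
is an illustration under assumptions only: the structs are hypothetical
stand-ins for struct property and struct device_node, there is no locking
(the kernel code takes devtree_lock), and returning -ENODEV for an unknown
old property is this sketch's own choice.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct property. */
struct sketch_property {
	const char *name;
	struct sketch_property *next;
};

/* Hypothetical stand-in for struct device_node. */
struct sketch_dt_node {
	struct sketch_property *properties;	/* live properties */
	struct sketch_property *deadprops;	/* retired, never freed */
};

/*
 * Replace oldprop with newprop in the live list and push oldprop onto
 * the dead list, so pointers handed out earlier stay valid.
 */
int sketch_update_property(struct sketch_dt_node *np,
			   struct sketch_property *newprop,
			   struct sketch_property *oldprop)
{
	struct sketch_property **next = &np->properties;

	while (*next) {
		if (*next == oldprop) {
			/* splice newprop into oldprop's position */
			newprop->next = oldprop->next;
			*next = newprop;
			/* retire oldprop instead of freeing it */
			oldprop->next = np->deadprops;
			np->deadprops = oldprop;
			return 0;
		}
		next = &(*next)->next;
	}

	return -ENODEV;	/* oldprop is not on this node */
}
```

Walking the list through a pointer-to-pointer lets the head and the middle
of the list be handled by the same assignment.
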