LinuxPPC-Dev Archive on lore.kernel.org
* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-01-31 20:54 UTC
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359601065.15120.156.camel@misato.fc.hp.com>

On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > > +/*
> > > + * Hot-plug device information
> > > + */
> > 
> > Again, stop it with the "generic" hotplug term here, and everywhere
> > else.  You are doing a very _specific_ type of hotplug devices, so spell
> > it out.  We've worked hard to hotplug _everything_ in Linux, you are
> > going to confuse a lot of people with this type of terms.
> 
> Agreed.  I will clarify in all places.
> 
> > > +union shp_dev_info {
> > > +	struct shp_cpu {
> > > +		u32		cpu_id;
> > > +	} cpu;
> > 
> > What is this?  Why not point to the system device for the cpu?
> 
> This info is used to on-line a new CPU and create its system/cpu device.
> In other word, a system/cpu device is created as a result of CPU
> hotplug.
> 
> > > +	struct shp_memory {
> > > +		int		node;
> > > +		u64		start_addr;
> > > +		u64		length;
> > > +	} mem;
> > 
> > Same here, why not point to the system device?
> 
> Same as above.
> 
> > > +	struct shp_hostbridge {
> > > +	} hb;
> > > +
> > > +	struct shp_node {
> > > +	} node;
> > 
> > What happened here with these?  Empty structures?  Huh?
> 
> They are place holders for now.  PCI bridge hot-plug and node hot-plug
> are still very much work in progress, so I have not integrated them into
> this framework yet.
> 
> > > +};
> > > +
> > > +struct shp_device {
> > > +	struct list_head	list;
> > > +	struct device		*device;
> > 
> > No, make it a "real" device, embed the device into it.
> 
> This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> online/offline operation in order to maintain the current behavior.  CPU
> online/offline operation only changes the state of CPU, so its
> system/cpu device continues to be present before and after an operation.
> (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> device.)  So, this "*device" needs to be a pointer to reference an
> existing device that is to be on-lined/off-lined.
> 
> > But, again, I'm going to ask why you aren't using the existing cpu /
> > memory / bridge / node devices that we have in the kernel.  Please use
> > them, or give me a _really_ good reason why they will not work.
> 
> We cannot use the existing system devices or ACPI devices here.  During
> hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> device information in a platform-neutral way.  During hot-add, we first
> creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> but platform-neutral modules cannot use them as they are ACPI-specific.

But suppose we're smart and have ACPI scan handlers that will create
"physical" device nodes for those devices during the ACPI namespace scan.
Then, the platform-neutral nodes will be able to bind to those "physical"
nodes.  Moreover, it should be possible to get a hierarchy of device objects
this way that will reflect all of the dependencies we need to take into
account during hot-add and hot-remove operations.  That may not be what we
have today, but I don't see any *fundamental* obstacles preventing us from
using this approach.

This is already done for PCI host bridges and platform devices and I don't
see why we can't do that for the other types of devices too.

The only missing piece I see is a way to handle the "eject" problem, i.e.
when we try to eject a device at the top of a subtree and need to tear down
the entire subtree below it, but if that's going to lead to a system crash,
for example, we want to cancel the eject.  It seems to me that we'll need some
help from the driver core here.
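
As an illustration only, the cancellable eject described above can be
sketched as a two-pass walk in user-space C: first ask every device in the
subtree whether it can go away, and only tear anything down if nobody
objects.  This is not driver-core code; struct sketch_dev, its critical
flag, and the sketch_* functions are hypothetical names made up for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical device node; this is not the kernel's struct device. */
struct sketch_dev {
	const char *name;
	bool critical;	/* assumption: removing it would crash the system */
	bool present;
	struct sketch_dev *child[SKETCH_MAX_CHILDREN];
};

/* Pass 1: check the whole subtree without changing any state. */
static bool sketch_can_eject(const struct sketch_dev *dev)
{
	size_t i;

	if (!dev)
		return true;
	if (dev->critical)
		return false;
	for (i = 0; i < SKETCH_MAX_CHILDREN; i++)
		if (!sketch_can_eject(dev->child[i]))
			return false;
	return true;
}

/* Pass 2: tear down the children first, then the device itself. */
static void sketch_remove(struct sketch_dev *dev)
{
	size_t i;

	if (!dev)
		return;
	for (i = 0; i < SKETCH_MAX_CHILDREN; i++)
		sketch_remove(dev->child[i]);
	dev->present = false;
}

/* Returns 0 on success, -1 if the eject was cancelled. */
static int sketch_eject(struct sketch_dev *top)
{
	if (!sketch_can_eject(top))
		return -1;
	sketch_remove(top);
	return 0;
}
```

With this shape, a refusal anywhere in the subtree leaves every device
untouched, which is the property the eject path needs.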

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Jianguo Wu @ 2013-02-01  1:32 UTC
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, Tang Chen, linux-mm, paulus, hpa,
	sparclinux, cl, linux-s390, x86, linux-acpi, isimatu.yasuaki,
	linfeng, mgorman, kosaki.motohiro, rientjes, len.brown, wency,
	cmetcalf, glommer, yinghai, laijs, linux-kernel, minchan.kim,
	akpm, linuxppc-dev
In-Reply-To: <1359628705.2048.5.camel@kernel>

On 2013/1/31 18:38, Simon Jeons wrote:

> Hi Tang,
> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>> Hi Simon,
>>
>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>> Hi Tang,
>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>
>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>> then what's the difference between press button and echo to /sys?
>>
>> No important difference, I think. Since I don't have the machine you are
>> saying, I cannot surely answer you. :)
>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>> is just another entrance. At last, they will run into the same code.
>>
>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>> why can't put kernel direct mapping memory into one memory device, and
>>> other memory into the other devices?
>>
>> We cannot do that because in that way, we will lose NUMA performance.
>>
>> If you know NUMA, you will understand the following example:
>>
>> node0:                    node1:
>>     cpu0~cpu15                cpu16~cpu31
>>     memory0~memory511         memory512~memory1023
>>
>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
>> If we set direct mapping area in node0, and movable area in node1, then
>> the kernel code running on cpu16~cpu31 will have to access 
>> memory0~memory511.
>> This is a terrible performance down.
> 
> So if config NUMA, kernel memory will not be linear mapping anymore? For
> example, 
> 
> Node 0  Node 1 
> 
> 0 ~ 10G 11G~14G
> 
> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> 
> How big is kernel direct mapping memory in x86_64? Is there max limit?


The maximum kernel direct mapping on x86_64 is 64TB.

> It seems that only around 896MB on x86_32. 
> 
>>
>>> As you know x86_64 don't need
>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>> idea available? If is correct, x86_32 can't implement in the same way
>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>> hard to focus kernel memory on single memory device.
>>
>> Sorry, I'm not quite familiar with x86_32 box.
>>
>>> 3. In current implementation, if memory hotplug just need memory
>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>> Hope you can explain in details, thanks in advance. :)
>>
>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>> based memory migration mentioned by Liu Jiang.
> 
> Is there any material about firmware based memory migration?
> 
>>
>> So far, I only know this. :)
>>
>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>> memory, other things are fully implementation?
>>
>> I think the main job is done for now. And there are still bugs to fix.
>> And this functionality is not stable.
>>
>> Thanks. :)
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> .
> 


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-01  1:32 UTC
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <9860755.q4y3PrCFZx@vostro.rjw.lan>

On Thu, 2013-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> > On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
 :
> > > > +};
> > > > +
> > > > +struct shp_device {
> > > > +	struct list_head	list;
> > > > +	struct device		*device;
> > > 
> > > No, make it a "real" device, embed the device into it.
> > 
> > This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> > online/offline operation in order to maintain the current behavior.  CPU
> > online/offline operation only changes the state of CPU, so its
> > system/cpu device continues to be present before and after an operation.
> > (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> > device.)  So, this "*device" needs to be a pointer to reference an
> > existing device that is to be on-lined/off-lined.
> > 
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I misstated this in my previous email.  The system/cpu device is actually
created by the ACPI driver during the ACPI scan in the hot-add case.  This is
done by acpi_processor_hotadd_init(), which I consider a hack, but it can be
done.  The system/memory device is created in add_memory() by the mm module.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.
> 
> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try to eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

There are three different approaches suggested for system device
hot-plug:
 A. Proceed within system device bus scan.
 B. Proceed within ACPI bus scan.
 C. Proceed with a sequence (as a mini-boot).

Option A uses system devices as tokens, option B uses ACPI devices as
tokens, and option C uses resource tables as tokens, for their handlers.

Here is a summary of the key questions and answers so far.  I hope this
clarifies why I am suggesting option C.

1. What are the system devices?
System devices provide system-wide core computing resources, which are
essential to compose a computer system.  System devices are not
connected to any particular standard buses.

2. Why are the system devices special?
The system devices are initialized early at boot time by multiple
subsystems, in a pre-defined order within the boot-up sequence.  They
provide low-level services to enable other subsystems to come up.

3. Why can't we initialize the system devices from the driver structure at
boot?
The driver structure is initialized at the end of the boot sequence and
requires the low-level services from the system devices initialized
beforehand.

4. Why do we need a new common framework?
Sysfs CPU and memory on-lining/off-lining are performed within the CPU
and memory modules.  They are common code and do not depend on ACPI.
Therefore, a new common framework is necessary to integrate both
on-lining/off-lining operation and hot-plugging operation of system
devices into a single framework.

5. Why can't we do everything with the ACPI bus scan?
Software dependency among system devices may not be dictated by the ACPI
hierarchy.  For instance, memory should be initialized before CPUs (i.e.
a new cpu may need its local memory), but such ordering cannot be
guaranteed by the ACPI hierarchy.  Also, as described in 4,
online/offline operations are independent from ACPI.  
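
The ordering argument in 5 can be illustrated with a small user-space C
sketch of a sequence-based approach: handlers carry an explicit order
(memory before CPU on hot-add), run in that order, and are rolled back in
reverse if one fails.  This is a sketch under assumptions, not the code from
this patch set; every sketch_* name and the order values are hypothetical.

```c
#include <stddef.h>

/* Hypothetical ordering values; lower values run first on hot-add. */
enum sketch_order {
	SKETCH_ORDER_MEM = 10,
	SKETCH_ORDER_CPU = 20,
};

struct sketch_handler {
	int order;
	int (*add)(void);	/* returns 0 on success */
	void (*rollback)(void);
};

/* Trace of what ran, so the ordering can be observed. */
static int sketch_trace[8];
static size_t sketch_trace_len;
static int sketch_cpu_fail;	/* set to 1 to simulate a CPU add failure */

static void sketch_record(int what)
{
	if (sketch_trace_len < 8)
		sketch_trace[sketch_trace_len++] = what;
}

static int sketch_mem_add(void)
{
	sketch_record(SKETCH_ORDER_MEM);
	return 0;
}

static void sketch_mem_rollback(void)
{
	sketch_record(-SKETCH_ORDER_MEM);
}

static int sketch_cpu_add(void)
{
	if (sketch_cpu_fail)
		return -1;
	sketch_record(SKETCH_ORDER_CPU);
	return 0;
}

static void sketch_cpu_rollback(void)
{
	sketch_record(-SKETCH_ORDER_CPU);
}

/* The table is assumed to be sorted by order already. */
static const struct sketch_handler sketch_handlers[] = {
	{ SKETCH_ORDER_MEM, sketch_mem_add, sketch_mem_rollback },
	{ SKETCH_ORDER_CPU, sketch_cpu_add, sketch_cpu_rollback },
};

/*
 * Run the handlers in table order.  On failure, roll back the handlers
 * that already succeeded, in reverse order, and return the error.
 */
static int sketch_hot_add(const struct sketch_handler *h, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = h[i].add();
		if (err) {
			while (i-- > 0)
				h[i].rollback();
			return err;
		}
	}
	return 0;
}
```

The point of the explicit order field is that the sequence does not depend
on where a device happens to sit in any bus hierarchy.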

Thanks,
-Toshi


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-02-01  1:57 UTC
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	Jianguo Wu, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359682576.3574.1.camel@kernel>

On 02/01/2013 09:36 AM, Simon Jeons wrote:
> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>>>
>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>> example,
>>>
>>> Node 0  Node 1
>>>
>>> 0 ~ 10G 11G~14G

It has nothing to do with linear mapping, I think.

>>>
>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?

Please refer to find_zone_movable_pfns_for_nodes().
The kernel is not only on node0. It uses all the online nodes evenly. :)
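
As a rough illustration of spreading kernel memory evenly, the following
user-space C sketch splits a requested number of kernel pages across nodes,
giving each node an equal share and redistributing whatever a small node
cannot hold.  It is loosely modeled on the idea only and is not the code of
find_zone_movable_pfns_for_nodes(); the function name, parameters, and page
counts are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Split `kernelcore` pages across nodes as evenly as possible.
 * node_pages[i] is the size of node i; kernel_pages[i] receives the
 * share set aside for kernel (non-movable) use on that node.
 * Returns the number of pages that could not be placed anywhere.
 */
static unsigned long sketch_spread_kernelcore(const unsigned long *node_pages,
					      unsigned long *kernel_pages,
					      size_t nr_nodes,
					      unsigned long kernelcore)
{
	unsigned long remaining = kernelcore;
	size_t i;

	for (i = 0; i < nr_nodes; i++)
		kernel_pages[i] = 0;

	while (remaining) {
		size_t usable = 0;
		unsigned long share, placed = 0;

		/* Count the nodes that still have room. */
		for (i = 0; i < nr_nodes; i++)
			if (kernel_pages[i] < node_pages[i])
				usable++;
		if (!usable)
			break;

		/* Equal share per usable node, at least one page. */
		share = remaining / usable;
		if (!share)
			share = 1;

		for (i = 0; i < nr_nodes && remaining; i++) {
			unsigned long room = node_pages[i] - kernel_pages[i];
			unsigned long take = share < room ? share : room;

			if (take > remaining)
				take = remaining;
			kernel_pages[i] += take;
			remaining -= take;
			placed += take;
		}
		if (!placed)
			break;
	}
	return remaining;
}
```

For two nodes of 1000 pages each and a request of 600 pages, each node ends
up with 300 kernel pages.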

>>>
>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>
>>
>> Max kernel direct mapping memory in x86_64 is 64TB.
>
> For example, I have 8G memory, all of them will be direct mapping for
> kernel? then userspace memory allocated from where?

I think you misunderstood what Wu tried to say. :)

The kernel maps that large space, but that doesn't mean it is using all of it.
The mapping lets the kernel access all of the memory; it is not for the kernel
to use exclusively.  User space can also use the memory, but each process has
its own mapping.

For example:

                                        64TB, whatever      xxxTB, whatever
logical address space:   |_____kernel_______|_________user_________________|
                                        \  \  /  /
                                         \  /\  /
physical address space:              |___\/__\/_____________|  4GB or 8GB, whatever
                                           *****

The ***** part of physical memory is mapped to user space in the process's own
page table.  It is also direct mapped in the kernel's page table, so the kernel
can also access it. :)

>
>>
>>> It seems that only around 896MB on x86_32.
>>>
>>>>
>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>> based memory migration mentioned by Liu Jiang.
>>>
>>> Is there any material about firmware based memory migration?

No, I don't have any, because this is a feature of a machine from HUAWEI.
I think you can ask Liu Jiang or Wu Jianguo to share some with you. :)

Thanks. :)


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Jianguo Wu @ 2013-02-01  1:57 UTC
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, Tang Chen, linux-mm, paulus, hpa,
	sparclinux, cl, linux-s390, x86, linux-acpi, isimatu.yasuaki,
	linfeng, mgorman, kosaki.motohiro, rientjes, len.brown, wency,
	cmetcalf, glommer, yinghai, laijs, linux-kernel, minchan.kim,
	akpm, linuxppc-dev
In-Reply-To: <1359682576.3574.1.camel@kernel>

On 2013/2/1 9:36, Simon Jeons wrote:

> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>> On 2013/1/31 18:38, Simon Jeons wrote:
>>
>>> Hi Tang,
>>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>>>> Hi Simon,
>>>>
>>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>>>> Hi Tang,
>>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>>>
>>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>>>> then what's the difference between press button and echo to /sys?
>>>>
>>>> No important difference, I think. Since I don't have the machine you are
>>>> saying, I cannot surely answer you. :)
>>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>>>> is just another entrance. At last, they will run into the same code.
>>>>
>>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>>>> why can't put kernel direct mapping memory into one memory device, and
>>>>> other memory into the other devices?
>>>>
>>>> We cannot do that because in that way, we will lose NUMA performance.
>>>>
>>>> If you know NUMA, you will understand the following example:
>>>>
>>>> node0:                    node1:
>>>>     cpu0~cpu15                cpu16~cpu31
>>>>     memory0~memory511         memory512~memory1023
>>>>
>>>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
>>>> If we set direct mapping area in node0, and movable area in node1, then
>>>> the kernel code running on cpu16~cpu31 will have to access 
>>>> memory0~memory511.
>>>> This is a terrible performance down.
>>>
>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>> example, 
>>>
>>> Node 0  Node 1 
>>>
>>> 0 ~ 10G 11G~14G
>>>
>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
>>>
>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>
>>
>> Max kernel direct mapping memory in x86_64 is 64TB.
> 
> For example, I have 8G memory, all of them will be direct mapping for
> kernel? then userspace memory allocated from where?

Direct-mapped memory means you can use __va() and __pa() on it, but it does
not mean that it can only be used by the kernel.  It can be used by user space
too, as long as it is free.
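
To illustrate what the direct mapping means, here is a minimal user-space C
sketch of the arithmetic behind __va() and __pa(): the two addresses differ
by a constant offset.  The sketch_* names are hypothetical, and the offset
value is an assumption (the base commonly used for the x86_64 direct mapping
at the time), not something stated in this thread.

```c
#include <stdint.h>

/*
 * Assumed base of the kernel direct mapping.  The value is used here
 * for illustration only.
 */
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff880000000000)

/* Physical address -> kernel virtual address in the direct mapping. */
static uint64_t sketch_va(uint64_t phys)
{
	return phys + SKETCH_PAGE_OFFSET;
}

/* Kernel direct-mapping virtual address -> physical address. */
static uint64_t sketch_pa(uint64_t virt)
{
	return virt - SKETCH_PAGE_OFFSET;
}
```

The conversion is pure arithmetic and says nothing about who owns the page:
the same physical page can still be handed to user space.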

> 
>>
>>> It seems that only around 896MB on x86_32. 
>>>
>>>>
>>>>> As you know x86_64 don't need
>>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>>>> idea available? If is correct, x86_32 can't implement in the same way
>>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>>>> hard to focus kernel memory on single memory device.
>>>>
>>>> Sorry, I'm not quite familiar with x86_32 box.
>>>>
>>>>> 3. In current implementation, if memory hotplug just need memory
>>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>>>> Hope you can explain in details, thanks in advance. :)
>>>>
>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>> based memory migration mentioned by Liu Jiang.
>>>
>>> Is there any material about firmware based memory migration?
>>>
>>>>
>>>> So far, I only know this. :)
>>>>
>>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>>>> memory, other things are fully implementation?
>>>>
>>>> I think the main job is done for now. And there are still bugs to fix.
>>>> And this functionality is not stable.
>>>>
>>>> Thanks. :)
>>>
>>>

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-02-01  2:06 UTC
  To: Jianguo Wu
  Cc: linux-ia64, linux-sh, Tang Chen, linux-mm, paulus, hpa,
	sparclinux, cl, linux-s390, x86, linux-acpi, isimatu.yasuaki,
	linfeng, mgorman, kosaki.motohiro, rientjes, len.brown, wency,
	cmetcalf, glommer, yinghai, laijs, linux-kernel, minchan.kim,
	akpm, linuxppc-dev
In-Reply-To: <510B20F2.20906@huawei.com>

Hi Jianguo,
On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote:
> On 2013/2/1 9:36, Simon Jeons wrote:
> 
> > On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> >> On 2013/1/31 18:38, Simon Jeons wrote:
> >>
> >>> Hi Tang,
> >>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> >>>> Hi Simon,
> >>>>
> >>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> >>>>> Hi Tang,
> >>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >>>>>
> >>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
> >>>>> then what's the difference between press button and echo to /sys?
> >>>>
> >>>> No important difference, I think. Since I don't have the machine you are
> >>>> saying, I cannot surely answer you. :)
> >>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> >>>> is just another entrance. At last, they will run into the same code.
> >>>>
> >>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> >>>>> why can't put kernel direct mapping memory into one memory device, and
> >>>>> other memory into the other devices?
> >>>>
> >>>> We cannot do that because in that way, we will lose NUMA performance.
> >>>>
> >>>> If you know NUMA, you will understand the following example:
> >>>>
> >>>> node0:                    node1:
> >>>>     cpu0~cpu15                cpu16~cpu31
> >>>>     memory0~memory511         memory512~memory1023
> >>>>
> >>>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
> >>>> If we set direct mapping area in node0, and movable area in node1, then
> >>>> the kernel code running on cpu16~cpu31 will have to access 
> >>>> memory0~memory511.
> >>>> This is a terrible performance down.
> >>>
> >>> So if config NUMA, kernel memory will not be linear mapping anymore? For
> >>> example, 
> >>>
> >>> Node 0  Node 1 
> >>>
> >>> 0 ~ 10G 11G~14G
> >>>
> >>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> >>>
> >>> How big is kernel direct mapping memory in x86_64? Is there max limit?
> >>
> >>
> >> Max kernel direct mapping memory in x86_64 is 64TB.
> > 
> > For example, I have 8G memory, all of them will be direct mapping for
> > kernel? then userspace memory allocated from where?
> 
> Direct mapping memory means you can use __va() and pa(), but not means that them
> can be only used by kernel, them can be used by user-space too, as long as them are free.

IIUC, the benefit of __va() and __pa() is just a quick way to get the
virtual/physical address, taking advantage of the linear mapping.  But the MMU
still needs to walk pgd/pud/pmd/pte, correct?

> 
> > 
> >>
> >>> It seems that only around 896MB on x86_32. 
> >>>
> >>>>
> >>>>> As you know x86_64 don't need
> >>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> >>>>> idea available? If is correct, x86_32 can't implement in the same way
> >>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> >>>>> hard to focus kernel memory on single memory device.
> >>>>
> >>>> Sorry, I'm not quite familiar with x86_32 box.
> >>>>
> >>>>> 3. In current implementation, if memory hotplug just need memory
> >>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
> >>>>> Hope you can explain in details, thanks in advance. :)
> >>>>
> >>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >>>> based memory migration mentioned by Liu Jiang.
> >>>
> >>> Is there any material about firmware based memory migration?
> >>>
> >>>>
> >>>> So far, I only know this. :)
> >>>>
> >>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
> >>>>> memory, other things are fully implementation?
> >>>>
> >>>> I think the main job is done for now. And there are still bugs to fix.
> >>>> And this functionality is not stable.
> >>>>
> >>>> Thanks. :)
> >>>
> >>>

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Jianguo Wu @ 2013-02-01  2:18 UTC
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, Tang Chen, linux-mm, paulus, hpa,
	sparclinux, cl, linux-s390, x86, linux-acpi, isimatu.yasuaki,
	linfeng, mgorman, kosaki.motohiro, rientjes, len.brown, wency,
	cmetcalf, glommer, yinghai, laijs, linux-kernel, minchan.kim,
	akpm, linuxppc-dev
In-Reply-To: <1359684403.1303.3.camel@kernel>

On 2013/2/1 10:06, Simon Jeons wrote:

> Hi Jianguo,
> On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote:
>> On 2013/2/1 9:36, Simon Jeons wrote:
>>
>>> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>>>> On 2013/1/31 18:38, Simon Jeons wrote:
>>>>
>>>>> Hi Tang,
>>>>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>>>>>> Hi Simon,
>>>>>>
>>>>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>>>>>> Hi Tang,
>>>>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>>>>>
>>>>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>>>>>> then what's the difference between press button and echo to /sys?
>>>>>>
>>>>>> No important difference, I think. Since I don't have the machine you are
>>>>>> saying, I cannot surely answer you. :)
>>>>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>>>>>> is just another entrance. At last, they will run into the same code.
>>>>>>
>>>>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>>>>>> why can't put kernel direct mapping memory into one memory device, and
>>>>>>> other memory into the other devices?
>>>>>>
>>>>>> We cannot do that because in that way, we will lose NUMA performance.
>>>>>>
>>>>>> If you know NUMA, you will understand the following example:
>>>>>>
>>>>>> node0:                    node1:
>>>>>>     cpu0~cpu15                cpu16~cpu31
>>>>>>     memory0~memory511         memory512~memory1023
>>>>>>
>>>>>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
>>>>>> If we set direct mapping area in node0, and movable area in node1, then
>>>>>> the kernel code running on cpu16~cpu31 will have to access 
>>>>>> memory0~memory511.
>>>>>> This is a terrible performance down.
>>>>>
>>>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>>>> example, 
>>>>>
>>>>> Node 0  Node 1 
>>>>>
>>>>> 0 ~ 10G 11G~14G
>>>>>
>>>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
>>>>>
>>>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>>>
>>>>
>>>> Max kernel direct mapping memory in x86_64 is 64TB.
>>>
>>> For example, I have 8G memory, all of them will be direct mapping for
>>> kernel? then userspace memory allocated from where?
>>
>> Direct mapping memory means you can use __va() and pa(), but not means that them
>> can be only used by kernel, them can be used by user-space too, as long as them are free.
> 
> IIUC, the benefit of va() and pa() is just for quick get
> virtual/physical address, it takes advantage of linear mapping. But mmu
> still need to go through pgd/pud/pmd/pte, correct?

Yes.

> 

>>
>>>
>>>>
>>>>> It seems that only around 896MB on x86_32. 
>>>>>
>>>>>>
>>>>>>> As you know x86_64 don't need
>>>>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>>>>>> idea available? If is correct, x86_32 can't implement in the same way
>>>>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>>>>>> hard to focus kernel memory on single memory device.
>>>>>>
>>>>>> Sorry, I'm not quite familiar with x86_32 box.
>>>>>>
>>>>>>> 3. In current implementation, if memory hotplug just need memory
>>>>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>>>>>> Hope you can explain in details, thanks in advance. :)
>>>>>>
>>>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>>>> based memory migration mentioned by Liu Jiang.
>>>>>
>>>>> Is there any material about firmware based memory migration?
>>>>>
>>>>>>
>>>>>> So far, I only know this. :)
>>>>>>
>>>>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>>>>>> memory, other things are fully implementation?
>>>>>>
>>>>>> I think the main job is done for now. And there are still bugs to fix.
>>>>>> And this functionality is not stable.
>>>>>>
>>>>>> Thanks. :)
>>>>>
>>>>>

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-02-01  2:42 UTC
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	Jianguo Wu, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359685040.1303.6.camel@kernel>

Hi Simon,

On 02/01/2013 10:17 AM, Simon Jeons wrote:
>> For example:
>>
>>                                         64TB, whatever      xxxTB, whatever
>> logical address space:   |_____kernel_______|_________user_________________|
>>                                         \  \  /  /
>>                                          \  /\  /
>> physical address space:              |___\/__\/_____________|  4GB or 8GB, whatever
>>                                            *****
>
> How much address space user process can have on x86_64? Also 8GB?

Usually, we don't put it that way.

8GB is your physical memory, right?
But kernel space and user space are logical concepts in the OS.  They live in
the logical address space.

So both kernel space and user space can use all of the physical memory.
But if a page is already in use by one of them, the other one cannot use it.
For example, if some pages are direct mapped to the kernel and are in use by
the kernel, user space cannot map them.

>
>>
>> The ***** part physical is mapped to user space in the process' own
>> pagetable.
>> It is also direct mapped in kernel's pagetable. So the kernel can also
>> access it. :)
>
> But how to protect user process not modify kernel memory?

This is the job of the CPU.  On Intel CPUs, user space code runs at privilege
level 3, and kernel space code runs at level 0.  So code at level 3 cannot
access the data segment at level 0.

Thanks. :)


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-02-01  2:17 UTC
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	Jianguo Wu, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510B20F9.10408@cn.fujitsu.com>

Hi Tang,
On Fri, 2013-02-01 at 09:57 +0800, Tang Chen wrote:
> On 02/01/2013 09:36 AM, Simon Jeons wrote:
> > On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> >>>
> >>> So if config NUMA, kernel memory will not be linear mapping anymore? For
> >>> example,
> >>>
> >>> Node 0  Node 1
> >>>
> >>> 0 ~ 10G 11G~14G
> 
> It has nothing to do with linear mapping, I think.
> 
> >>>
> >>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> 
> Please refer to find_zone_movable_pfns_for_nodes().

I see, thanks. :)

> The kernel is not only on node0. It uses all the online nodes evenly. :)
> 
> >>>
> >>> How big is kernel direct mapping memory in x86_64? Is there max limit?
> >>
> >>
> >> Max kernel direct mapping memory in x86_64 is 64TB.
> >
> > For example, if I have 8G of memory, will all of it be direct mapped by
> > the kernel? Then where is userspace memory allocated from?
> 
> I think you misunderstood what Wu tried to say. :)
> 
> The kernel mapped that large space, it doesn't mean it is using that large space.
> The mapping is to make kernel be able to access all the memory, not for the kernel
> to use only. User space can also use the memory, but each process has its own mapping.
> 
> For example:
> 
>                                         64TB, what ever                xxxTB, what ever
> logic address space:     |_____kernel_______|_________user_________________|
>                                         \  \  /  /
>                                          \  /\  /
> physical address space:              |___\/__\/_____________|  4GB or 8GB, what ever
>                                            *****

How much address space can a user process have on x86_64? Also 8GB?

> 
> The ***** part physical is mapped to user space in the process' own pagetable.
> It is also direct mapped in kernel's pagetable. So the kernel can also access it. :)

But how is a user process prevented from modifying kernel memory?

> 
> >
> >>
> >>> It seems that only around 896MB on x86_32.
> >>>
> >>>>
> >>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >>>> based memory migration mentioned by Liu Jiang.
> >>>
> >>> Is there any material about firmware based memory migration?
> 
> No, I don't have any because this is a functionality of machine from HUAWEI.
> I think you can ask Liu Jiang or Wu Jianguo to share some with you. :)
> 
> Thanks. :)


* Re: [PATCH] powerpc: kernel/kgdb.c: fix memory leakage
From: Jason Wessel @ 2013-02-01  2:04 UTC (permalink / raw)
  To: Cong Ding
  Cc: Stephen Rothwell, Michael Neuling, linux-kernel, Tiejun Chen,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <1358184395-31418-1-git-send-email-dinggnu@gmail.com>

On 01/14/2013 11:26 AM, Cong Ding wrote:
> The variable backup_current_thread_info isn't freed before exiting the
> function.
> 
> Signed-off-by: Cong Ding <dinggnu@gmail.com>
> ---
>  arch/powerpc/kernel/kgdb.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 8747447..5ca82cd 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -154,12 +154,12 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
>  static int kgdb_singlestep(struct pt_regs *regs)
>  {
>  	struct thread_info *thread_info, *exception_thread_info;
> -	struct thread_info *backup_current_thread_info = \
> -		(struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
> +	struct thread_info *backup_current_thread_info;



Whoa...  This is definitely wrong.  You have found a problem for sure,
but this is not the right way to fix it.

It is not a good idea to kmalloc while single stepping because you can
hang the kernel if you single step any operation in kmalloc().

I am in the process of going through all the kgdb mails from the last
few months while I was away from the project, so I didn't catch
this one, and I see it is already upstream as commit fefd9e6f8.  I'll
submit another patch to fix this the right way and use a static variable.
It is OK to use a static variable here because this is not something
we can call recursively on a single CPU.

If Ben prefers we not burn the memory unless kgdb is active, we can
kmalloc / kfree the space we need at the time that kgdb is
initialized.  Otherwise we can go with the patch you see below.  We'll see
what Ben desires.

-----
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index a7bc752..bb12c8b 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -151,15 +151,16 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
 	return 1;
 }
 
+static struct thread_info kgdb_backup_thread_info[NR_CPUS];
+
 static int kgdb_singlestep(struct pt_regs *regs)
 {
 	struct thread_info *thread_info, *exception_thread_info;
-	struct thread_info *backup_current_thread_info;
+	int cpu = raw_smp_processor_id();
 
 	if (user_mode(regs))
 		return 0;
 
-	backup_current_thread_info = (struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
 	/*
 	 * On Book E and perhaps other processors, singlestep is handled on
 	 * the critical exception stack.  This causes current_thread_info()
@@ -175,7 +176,7 @@ static int kgdb_singlestep(struct pt_regs *regs)
 
 	if (thread_info != exception_thread_info) {
 		/* Save the original current_thread_info. */
-		memcpy(backup_current_thread_info, exception_thread_info, sizeof *thread_info);
+		memcpy(&kgdb_backup_thread_info[cpu], exception_thread_info, sizeof *thread_info);
 		memcpy(exception_thread_info, thread_info, sizeof *thread_info);
 	}
 
@@ -183,9 +184,8 @@ static int kgdb_singlestep(struct pt_regs *regs)
 
 	if (thread_info != exception_thread_info)
 		/* Restore current_thread_info lastly. */
-		memcpy(exception_thread_info, backup_current_thread_info, sizeof *thread_info);
+		memcpy(exception_thread_info, &kgdb_backup_thread_info[cpu], sizeof *thread_info);
 
-	kfree(backup_current_thread_info);
 	return 1;
 }
 

-----
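
The static-buffer approach in the patch above can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not kernel
code: struct model_thread_info, MODEL_NR_CPUS, model_swap_in() and
model_restore() are hypothetical stand-ins for struct thread_info, NR_CPUS
and the memcpy() calls in kgdb_singlestep(). The point it illustrates is
that save and restore use a per-CPU slot that exists before the exception
is taken, so no allocator call happens while single stepping.

```c
#include <string.h>

#define MODEL_NR_CPUS 4		/* hypothetical stand-in for NR_CPUS */

/* Hypothetical stand-in for struct thread_info. */
struct model_thread_info {
	int cpu;
	unsigned long flags;
};

/* One statically allocated backup slot per CPU; no allocator involved. */
static struct model_thread_info model_backup[MODEL_NR_CPUS];

/* Save *live into this CPU's slot, then overwrite *live with *incoming. */
void model_swap_in(int cpu, struct model_thread_info *live,
		   const struct model_thread_info *incoming)
{
	memcpy(&model_backup[cpu], live, sizeof(*live));
	memcpy(live, incoming, sizeof(*live));
}

/* Restore *live from this CPU's slot. */
void model_restore(int cpu, struct model_thread_info *live)
{
	memcpy(live, &model_backup[cpu], sizeof(*live));
}
```

Because each CPU only touches its own slot, two CPUs stepping at the same
time do not overwrite each other's saved copy.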


Thanks,
Jason.


>  
>  	if (user_mode(regs))
>  		return 0;
>  
> +	backup_current_thread_info = (struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
>  	/*
>  	 * On Book E and perhaps other processors, singlestep is handled on
>  	 * the critical exception stack.  This causes current_thread_info()
> @@ -185,6 +185,7 @@ static int kgdb_singlestep(struct pt_regs *regs)
>  		/* Restore current_thread_info lastly. */
>  		memcpy(exception_thread_info, backup_current_thread_info, sizeof *thread_info);
>  
> +	kfree(backup_current_thread_info);
>  	return 1;
>  }
>  
> 


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-02-01  3:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	Jianguo Wu, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510B2B8A.7040407@cn.fujitsu.com>

Hi Tang,
On Fri, 2013-02-01 at 10:42 +0800, Tang Chen wrote:

I'm confused!

> Hi Simon,
> 
> On 02/01/2013 10:17 AM, Simon Jeons wrote:
> >> For example:
> >>
> >>                                          64TB, what ever                xxxTB, what ever
> >> logic address space:     |_____kernel_______|_________user_________________|
> >>                                          \  \  /  /
> >>                                           \  /\  /
> >> physical address space:              |___\/__\/_____________|  4GB or 8GB, what ever
> >>                                             *****
> >
> > How much address space can a user process have on x86_64? Also 8GB?
> 
> Usually, we don't put it that way.
> 
> 8GB is your physical memory, right?
> But kernel space and user space are logical concepts in the OS. They live
> in the logical address space.
> 
> So both the kernel space and the user space can use all the physical memory.
> But if a page is already in use by one of them, the other one cannot use it.
> For example, if some pages are direct mapped by the kernel and in use by the
> kernel, user space cannot map them.

How can I distinguish between mapped and used? I mean, how can I confirm
that memory is used by the kernel rather than just mapped?

> 
> >
> >>
> >> The ***** part physical is mapped to user space in the process' own
> >> pagetable.
> >> It is also direct mapped in kernel's pagetable. So the kernel can also
> >> access it. :)
> >
> > But how is a user process prevented from modifying kernel memory?
> 
> This is the job of the CPU. On Intel CPUs, user space code runs at
> privilege level 3 and kernel space code runs at privilege level 0, so code
> at level 3 cannot access data that belongs to level 0.

1) If a user process and the kernel map the same physical memory, the user
process will get SIGSEGV during the #PF if it accesses this memory. But will
a user process map the same memory that the kernel maps? Why? It can't
access it.
2) If two user processes map the same physical memory, what will happen
if one process accesses the memory?

> 
> Thanks. :)


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-02-01  1:36 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: linux-ia64, linux-sh, Tang Chen, linux-mm, paulus, hpa,
	sparclinux, cl, linux-s390, x86, linux-acpi, isimatu.yasuaki,
	linfeng, mgorman, kosaki.motohiro, rientjes, len.brown, wency,
	cmetcalf, glommer, yinghai, laijs, linux-kernel, minchan.kim,
	akpm, linuxppc-dev
In-Reply-To: <510B1B4B.5080207@huawei.com>

On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> On 2013/1/31 18:38, Simon Jeons wrote:
> 
> > Hi Tang,
> > On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> >>> Hi Tang,
> >>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >>>
> >>> 1. IIUC, there is a button on machine which supports hot-remove memory,
> >>> then what's the difference between press button and echo to /sys?
> >>
> >> No important difference, I think. Since I don't have the machine you are
> >> saying, I cannot surely answer you. :)
> >> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> >> is just another entrance. At last, they will run into the same code.
> >>
> >>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> >>> why can't put kernel direct mapping memory into one memory device, and
> >>> other memory into the other devices?
> >>
> >> We cannot do that because in that way, we will lose NUMA performance.
> >>
> >> If you know NUMA, you will understand the following example:
> >>
> >> node0:                    node1:
> >>     cpu0~cpu15                cpu16~cpu31
> >>     memory0~memory511         memory512~memory1023
> >>
> >> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
> >> If we set direct mapping area in node0, and movable area in node1, then
> >> the kernel code running on cpu16~cpu31 will have to access memory0~memory511.
> >> This is a terrible performance drop.
> > 
> > So if config NUMA, kernel memory will not be linear mapping anymore? For
> > example, 
> > 
> > Node 0  Node 1 
> > 
> > 0 ~ 10G 11G~14G
> > 
> > kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> > 
> > How big is kernel direct mapping memory in x86_64? Is there max limit?
> 
> 
> Max kernel direct mapping memory in x86_64 is 64TB.

For example, if I have 8G of memory, will all of it be direct mapped by
the kernel? Then where is userspace memory allocated from?

> 
> > It seems that only around 896MB on x86_32. 
> > 
> >>
> >>> As you know x86_64 don't need
> >>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> >>> idea available? If is correct, x86_32 can't implement in the same way
> >>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> >>> hard to focus kernel memory on single memory device.
> >>
> >> Sorry, I'm not quite familiar with x86_32 box.
> >>
> >>> 3. In current implementation, if memory hotplug just need memory
> >>> subsystem and ACPI codes support? Or also needs firmware take part in?
> >>> Hope you can explain in details, thanks in advance. :)
> >>
> >> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >> based memory migration mentioned by Liu Jiang.
> > 
> > Is there any material about firmware based memory migration?
> > 
> >>
> >> So far, I only know this. :)
> >>
> >>> 4. What's the status of memory hotplug? Apart from can't remove kernel
> >>> memory, other things are fully implementation?
> >>
> >> I think the main job is done for now. And there are still bugs to fix.
> >> And this functionality is not stable.
> >>
> >> Thanks. :)


* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-02-01  3:39 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	Jianguo Wu, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359687985.1303.15.camel@kernel>

Hi Simon,

On 02/01/2013 11:06 AM, Simon Jeons wrote:
>
> How can I distinguish between mapped and used? I mean, how can I confirm
> that memory is used by the kernel rather than just mapped?

If the page is free, for example if it is in the buddy system, it is not
in use. Even if it is direct mapped by the kernel, kernel code should not
access it, because nobody has allocated it. This is the kernel's logic. Of
course the hardware and the user will not know this.

If you want to access some memory, you should first have a logical address,
right? So how can you get a logical address? You call an alloc API.

For example, when you are coding, of course you write:

p = alloc_xxx(); ---- allocate memory; now it is in use, and alloc_xxx()
                      lets the kernel know it.
*p = ......      ---- use the memory

You won't write:
p = 0xFFFF8745;  ---- if so, the kernel doesn't know it is in use
*p = ......      ---- wrong...

right?

The kernel having mapped a page doesn't mean it is using the page. You
must allocate it first. That is just the kernel's allocation logic.

Well, I think I can only give you this answer now. If you want something
deeper, I think you need to read how the kernel manages physical pages. :)

>
> 1) If a user process and the kernel map the same physical memory, the user
> process will get SIGSEGV during the #PF if it accesses this memory. But will
> a user process map the same memory that the kernel maps? Why? It can't
> access it.

When you call malloc() to allocate memory in user space, the OS logic
ensures that you won't map a page that is already in use by the kernel.

If a page is mapped by the kernel but not used by the kernel (not allocated,
as above), malloc() could allocate it and map it into user space. This is
the situation you are talking about, right?

Now it is mapped by both kernel and user, but it is only allocated by the
user. So the kernel will not use it. When the kernel wants some memory, it
will allocate some other memory. This is just the kernel logic. This is what
the memory management subsystem does.
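
The difference between "mapped" and "allocated" can be sketched with a toy
model. This is an illustration only and not how the kernel's page allocator
is implemented: toy_pages, toy_alloc_page(), toy_free_page() and
toy_page_in_use() are hypothetical names. Every page in the array is assumed
to be reachable through the direct mapping at all times; the only thing that
changes is the recorded owner, and a page owned by one side is never handed
to the other.

```c
#include <stdbool.h>

#define TOY_NR_PAGES 8		/* hypothetical tiny "physical memory" */

enum toy_owner { TOY_FREE = 0, TOY_KERNEL, TOY_USER };

/* Owner of each page; static storage starts out as TOY_FREE everywhere. */
static enum toy_owner toy_pages[TOY_NR_PAGES];

/* Hand the first free page to 'who'; return its index, or -1 if none. */
int toy_alloc_page(enum toy_owner who)
{
	if (who == TOY_FREE)
		return -1;

	for (int i = 0; i < TOY_NR_PAGES; i++) {
		if (toy_pages[i] == TOY_FREE) {
			toy_pages[i] = who;
			return i;
		}
	}
	return -1;
}

/* Return a page to the free pool. */
void toy_free_page(int idx)
{
	if (idx >= 0 && idx < TOY_NR_PAGES)
		toy_pages[idx] = TOY_FREE;
}

/* True if the page is currently allocated to the kernel or to a user. */
bool toy_page_in_use(int idx)
{
	return idx >= 0 && idx < TOY_NR_PAGES && toy_pages[idx] != TOY_FREE;
}
```

In this model a kernel allocation followed by a user allocation always
returns two different page indexes, which is the property described above.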

I think I cannot answer more because I'm also a student of memory
management.
This is just my understanding. And I hope it is helpful. :)

> 2) If two user processes map the same physical memory, what will happen
> if one process accesses the memory?

You don't need to worry about this situation. We can swap out the page
used by process 1, and process 2 can then use the same page. When process 1
wants to access it again, we swap it back in. This only happens when there
is not enough physical memory. :)

And also, if you are using shared memory in user space, like

shmget(), shmat()......

it is shared memory, and both processes can use it at the same time.
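
A minimal sketch of the shared memory case, using the System V calls named
above. The helper name shared_write_then_read() is hypothetical and error
handling is kept short. The child writes through its mapping of the segment
and the parent then reads the same value through its own mapping, so both
processes are using the same physical page.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a System V shared memory segment, let a child process write
 * 'value' into it, and return what the parent reads from the segment
 * afterwards. Returns -1 on any error.
 */
int shared_write_then_read(int value)
{
	int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	int *shared;
	int result = -1;
	pid_t pid;

	if (shmid < 0)
		return -1;

	shared = shmat(shmid, NULL, 0);
	if (shared == (void *)-1) {
		shmctl(shmid, IPC_RMID, NULL);
		return -1;
	}
	*shared = 0;

	pid = fork();		/* the attachment is inherited by the child */
	if (pid == 0) {
		*shared = value;	/* child writes to the shared page */
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, NULL, 0) == pid)
		result = *shared;	/* parent reads the child's write */

	shmdt(shared);
	shmctl(shmid, IPC_RMID, NULL);	/* remove the segment again */
	return result;
}
```

Calling shared_write_then_read(42) is expected to give back 42 when the
segment can be created.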

Thanks. :)


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-01  7:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <9860755.q4y3PrCFZx@vostro.rjw.lan>

On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I would _much_ rather see that be the solution here as I think it is the
proper one.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.

I agree.

> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try to eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

I say do what we always have done here, if the user asked us to tear
something down, let it happen as they are the ones that know best :)

Seriously, I guess this gets back to the "fail disconnect" idea that the
ACPI developers keep harping on.  I thought we already resolved this
properly by having them implement it in their bus code, no reason the
same thing couldn't happen here, right?  I don't think the core needs to
do anything special, but if so, I'll be glad to review it.

thanks,

greg k-h


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-01  7:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	Rafael J. Wysocki, linux-acpi, isimatu.yasuaki, srivatsa.bhat,
	guohanjun, bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359682338.15120.209.camel@misato.fc.hp.com>

On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > This is already done for PCI host bridges and platform devices and I don't
> > see why we can't do that for the other types of devices too.
> > 
> > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > when we try to eject a device at the top of a subtree and need to tear down
> > the entire subtree below it, but if that's going to lead to a system crash,
> > for example, we want to cancel the eject.  It seems to me that we'll need some
> > help from the driver core here.
> 
> There are three different approaches suggested for system device
> hot-plug:
>  A. Proceed within system device bus scan.
>  B. Proceed within ACPI bus scan.
>  C. Proceed with a sequence (as a mini-boot).
> 
> Option A uses system devices as tokens, option B uses acpi devices as
> tokens, and option C uses resource tables as tokens, for their handlers.
> 
> Here is summary of key questions & answers so far.  I hope this
> clarifies why I am suggesting option C.
> 
> 1. What are the system devices?
> System devices provide system-wide core computing resources, which are
> essential to compose a computer system.  System devices are not
> connected to any particular standard buses.

Not a problem, lots of devices are not connected to any "particular
standard busses".  All this means is that system devices are connected
to the "system" bus, nothing more.

> 2. Why are the system devices special?
> The system devices are initialized during early boot-time, by multiple
> subsystems, from the boot-up sequence, in pre-defined order.  They
> provide low-level services to enable other subsystems to come up.

Sorry, no, that doesn't mean they are special; from the point of view of
the driver model, nothing here makes them different from any other device
or bus.

> 3. Why can't initialize the system devices from the driver structure at
> boot?
> The driver structure is initialized at the end of the boot sequence and
> requires the low-level services from the system devices initialized
> beforehand.

Wait, what "driver structure"?  If you need to initialize the driver
core earlier, then do so.  Or, even better, just wait until enough of
the system has come up and then go initialize all of the devices you
have found so far as part of your boot process.

None of the above things you have stated seem to have anything to do
with your proposed patch, so I don't understand why you have mentioned
them...

> 4. Why do we need a new common framework?
> Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> and memory modules.  They are common code and do not depend on ACPI.
> Therefore, a new common framework is necessary to integrate both
> on-lining/off-lining operation and hot-plugging operation of system
> devices into a single framework.

{sigh}

Removing and adding devices and handling hotplug operations is what the
driver core was written for, almost 10 years ago.  To somehow think that
your devices are "special" just because they don't use ACPI is odd,
because the driver core itself has nothing to do with ACPI.  Please don't
confuse the current mix of x86 system code tied into ACPI with any
driver core issues here.

> 5. Why can't do everything with ACPI bus scan?
> Software dependency among system devices may not be dictated by the ACPI
> hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> a new cpu may need its local memory), but such ordering cannot be
> guaranteed by the ACPI hierarchy.  Also, as described in 4,
> online/offline operations are independent from ACPI.  

That's fine, the driver core is independent from ACPI.  I don't care how
you do the scanning of your devices, but I do care about you creating new
driver core pieces that duplicate the existing functionality of what we
have today.

In short, I like Rafael's proposal better, and I fail to see how
anything you have stated here would matter in how this is implemented. :)

thanks,

greg k-h


* [PATCH 1/2] gianfar: Clean up an unnecessary function gfar_new_skb
From: Jianhua Xie @ 2013-02-01  9:07 UTC (permalink / raw)
  To: claudiu.manoil, davem; +Cc: Jianhua Xie, linuxppc-dev

Remove the unnecessary function gfar_new_skb(). It has the same
parameters, return value and behaviour as gfar_alloc_skb(), so its
callers can use gfar_alloc_skb() directly.

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index bffb2ed..94d36ac 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -110,7 +110,7 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev);
 static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
-struct sk_buff *gfar_new_skb(struct net_device *dev);
+static struct sk_buff *gfar_alloc_skb(struct net_device *dev);
 static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 			   struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
@@ -207,7 +207,7 @@ static int gfar_init_bds(struct net_device *ndev)
 				gfar_init_rxbdp(rx_queue, rxbdp,
 						rxbdp->bufPtr);
 			} else {
-				skb = gfar_new_skb(ndev);
+				skb = gfar_alloc_skb(ndev);
 				if (!skb) {
 					netdev_err(ndev, "Can't allocate RX buffers\n");
 					return -ENOMEM;
@@ -2612,11 +2612,6 @@ static struct sk_buff *gfar_alloc_skb(struct net_device *dev)
 	return skb;
 }
 
-struct sk_buff *gfar_new_skb(struct net_device *dev)
-{
-	return gfar_alloc_skb(dev);
-}
-
 static inline void count_errors(unsigned short status, struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
@@ -2754,7 +2749,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 		rmb();
 
 		/* Add another skb for the future */
-		newskb = gfar_new_skb(dev);
+		newskb = gfar_alloc_skb(dev);
 
 		skb = rx_queue->rx_skbuff[rx_queue->skb_currx];
 
-- 
1.7.9.5


* [PATCH 2/2] gianfar: Add a parameter to support allocation skb GFP_KERNEL and GFP_ATOMIC
From: Jianhua Xie @ 2013-02-01  9:12 UTC (permalink / raw)
  To: claudiu.manoil, davem; +Cc: Jianhua Xie, linuxppc-dev

When an skb is allocated in IRQ/SOFTIRQ context, such as when processing
each frame in the rx ring, the allocation must be atomic and must not sleep.

When an skb is allocated outside IRQ/SOFTIRQ context, such as when
initializing the net driver or starting up the NIC from the stopped
state, it is not necessary to allocate memory with GFP_ATOMIC, and
GFP_KERNEL should be used instead.

The second method also avoids a kernel crash and an -ENOMEM report
when free low memory is near vm.min_free_kbytes.

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 94d36ac..559a01c 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -110,7 +110,7 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev);
 static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
-static struct sk_buff *gfar_alloc_skb(struct net_device *dev);
+static struct sk_buff *gfar_alloc_skb(struct net_device *dev, gfp_t gfp_mask);
 static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 			   struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
@@ -207,7 +207,7 @@ static int gfar_init_bds(struct net_device *ndev)
 				gfar_init_rxbdp(rx_queue, rxbdp,
 						rxbdp->bufPtr);
 			} else {
-				skb = gfar_alloc_skb(ndev);
+				skb = gfar_alloc_skb(ndev, GFP_KERNEL);
 				if (!skb) {
 					netdev_err(ndev, "Can't allocate RX buffers\n");
 					return -ENOMEM;
@@ -2598,12 +2598,13 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 	gfar_init_rxbdp(rx_queue, bdp, buf);
 }
 
-static struct sk_buff *gfar_alloc_skb(struct net_device *dev)
+static struct sk_buff *gfar_alloc_skb(struct net_device *dev, gfp_t gfp_mask)
 {
 	struct gfar_private *priv = netdev_priv(dev);
 	struct sk_buff *skb;
 
-	skb = netdev_alloc_skb(dev, priv->rx_buffer_size + RXBUF_ALIGNMENT);
+	skb = __netdev_alloc_skb(dev, priv->rx_buffer_size + RXBUF_ALIGNMENT,
+				gfp_mask);
 	if (!skb)
 		return NULL;
 
@@ -2749,7 +2750,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 		rmb();
 
 		/* Add another skb for the future */
-		newskb = gfar_alloc_skb(dev);
+		newskb = gfar_alloc_skb(dev, GFP_ATOMIC);
 
 		skb = rx_queue->rx_skbuff[rx_queue->skb_currx];
 
-- 
1.7.9.5


* Re: [GIT PULL 00/25] perf/core improvements and fixes
From: Ingo Molnar @ 2013-02-01 10:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Arnaldo Carvalho de Melo, Robert Richter, Andi Kleen,
	Peter Zijlstra, Frederic Weisbecker, Namhyung Kim,
	Anton Blanchard, linux-kernel, Stephane Eranian, Pekka Enberg,
	linuxppc-dev, Paul Mackerras, Mike Galbraith, acme, David Ahern,
	Namhyung Kim, Sukadev Bhattiprolu, Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> The following changes since commit 152fefa921535665f95840c08062844ab2f5593e:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-31 10:20:14 +0100)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo
> 
> for you to fetch changes up to 2ac3634a7e1c8eedc961030c87c5c36ebd5bbf8e:
> 
>   perf: Document the ABI of perf sysfs entries (2013-01-31 13:07:51 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> . Make some POWER7 events available in sysfs, equivalent to
>   what was done on x86, from Sukadev Bhattiprolu.
> 
> . Add event group view, from Namhyung Kim:
> 
>   To use it, 'perf record' should group events when recording. And then perf
>   report parses the saved group relation from file header and prints them
>   together if --group option is provided.  You can use 'perf evlist' command to
>   see event group information:
> 
>     $ perf record -e '{ref-cycles,cycles}' noploop 1
>     [ perf record: Woken up 2 times to write data ]
>     [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]
> 
>     $ perf evlist --group
>     {ref-cycles,cycles}
> 
>   With this example, default perf report will show you each event
>   separately like this:
> 
>     $ perf report
>     ...
>     # group: {ref-cycles,cycles}
>     # ========
>     # Samples: 3K of event 'ref-cycles'
>     # Event count (approx.): 3153797218
>     #
>     # Overhead  Command      Shared Object                      Symbol
>     # ........  .......  .................  ..........................
>         99.84%  noploop  noploop            [.] main
>          0.07%  noploop  ld-2.15.so         [.] strcmp
>          0.03%  noploop  [kernel.kallsyms]  [k] timerqueue_del
>          0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
>          0.02%  noploop  [kernel.kallsyms]  [k] account_user_time
>          0.01%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
>          0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
> 
>     # Samples: 3K of event 'cycles'
>     # Event count (approx.): 3722310525
>     #
>     # Overhead  Command      Shared Object                     Symbol
>     # ........  .......  .................  .........................
>         99.76%  noploop  noploop            [.] main
>          0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
>          0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
>          0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
>          0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
>          0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
>          0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
> 
>   In this case the event group information will be shown in the end of
>   header area.  So you can use --group option to enable event group view.
> 
>     $ perf report --group
>     ...
>     # group: {ref-cycles,cycles}
>     # ========
>     # Samples: 7K of event 'anon group { ref-cycles, cycles }'
>     # Event count (approx.): 6876107743
>     #
>     #         Overhead  Command      Shared Object                      Symbol
>     # ................  .......  .................  ..........................
>         99.84%  99.76%  noploop  noploop            [.] main
>          0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
>          0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
>          0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
>          0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
>          0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
>          0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
>          0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
>          0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
>          0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
>          0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
> 
>   As you can see, the Overhead column now contains both ref-cycles and
>   cycles, and the header line also shows the group information - 'anon
>   group { ref-cycles, cycles }'.  The output is sorted by the period of
>   the group leader first.
> 
>   If the perf.data file doesn't contain group information, the --group
>   option does nothing.  If you want to enable the event group view by
>   default, you can set it in the ~/.perfconfig file:
> 
>     $ cat ~/.perfconfig
>     [report]
>     group = true
> 
>   It can be overridden on the command line if you want:
> 
>     $ perf report --no-group
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (2):
>       perf top: Stop using exit()
>       perf top: Delete maps on exit
> 
> Namhyung Kim (18):
>       perf tools: Keep group information
>       perf tests: Add group test conditions
>       perf header: Add HEADER_GROUP_DESC feature
>       perf report: Make another loop for linking group hists
>       perf hists: Resort hist entries using group members for output
>       perf ui/hist: Consolidate hpp helpers
>       perf hists browser: Convert hpp helpers to a function
>       perf gtk/browser: Convert hpp helpers to a function
>       perf ui/hist: Add support for event group view
>       perf hists browser: Move coloring logic to hpp functions
>       perf hists browser: Add suppport for event group view
>       perf gtk/browser: Add support for event group view
>       perf gtk/browser: Trim column header string when event group enabled
>       perf report: Bypass non-leader events when event group is enabled
>       perf report: Show group description when event group is enabled
>       perf report: Add --group option
>       perf report: Add report.group config option
>       perf evlist: Add --group option
> 
> Sukadev Bhattiprolu (5):
>       perf/Power7: Use macros to identify perf events
>       perf: Make EVENT_ATTR global
>       perf/POWER7: Make generic event translations available in sysfs
>       perf/POWER7: Make some POWER7 events available in sysfs
>       perf: Document the ABI of perf sysfs entries
> 
>  .../testing/sysfs-bus-event_source-devices-events  |  62 +++++
>  arch/powerpc/include/asm/perf_event_server.h       |  26 ++
>  arch/powerpc/perf/core-book3s.c                    |  12 +
>  arch/powerpc/perf/power7-pmu.c                     |  80 +++++-
>  arch/x86/kernel/cpu/perf_event.c                   |  13 +-
>  include/linux/perf_event.h                         |  11 +
>  tools/perf/Documentation/perf-evlist.txt           |   4 +
>  tools/perf/Documentation/perf-report.txt           |   3 +
>  tools/perf/builtin-evlist.c                        |   7 +
>  tools/perf/builtin-record.c                        |   3 +
>  tools/perf/builtin-report.c                        |  47 +++-
>  tools/perf/builtin-top.c                           |  62 +++--
>  tools/perf/tests/parse-events.c                    |  28 ++
>  tools/perf/ui/browsers/hists.c                     | 217 ++++++++++++---
>  tools/perf/ui/gtk/hists.c                          | 130 +++++++--
>  tools/perf/ui/hist.c                               | 306 ++++++++++-----------
>  tools/perf/ui/stdio/hist.c                         |   2 +
>  tools/perf/util/evlist.c                           |   7 +-
>  tools/perf/util/evlist.h                           |   1 +
>  tools/perf/util/evsel.c                            |  49 +++-
>  tools/perf/util/evsel.h                            |  16 ++
>  tools/perf/util/header.c                           | 164 +++++++++++
>  tools/perf/util/header.h                           |   2 +
>  tools/perf/util/hist.c                             |  59 +++-
>  tools/perf/util/parse-events.c                     |   1 +
>  tools/perf/util/parse-events.h                     |   1 +
>  tools/perf/util/parse-events.y                     |  10 +
>  tools/perf/util/symbol.h                           |   3 +-
>  28 files changed, 1059 insertions(+), 267 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-events

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply

* Re: [PATCH 22/25] perf: Make EVENT_ATTR global
From: Ingo Molnar @ 2013-02-01 10:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Stephane Eranian
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-23-git-send-email-acme@infradead.org>


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
> available to all architectures.
> 
> Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass
> in the variable name as a parameter.
> 
> Changelog[v2]
> 	- [Jiri Olsa] No need to define PMU_EVENT_PTR()
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Robert Richter <robert.richter@amd.com>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: linuxppc-dev@ozlabs.org
> Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  arch/x86/kernel/cpu/perf_event.c | 13 +++----------
>  include/linux/perf_event.h       | 11 +++++++++++
>  2 files changed, 14 insertions(+), 10 deletions(-)

So this one started conflicting non-trivially with tip:perf/x86,
the pending kernel-side memory profiling bits.

Can we merge the memory profiling tooling side bits together 
with the kernel side bits - or does it need more work?

For now I've excluded perf/x86 from tip:master until this is 
resolved.

Thanks,

	Ingo

^ permalink raw reply

* [RFC PATCH 0/5] powerpc: Support context tracking for Power pSeries
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev

These patches add context tracking support for the Power architecture,
beginning with 64-bit pSeries. The code is ported from x86_64, and each
patch lists the corresponding x86 patch.

Would you please help review and give your comments?

Thanks, Zhong

Li Zhong (5):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries

 arch/powerpc/include/asm/context_tracking.h |   31 +++++++++++
 arch/powerpc/include/asm/thread_info.h      |    7 ++-
 arch/powerpc/kernel/entry_64.S              |    3 +-
 arch/powerpc/kernel/exceptions-64s.S        |    4 +-
 arch/powerpc/kernel/ptrace.c                |    5 ++
 arch/powerpc/kernel/signal.c                |    5 ++
 arch/powerpc/kernel/traps.c                 |   79 ++++++++++++++++++++-------
 arch/powerpc/mm/fault.c                     |   15 ++++-
 arch/powerpc/mm/hash_utils_64.c             |   17 ++++++
 arch/powerpc/platforms/pseries/Kconfig      |    1 +
 10 files changed, 141 insertions(+), 26 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

-- 
1.7.9.5

^ permalink raw reply

* [RFC PATCH 1/5] powerpc: Syscall hooks for context tracking subsystem
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev
In-Reply-To: <1359714465-6297-1-git-send-email-zhong@linux.vnet.ibm.com>

These are the syscall slow-path hooks for the context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the second 16 bits (with value 17), as no asm code
appears to use it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it should sit
in the same 16 bits as the other flags in the group; that way the asm code
can still test the whole group with a single andi. immediate.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/thread_info.h |    7 +++++--
 arch/powerpc/kernel/ptrace.c           |    5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 406b7b9..414a261 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW	6	/* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
-#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
+#define TIF_NOHZ		9	/* in adaptive nohz mode */
 #define TIF_SECCOMP		10	/* secure computing */
 #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR		12	/* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT	15	/* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE	16	/* Is an instruction emulation
 						for stack store? */
+#define TIF_MEMDIE		17	/* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ		(1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+				 _TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 				 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index c497000..62238dd 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <trace/syscall.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -1745,6 +1746,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
+	user_exit();
+
 	secure_computing_strict(regs->gpr[0]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1789,4 +1792,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
 	step = test_thread_flag(TIF_SINGLESTEP);
 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall_exit(regs, step);
+
+	user_enter();
 }
-- 
1.7.9.5

^ permalink raw reply related
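
Note on the flag layout in patch 1/5: the constraint described in the commit
message can be checked outside the kernel. The following is only a minimal
sketch, not code from the series. The bit numbers are copied from the hunks
above, except TIF_SYSCALL_TRACE, which the hunks do not show and which is
assumed to be 0 here. BIT() and fits_andi_immediate() are hypothetical helpers
introduced for illustration; the check relies on andi. taking a 16-bit unsigned
immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers as they appear in the patch hunks above. */
enum {
	TIF_SYSCALL_TRACE      = 0,	/* assumption: not shown in the hunks */
	TIF_SYSCALL_AUDIT      = 7,
	TIF_NOHZ               = 9,	/* takes over the old TIF_MEMDIE slot */
	TIF_SECCOMP            = 10,
	TIF_SYSCALL_TRACEPOINT = 15,
	TIF_MEMDIE             = 17,	/* moved into the upper 16 bits */
};

#define BIT(n)	(UINT32_C(1) << (n))

/* Mirrors _TIF_SYSCALL_T_OR_A after the patch. */
uint32_t syscall_t_or_a_mask(void)
{
	return BIT(TIF_SYSCALL_TRACE) | BIT(TIF_SYSCALL_AUDIT) |
	       BIT(TIF_SECCOMP) | BIT(TIF_SYSCALL_TRACEPOINT) |
	       BIT(TIF_NOHZ);
}

/* Hypothetical helper: nonzero if the mask fits a 16-bit unsigned immediate. */
int fits_andi_immediate(uint32_t mask)
{
	return (mask & ~UINT32_C(0xffff)) == 0;
}
```

With these values the group mask stays within the low 16 bits, while a mask
containing TIF_MEMDIE at bit 17 would not, which is why the two flags swap
halves.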

* [RFC PATCH 2/5] powerpc: Exception hooks for context tracking subsystem
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev
In-Reply-To: <1359714465-6297-1-git-send-email-zhong@linux.vnet.ibm.com>

These are the exception hooks for the context tracking subsystem, covering
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, and unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38803883c2eee489505796cb0a727122

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/context_tracking.h |   20 +++++++
 arch/powerpc/kernel/exceptions-64s.S        |    4 +-
 arch/powerpc/kernel/traps.c                 |   79 ++++++++++++++++++++-------
 arch/powerpc/mm/fault.c                     |   15 ++++-
 arch/powerpc/mm/hash_utils_64.c             |   17 ++++++
 5 files changed, 112 insertions(+), 23 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 0000000..3adccd8
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#include <linux/context_tracking.h>
+#include <asm/ptrace.h>
+
+static inline void exception_enter(struct pt_regs *regs)
+{
+	user_exit();
+}
+
+static inline void exception_exit(struct pt_regs *regs)
+{
+#ifdef CONFIG_CONTEXT_TRACKING
+	if (user_mode(regs))
+		user_enter();
+#endif
+}
+
+#endif
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4665e82..b877cf2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1184,15 +1184,17 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
 	rlwimi	r4,r0,32-13,30,30	/* becomes _PAGE_USER access bit */
 	ori	r4,r4,1			/* add _PAGE_PRESENT */
 	rlwimi	r4,r5,22+2,31-2,31-2	/* Set _PAGE_EXEC if trap is 0x400 */
+	addi	r6,r1,STACK_FRAME_OVERHEAD
 
 	/*
 	 * r3 contains the faulting address
 	 * r4 contains the required access permissions
 	 * r5 contains the trap number
+	 * r6 contains the address of pt_regs
 	 *
 	 * at return r3 = 0 for success, 1 for page fault, negative for error
 	 */
-	bl	.hash_page		/* build HPTE if possible */
+	bl	.hash_page_ct		/* build HPTE if possible */
 	cmpdi	r3,0			/* see if hash_page succeeded */
 
 	/* Success */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3251840..d7c0414 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -59,6 +59,7 @@
 #include <asm/fadump.h>
 #include <asm/switch_to.h>
 #include <asm/debug.h>
+#include <asm/context_tracking.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -660,6 +661,8 @@ void machine_check_exception(struct pt_regs *regs)
 {
 	int recover = 0;
 
+	exception_enter(regs);
+
 	__get_cpu_var(irq_stat).mce_exceptions++;
 
 	/* See if any machine dependent calls. In theory, we would want
@@ -674,7 +677,7 @@ void machine_check_exception(struct pt_regs *regs)
 		recover = cur_cpu_spec->machine_check(regs);
 
 	if (recover > 0)
-		return;
+		goto exit;
 
 #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
 	/* the qspan pci read routines can cause machine checks -- Cort
@@ -684,20 +687,23 @@ void machine_check_exception(struct pt_regs *regs)
 	 * -- BenH
 	 */
 	bad_page_fault(regs, regs->dar, SIGBUS);
-	return;
+	goto exit;
 #endif
 
 	if (debugger_fault_handler(regs))
-		return;
+		goto exit;
 
 	if (check_io_access(regs))
-		return;
+		goto exit;
 
 	die("Machine check", regs, SIGBUS);
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
 		panic("Unrecoverable Machine check");
+
+exit:
+	exception_exit(regs);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -707,20 +713,29 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
 	       regs->nip, regs->msr, regs->trap);
 
 	_exception(SIGTRAP, regs, 0, 0);
+
+	exception_exit(regs);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
 					5, SIGTRAP) == NOTIFY_STOP)
-		return;
+		goto exit;
 	if (debugger_iabr_match(regs))
-		return;
+		goto exit;
 	_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+
+exit:
+	exception_exit(regs);
 }
 
 void RunModeException(struct pt_regs *regs)
@@ -730,15 +745,20 @@ void RunModeException(struct pt_regs *regs)
 
 void __kprobes single_step_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	clear_single_step(regs);
 
 	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
 					5, SIGTRAP) == NOTIFY_STOP)
-		return;
+		goto exit;
 	if (debugger_sstep(regs))
-		return;
+		goto exit;
 
 	_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
+
+exit:
+	exception_exit(regs);
 }
 
 /*
@@ -993,32 +1013,34 @@ void __kprobes program_check_exception(struct pt_regs *regs)
 	unsigned int reason = get_reason(regs);
 	extern int do_mathemu(struct pt_regs *regs);
 
+	exception_enter(regs);
+
 	/* We can now get here via a FP Unavailable exception if the core
 	 * has no FPU, in that case the reason flags will be 0 */
 
 	if (reason & REASON_FP) {
 		/* IEEE FP exception */
 		parse_fpe(regs);
-		return;
+		goto exit;
 	}
 	if (reason & REASON_TRAP) {
 		/* Debugger is first in line to stop recursive faults in
 		 * rcu_lock, notify_die, or atomic_notifier_call_chain */
 		if (debugger_bpt(regs))
-			return;
+			goto exit;
 
 		/* trap exception */
 		if (notify_die(DIE_BPT, "breakpoint", regs, 5, 5, SIGTRAP)
 				== NOTIFY_STOP)
-			return;
+			goto exit;
 
 		if (!(regs->msr & MSR_PR) &&  /* not user-mode */
 		    report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) {
 			regs->nip += 4;
-			return;
+			goto exit;
 		}
 		_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
-		return;
+		goto exit;
 	}
 
 	/* We restore the interrupt state now */
@@ -1036,16 +1058,16 @@ void __kprobes program_check_exception(struct pt_regs *regs)
 	switch (do_mathemu(regs)) {
 	case 0:
 		emulate_single_step(regs);
-		return;
+		goto exit;
 	case 1: {
 			int code = 0;
 			code = __parse_fpscr(current->thread.fpscr.val);
 			_exception(SIGFPE, regs, code, regs->nip);
-			return;
+			goto exit;
 		}
 	case -EFAULT:
 		_exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip);
-		return;
+		goto exit;
 	}
 	/* fall through on any other errors */
 #endif /* CONFIG_MATH_EMULATION */
@@ -1056,10 +1078,10 @@ void __kprobes program_check_exception(struct pt_regs *regs)
 		case 0:
 			regs->nip += 4;
 			emulate_single_step(regs);
-			return;
+			goto exit;
 		case -EFAULT:
 			_exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip);
-			return;
+			goto exit;
 		}
 	}
 
@@ -1067,12 +1089,17 @@ void __kprobes program_check_exception(struct pt_regs *regs)
 		_exception(SIGILL, regs, ILL_PRVOPC, regs->nip);
 	else
 		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+
+exit:
+	exception_exit(regs);
 }
 
 void alignment_exception(struct pt_regs *regs)
 {
 	int sig, code, fixed = 0;
 
+	exception_enter(regs);
+
 	/* We restore the interrupt state now */
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
@@ -1084,7 +1111,7 @@ void alignment_exception(struct pt_regs *regs)
 	if (fixed == 1) {
 		regs->nip += 4;	/* skip over emulated instruction */
 		emulate_single_step(regs);
-		return;
+		goto exit;
 	}
 
 	/* Operand address was bad */
@@ -1099,6 +1126,9 @@ void alignment_exception(struct pt_regs *regs)
 		_exception(sig, regs, code, regs->dar);
 	else
 		bad_page_fault(regs, regs->dar, sig);
+
+exit:
+	exception_exit(regs);
 }
 
 void StackOverflow(struct pt_regs *regs)
@@ -1127,23 +1157,32 @@ void trace_syscall(struct pt_regs *regs)
 
 void kernel_fp_unavailable_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	printk(KERN_EMERG "Unrecoverable FP Unavailable Exception "
 			  "%lx at %lx\n", regs->trap, regs->nip);
 	die("Unrecoverable FP Unavailable Exception", regs, SIGABRT);
+
+	exception_exit(regs);
 }
 
 void altivec_unavailable_exception(struct pt_regs *regs)
 {
+	exception_enter(regs);
+
 	if (user_mode(regs)) {
 		/* A user program has executed an altivec instruction,
 		   but this kernel doesn't support altivec. */
 		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
-		return;
+		goto exit;
 	}
 
 	printk(KERN_EMERG "Unrecoverable VMX/Altivec Unavailable Exception "
 			"%lx at %lx\n", regs->trap, regs->nip);
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
+
+exit:
+	exception_exit(regs);
 }
 
 void vsx_unavailable_exception(struct pt_regs *regs)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 3a8489a..b1b9542 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -42,6 +42,7 @@
 #include <asm/tlbflush.h>
 #include <asm/siginfo.h>
 #include <asm/debug.h>
+#include <asm/context_tracking.h>
 #include <mm/mmu_decl.h>
 
 #include "icswx.h"
@@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
  * The return value is 0 if the fault was handled, or the signal
  * number if this is a kernel fault that can't be handled here.
  */
-int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
-			    unsigned long error_code)
+static int __kprobes __do_page_fault(struct pt_regs *regs,
+				unsigned long address, unsigned long error_code)
 {
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
@@ -475,6 +476,16 @@ bad_area_nosemaphore:
 
 }
 
+int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
+			    unsigned long error_code)
+{
+	int ret;
+	exception_enter(regs);
+	ret = __do_page_fault(regs, address, error_code);
+	exception_exit(regs);
+	return ret;
+}
+
 /*
  * bad_page_fault is called when we have a bad access from the kernel.
  * It is called from the DSI and ISI handlers in head.S and from some
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 3a292be..447e5a7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -55,6 +55,7 @@
 #include <asm/code-patching.h>
 #include <asm/fadump.h>
 #include <asm/firmware.h>
+#include <asm/context_tracking.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -1083,6 +1084,18 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 }
 EXPORT_SYMBOL_GPL(hash_page);
 
+int hash_page_ct(unsigned long ea, unsigned long access,
+		 unsigned long trap, struct pt_regs *regs)
+{
+	int ret;
+
+	exception_enter(regs);
+	ret = hash_page(ea, access, trap);
+	exception_exit(regs);
+
+	return ret;
+}
+
 void hash_preload(struct mm_struct *mm, unsigned long ea,
 		  unsigned long access, unsigned long trap)
 {
@@ -1194,6 +1207,8 @@ void flush_hash_range(unsigned long number, int local)
  */
 void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
 {
+	exception_enter(regs);
+
 	if (user_mode(regs)) {
 #ifdef CONFIG_PPC_SUBPAGE_PROT
 		if (rc == -2)
@@ -1203,6 +1218,8 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
 			_exception(SIGBUS, regs, BUS_ADRERR, address);
 	} else
 		bad_page_fault(regs, address, SIGBUS);
+
+	exception_exit(regs);
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
-- 
1.7.9.5

^ permalink raw reply related
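
Note on the handler conversions in patch 2/5: every early "return" becomes
"goto exit" so that exception_exit() runs on all paths. The following
userspace mock is only a sketch of that pairing, assuming a simplified
two-state model. All mock_* names, struct mock_regs, and the ctx variable are
hypothetical stand-ins, not kernel APIs. As in the patch, entry always leaves
user context, and exit re-enters it only when the exception came from user
mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the tracked context. */
enum ctx_state { CTX_KERNEL, CTX_USER };
enum ctx_state ctx = CTX_USER;

/* Hypothetical stand-in for struct pt_regs. */
struct mock_regs {
	bool from_user;		/* plays the role of user_mode(regs) */
};

void mock_exception_enter(struct mock_regs *regs)
{
	(void)regs;
	ctx = CTX_KERNEL;	/* stands in for user_exit() */
}

void mock_exception_exit(struct mock_regs *regs)
{
	if (regs->from_user)
		ctx = CTX_USER;	/* stands in for user_enter() */
}

/* Handler shaped like the converted ones: single exit label, no bare return. */
int mock_handler(struct mock_regs *regs, bool handled_early)
{
	int ret = 0;

	mock_exception_enter(regs);

	if (handled_early) {
		ret = 1;
		goto exit;	/* was "return" before the conversion */
	}

	/* Slow path: RCU would be usable here, since we left user context. */
	assert(ctx == CTX_KERNEL);

exit:
	mock_exception_exit(regs);
	return ret;
}
```

Both the early-out path and the slow path end in the same state they started
from, which is the property a missed "return" would break.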

* [RFC PATCH 3/5] powerpc: Exit user context on notify resume
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev
In-Reply-To: <1359714465-6297-1-git-send-email-zhong@linux.vnet.ibm.com>

This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/signal.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 3b99711..c1eaea2 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
 #include <linux/signal.h>
 #include <linux/uprobes.h>
 #include <linux/key.h>
+#include <linux/context_tracking.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -158,6 +159,8 @@ static int do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+	user_exit();
+
 	if (thread_info_flags & _TIF_UPROBE)
 		uprobe_notify_resume(regs);
 
@@ -168,6 +171,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 	}
+
+	user_enter();
 }
 
 long sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
-- 
1.7.9.5

^ permalink raw reply related

* [RFC PATCH 4/5] powerpc: Use the new schedule_user API on userspace preemption
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev
In-Reply-To: <1359714465-6297-1-git-send-email-zhong@linux.vnet.ibm.com>

This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9d78691f3985962021b16bf8f8a8048

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/context_tracking.h |   11 +++++++++++
 arch/powerpc/kernel/entry_64.S              |    3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
index 3adccd8..2e042ba 100644
--- a/arch/powerpc/include/asm/context_tracking.h
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
 #define _ASM_POWERPC_CONTEXT_TRACKING_H
 
+#ifndef __ASSEMBLY__
 #include <linux/context_tracking.h>
 #include <asm/ptrace.h>
 
@@ -17,4 +18,14 @@ static inline void exception_exit(struct pt_regs *regs)
 #endif
 }
 
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl	.schedule_user
+#else
+#define SCHEDULE_USER bl	.schedule
+#endif
+
+#endif /* !__ASSEMBLY__ */
+
 #endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3d990d3..91f09ec 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include <asm/irqflags.h>
 #include <asm/ftrace.h>
 #include <asm/hw_irq.h>
+#include <asm/context_tracking.h>
 
 /*
  * System calls.
@@ -595,7 +596,7 @@ _GLOBAL(ret_from_except_lite)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq	1f
 	bl	.restore_interrupts
-	bl	.schedule
+	SCHEDULE_USER
 	b	.ret_from_except_lite
 
 1:	bl	.save_nvgprs
-- 
1.7.9.5

^ permalink raw reply related
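
Note on the SCHEDULE_USER macro in patch 4/5: the choice between the two
scheduler entry points is made at build time. The following C analogue is only
a sketch of that selection. The mock_* functions and
preempt_on_return_to_user() are hypothetical, and the real macro expands to an
assembly "bl" rather than a C call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for schedule() and schedule_user(). */
const char *mock_schedule(void)
{
	return "schedule";
}

const char *mock_schedule_user(void)
{
	return "schedule_user";
}

/* Same shape as the asm macro: one name, resolved by the config option. */
#ifdef CONFIG_CONTEXT_TRACKING
#define SCHEDULE_USER()	mock_schedule_user()
#else
#define SCHEDULE_USER()	mock_schedule()
#endif

/* Call site stays identical regardless of configuration. */
const char *preempt_on_return_to_user(void)
{
	return SCHEDULE_USER();
}
```

Built without -DCONFIG_CONTEXT_TRACKING the call site resolves to the plain
scheduler entry; with it defined, the same call site picks the user variant,
so entry_64.S needs only the one-line change shown in the patch.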

* [RFC PATCH 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries
From: Li Zhong @ 2013-02-01 10:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zhong, fweisbec, paulus, paulmck, linuxppc-dev
In-Reply-To: <1359714465-6297-1-git-send-email-zhong@linux.vnet.ibm.com>

Enable context tracking support, starting with pSeries.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 837cf49..a9570fe 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -17,6 +17,7 @@ config PPC_PSERIES
 	select PPC_NATIVE
 	select PPC_PCI_CHOICE if EXPERT
 	select ZLIB_DEFLATE
+	select HAVE_CONTEXT_TRACKING
 	default y
 
 config PPC_SPLPAR
-- 
1.7.9.5

^ permalink raw reply related

