LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re[2]: PS3 platform is broken on Linux 3.7.0
From: Phileas Fogg @ 2013-02-10 11:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1355954017.5397.53.camel@pasglop>

Cj5PbiBGcmksIDIwMTItMTItMTQgYXQgMTY6MzUgKzA0MDAsIFBoaWxlYXMgRm9nZyB3cm90ZToK
Pj4gSGksCj4+IAo+PiBJIHdhbnRlZCB0byBicmluZyB0byB5b3VyIGF0dGVudGlvbiB0aGUgZmFj
dCB0aGF0IHRoZSBQUzMgcGxhdGZvcm0gaXMgYnJva2VuIG9uIExpbnV4IDMuNy4wLgo+PiAKPj4g
aSdtIG5vdCBhYmxlIHRvIGJvb3QgTGludXggMy43LjAgb24gbXkgUFMzIHNsaW0uIExpbnV4IDMu
Ni4xMCBib290cyBqdXN0IGZpbmUgYnV0IG5vdCAzLjcuMAo+PiBXaGVuIGkgdHJ5IHRvIGJvb3Qg
TGludXggMy43LjAgdGhlbiBteSBQUzMgIHNodXRzIGRvd24uCj4+IAo+PiBTbyBpIGNsb25lZCB0
aGUgTGludXggcG93ZXJwYyBHSVQgcmVwb3NpdG9yeSBhbmQgdHJpZWQgdG8gZmluZCBvdXQgd2hp
Y2ggY29tbWl0cyBicm9rZSB0aGUgUFMzIHBsYXRmb3JtLgo+PiBBZnRlciBzb21lIHRpbWUgSSB0
cmFja2VkIGl0IGRvd24gdG8gMiBjb21taXRzOgo+Cj5BbmVlc2gsIGRvIHlvdSBoYXZlIGFueSBp
ZGVhIHdoYXQgbWlnaHQgYmUgZ29pbmcgb24gdGhlcmUgPyBDYW4geW91IGxvb2sKPmF0IHRoZSBQ
UzMgaGFzaCBjb2RlID8gSXQncyBhIGJpdCBkaWZmZXJlbnQgZnJvbSB0aGUgcmVzdCwgeW91IG1p
Z2h0Cj5oYXZlIG1pc3NlZCBhbiB1cGRhdGUgb3IgdHdvLi4uCj4KPk1pY2hhZWwsIHNhbWUgZGVh
bCB3aXRoIFBBQ0EuLi4KPgo+Q2hlZXJzLAo+QmVuLgoKCkkgZGVidWdnZWQgdGhlIGlzc3VlIHdp
dGggdGhlIHBhbmljIG9uIFBBQ0EgYWNjZXNzIG9uIFBTMyBhcmNoIGFuZCBmb3VuZCBvdXQgdGhh
dCBpdCBwYW5pY3MgaW4KCmFyY2gvcG93ZXJwYy9rZXJuZWwvc2V0dXBfNjQuYyAtPiBlYXJseV9z
ZXR1cCAtPiB1ZGJnX2Vhcmx5X2luaXQgLT4gcmVnaXN0ZXJfZWFybHlfdWRiZ19jb25zb2xlIC0+
IGNvbnNvbGVfbG9jayAtPiBkb3duIC0+IHJhd19zcGluX3VubG9ja19pcnFyZXN0b3JlCgpJdCBw
YW5pY3Mgb25seSBpZiBpIGVuYWJsZSBsb2NrIGRlYnVnZ2luZyBpbiBrZXJuZWwuCgpJIHN1Z2dl
c3QgdGhlIGZvbGxvd2luZyBwYXRjaCB0byBmaXggdGhlIGlzc3VlOgoKLS0tIGFyY2gvcG93ZXJw
Yy9rZXJuZWwvc2V0dXBfNjQuYy5vbGQJMjAxMy0wMi0xMCAxMzozOTo0NS4xNDcxMzE1NDcgKzAx
MDAKKysrIGFyY2gvcG93ZXJwYy9rZXJuZWwvc2V0dXBfNjQuYwkyMDEzLTAyLTEwIDEzOjQwOjUx
LjY5NzEzNTQxOSArMDEwMApAQCAtMTg2LDYgKzE4Niw5IEBACiAJaW5pdGlhbGlzZV9wYWNhKCZi
b290X3BhY2EsIDApOwogCXNldHVwX3BhY2EoJmJvb3RfcGFjYSk7CiAKKwkvKiBBbGxvdyBwZXJj
cHUgYWNjZXNzZXMgdG8gIndvcmsiIHVudGlsIHdlIHNldHVwIHBlcmNwdSBkYXRhICovCisJZ2V0
X3BhY2EoKS0+ZGF0YV9vZmZzZXQgPSAwOworCiAJLyogSW5pdGlhbGl6ZSBsb2NrZGVwIGVhcmx5
IG9yIGVsc2Ugc3BpbmxvY2tzIHdpbGwgYmxvdyAqLwogCWxvY2tkZXBfaW5pdCgpOwogCkBAIC0y
MDgsOCArMjExLDYgQEAKIAogCS8qIEZpeCB1cCBwYWNhIGZpZWxkcyByZXF1aXJlZCBmb3IgdGhl
IGJvb3QgY3B1ICovCiAJZ2V0X3BhY2EoKS0+Y3B1X3N0YXJ0ID0gMTsKLQkvKiBBbGxvdyBwZXJj
cHUgYWNjZXNzZXMgdG8gIndvcmsiIHVudGlsIHdlIHNldHVwIHBlcmNwdSBkYXRhICovCi0JZ2V0
X3BhY2EoKS0+ZGF0YV9vZmZzZXQgPSAwOwogCiAJLyogUHJvYmUgdGhlIG1hY2hpbmUgdHlwZSAq
LwogCXByb2JlX21hY2hpbmUoKTsKCgoKCg==

^ permalink raw reply

* Re: [RFC PATCH 1/5] powerpc: Syscall hooks for context tracking subsystem
From: Frederic Weisbecker @ 2013-02-10 10:32 UTC (permalink / raw)
  To: Li Zhong; +Cc: paulmck, linuxppc-dev, linux-kernel, paulus
In-Reply-To: <1359714465-6297-2-git-send-email-zhong@linux.vnet.ibm.com>

2013/2/1 Li Zhong <zhong@linux.vnet.ibm.com>:
> This is the syscall slow path hooks for context tracking subsystem,
> corresponding to
> [PATCH] x86: Syscall hooks for userspace RCU extended QS
>   commit bf5a3c13b939813d28ce26c01425054c740d6731
>
> TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
> is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
> better for it to be in the same 16 bits with others in the group, so in the
> asm code, andi. with this group could work.
>
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>

Looks good. Thanks.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

> ---
>  arch/powerpc/include/asm/thread_info.h |    7 +++++--
>  arch/powerpc/kernel/ptrace.c           |    5 +++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index 406b7b9..414a261 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
>  #define TIF_PERFMON_CTXSW      6       /* perfmon needs ctxsw calls */
>  #define TIF_SYSCALL_AUDIT      7       /* syscall auditing active */
>  #define TIF_SINGLESTEP         8       /* singlestepping active */
> -#define TIF_MEMDIE             9       /* is terminating due to OOM killer */
> +#define TIF_NOHZ               9       /* in adaptive nohz mode */
>  #define TIF_SECCOMP            10      /* secure computing */
>  #define TIF_RESTOREALL         11      /* Restore all regs (implies NOERROR) */
>  #define TIF_NOERROR            12      /* Force successful syscall return */
> @@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
>  #define TIF_SYSCALL_TRACEPOINT 15      /* syscall tracepoint instrumentation */
>  #define TIF_EMULATE_STACK_STORE        16      /* Is an instruction emulation
>                                                 for stack store? */
> +#define TIF_MEMDIE             17      /* is terminating due to OOM killer */
>
>  /* as above, but as bit values */
>  #define _TIF_SYSCALL_TRACE     (1<<TIF_SYSCALL_TRACE)
> @@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
>  #define _TIF_UPROBE            (1<<TIF_UPROBE)
>  #define _TIF_SYSCALL_TRACEPOINT        (1<<TIF_SYSCALL_TRACEPOINT)
>  #define _TIF_EMULATE_STACK_STORE       (1<<TIF_EMULATE_STACK_STORE)
> +#define _TIF_NOHZ              (1<<TIF_NOHZ)
>  #define _TIF_SYSCALL_T_OR_A    (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
> -                                _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
> +                                _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
> +                                _TIF_NOHZ)
>
>  #define _TIF_USER_WORK_MASK    (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
>                                  _TIF_NOTIFY_RESUME | _TIF_UPROBE)
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index c497000..62238dd 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -32,6 +32,7 @@
>  #include <trace/syscall.h>
>  #include <linux/hw_breakpoint.h>
>  #include <linux/perf_event.h>
> +#include <linux/context_tracking.h>
>
>  #include <asm/uaccess.h>
>  #include <asm/page.h>
> @@ -1745,6 +1746,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
>  {
>         long ret = 0;
>
> +       user_exit();
> +
>         secure_computing_strict(regs->gpr[0]);
>
>         if (test_thread_flag(TIF_SYSCALL_TRACE) &&
> @@ -1789,4 +1792,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
>         step = test_thread_flag(TIF_SINGLESTEP);
>         if (step || test_thread_flag(TIF_SYSCALL_TRACE))
>                 tracehook_report_syscall_exit(regs, step);
> +
> +       user_enter();
>  }
> --
> 1.7.9.5
>

^ permalink raw reply

* Re[3]: PS3 platform is broken on Linux 3.7.0
From: Phileas Fogg @ 2013-02-10  9:16 UTC (permalink / raw)
  To: Geoff Levand, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1360486777.923599245@f85.mail.ru>

[-- Attachment #1: Type: text/plain, Size: 5185 bytes --]

 And another note.
I took a look at the MMU chapter in the Cell Architecture handbook and indeed the first 15 bits in VA are treated as 0 by the hardware.

Quote:

1. High-order bits above 65 bits in the 80-bit virtual address (VA[0:14]) are not implemented. The hardware always
   treats these bits as `0'. Software must not set these bits to any other value than `0' or the results are undefined in
   the PPE.


regards


Воскресенье, 10 февраля 2013, 12:59 +04:00 от Phileas Fogg <phileas-fogg@mail.ru>:
>Hi,
>
>i found where the problem lies.
>I also printed some values in ps3_hpte_insert with and without 64TB support, i used OpenWRT with Linux 3.7.6 for testing.
>
>Some values without 64TB support:
>-------------------------------------------------
>
>[    0.060487] RPC: Registered named UNIX socket transport module.
>[    0.060511] RPC: Registered udp transport module.
>[    0.060672] RPC: Registered tcp transport module.
>[    0.060873] RPC: Registered tcp NFSv4.1 backchannel transport module.
>[    0.061080] initcall .init_sunrpc+0x0/0xbc returned 0 after 784 usecs
>[    0.061280] calling  .populate_rootfs+0x0/0x120 @ 1
>[    0.061683] ps3_hpte_insert:result=0 vpn=f09b89af50101 pa=d4e0000 ix=dfa0 v=f09b89af5001 r=6c005d4e0194 psize=0 ssize=0 lpar=6c005d4e0000
>[    0.061733] ps3_hpte_insert:result=0 vpn=f09b89af50102 pa=d4e1000 ix=dfb8 v=f09b89af5001 r=6c005d4e1194 psize=0 ssize=0 lpar=6c005d4e1000
>[    0.061895] ps3_hpte_insert:result=0 vpn=f09b89af50103 pa=d4e2000 ix=dfb0 v=f09b89af5001 r=6c005d4e2194 psize=0 ssize=0 lpar=6c005d4e2000
>
>
>Some values with 64TB support:
>-------------------------------------------------
>
>[    0.076477] calling  .init_sunrpc+0x0/0xbc @ 1
>[    0.076992] RPC: Registered named UNIX socket transport module.
>[    0.077017] RPC: Registered udp transport module.
>[    0.077076] RPC: Registered tcp transport module.
>[    0.077277] RPC: Registered tcp NFSv4.1 backchannel transport module.
>[    0.077484] initcall .init_sunrpc+0x0/0xbc returned 0 after 784 usecs
>[    0.077684] calling  .populate_rootfs+0x0/0x120 @ 1
>[    0.078126] ps3_hpte_insert:result=-17 vpn=25008684d80101 pa=d567000 ix=2ec8 v=25008684d8001 r=6c005d567194 psize=0 ssize=0 lpar=6c005d567000
>[    0.078164] ps3_hpte_insert:result=-17 vpn=25008684d80101 pa=d567000 ix=2ec8 v=25008684d8001 r=6c005d567194 psize=0 ssize=0 lpar=6c005d567000
>[    0.078287] ------------[ cut here ]------------
>[    0.078482] Kernel BUG at c00000000002cb3c [verbose debug info unavailable]
>[    0.078686] Oops: Exception in kernel mode, sig: 5 [#1]
>[    0.078883] SMP NR_CPUS=2 PS3
>[    0.079084] Modules linked in:
>[    0.079287] NIP: c00000000002cb3c LR: c00000000002cb38 CTR: 00000000002ffc38
>[    0.079489] REGS: c00000000d04f0e0 TRAP: 0700   Not tainted  (3.7.6)
>[    0.079687] MSR: 8000000000020032 <SF,IR,DR,RI>  CR: 22000022  XER: 00000000
>[    0.079888] SOFTE: 0
>[    0.080090] TASK = c00000000d049060[1] 'swapper/1' THREAD: c00000000d04c000 CPU: 1
>GPR00: c00000000002cb38 c00000000d04f360 c0000000012ec8d0 0000000000000081 
>GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>GPR08: 0000000000000000 c00000000124ce10 0000000000000000 c00000000002bcf0 
>GPR12: 0000000022000022 c000000007ffe280 c000000000008c94 c0000000005cba00 
>
>
>
>And now take a look at 'v' values in both cases.
>
>Without 64TB support:   v=f09b89af5001
>With 64TB support: v=25008684d8001
>
>Number of leading zeros in f09b89af5001 is 16.
>Number of leading zeros in 25008684d8001 is 14.
>
>And that's why lv1_insert_htab_entry fails with -17 which means LV1_ILLEGAL_PARAMETER_VALUE because
>the Hypervisor of PS3 checks 'AVPN' values for number of leading zeros and allows at least 15 bits which in case
>of 'v' value 25008684d8001 is too small of course.
>
>Not sure how to fix it in current Linux kernel. You guys know it better than me.
>
>Regards
>
>
>
>
>Понедельник, 14 января 2013, 15:37 -08:00 от Geoff Levand < geoff@infradead.org >:
>>Hi,
>>
>>On Fri, 2013-01-11 at 18:12 -0800, Geoff Levand wrote:
>>> I checked these, and Michael's 407821a34fce89b4f0b031dbab5cec7d059f46bc
>>> does indeed cause the LV1 hypervisor to panic early, and if that is
>>> reverted, Aneesh's 048ee0993ec8360abb0b51bdf8f8721e9ed62ec4 hits a BUG.
>>
>>Just to give an update, I did a little more work on it and found that
>>the call to lv1_insert_htab_entry() inside ps3_hpte_insert() is
>>failing.
>>
>>     http://git.kernel.org/?p=linux/kernel/git/geoff/ps3-linux.git;a=blob;f=arch/powerpc/platforms/ps3/htab.c;hb=HEAD#l70
>>
>>The values of the variables printed all look strange compared with
>>commit 048ee0993 reverted.  I'll try do some more work on it this
>>week.
>>
>>-Geoff
>>
>>
>>
>>
>>
>>_______________________________________________
>>Linuxppc-dev mailing list
>>Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>_______________________________________________
>Linuxppc-dev mailing list
>Linuxppc-dev@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/linuxppc-dev


[-- Attachment #2: Type: text/html, Size: 6880 bytes --]

^ permalink raw reply

* Re[2]: PS3 platform is broken on Linux 3.7.0
From: Phileas Fogg @ 2013-02-10  8:59 UTC (permalink / raw)
  To: Geoff Levand; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1358206641.4489.8.camel@smoke>

IEhpLAoKaSBmb3VuZCB3aGVyZSB0aGUgcHJvYmxlbSBsaWVzLgpJIGFsc28gcHJpbnRlZCBzb21l
IHZhbHVlcyBpbiBwczNfaHB0ZV9pbnNlcnQgd2l0aCBhbmQgd2l0aG91dCA2NFRCIHN1cHBvcnQs
IGkgdXNlZCBPcGVuV1JUIHdpdGggTGludXggMy43LjYgZm9yIHRlc3RpbmcuCgpTb21lIHZhbHVl
cyB3aXRob3V0IDY0VEIgc3VwcG9ydDoKLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLQoKW8KgwqDCoCAwLjA2MDQ4N10gUlBDOiBSZWdpc3RlcmVkIG5hbWVk
IFVOSVggc29ja2V0IHRyYW5zcG9ydCBtb2R1bGUuClvCoMKgwqAgMC4wNjA1MTFdIFJQQzogUmVn
aXN0ZXJlZCB1ZHAgdHJhbnNwb3J0IG1vZHVsZS4KW8KgwqDCoCAwLjA2MDY3Ml0gUlBDOiBSZWdp
c3RlcmVkIHRjcCB0cmFuc3BvcnQgbW9kdWxlLgpbwqDCoMKgIDAuMDYwODczXSBSUEM6IFJlZ2lz
dGVyZWQgdGNwIE5GU3Y0LjEgYmFja2NoYW5uZWwgdHJhbnNwb3J0IG1vZHVsZS4KW8KgwqDCoCAw
LjA2MTA4MF0gaW5pdGNhbGwgLmluaXRfc3VucnBjKzB4MC8weGJjIHJldHVybmVkIDAgYWZ0ZXIg
Nzg0IHVzZWNzClvCoMKgwqAgMC4wNjEyODBdIGNhbGxpbmfCoCAucG9wdWxhdGVfcm9vdGZzKzB4
MC8weDEyMCBAIDEKW8KgwqDCoCAwLjA2MTY4M10gcHMzX2hwdGVfaW5zZXJ0OnJlc3VsdD0wIHZw
bj1mMDliODlhZjUwMTAxIHBhPWQ0ZTAwMDAgaXg9ZGZhMCB2PWYwOWI4OWFmNTAwMSByPTZjMDA1
ZDRlMDE5NCBwc2l6ZT0wIHNzaXplPTAgbHBhcj02YzAwNWQ0ZTAwMDAKW8KgwqDCoCAwLjA2MTcz
M10gcHMzX2hwdGVfaW5zZXJ0OnJlc3VsdD0wIHZwbj1mMDliODlhZjUwMTAyIHBhPWQ0ZTEwMDAg
aXg9ZGZiOCB2PWYwOWI4OWFmNTAwMSByPTZjMDA1ZDRlMTE5NCBwc2l6ZT0wIHNzaXplPTAgbHBh
cj02YzAwNWQ0ZTEwMDAKW8KgwqDCoCAwLjA2MTg5NV0gcHMzX2hwdGVfaW5zZXJ0OnJlc3VsdD0w
IHZwbj1mMDliODlhZjUwMTAzIHBhPWQ0ZTIwMDAgaXg9ZGZiMCB2PWYwOWI4OWFmNTAwMSByPTZj
MDA1ZDRlMjE5NCBwc2l6ZT0wIHNzaXplPTAgbHBhcj02YzAwNWQ0ZTIwMDAKCgpTb21lIHZhbHVl
cyB3aXRoIDY0VEIgc3VwcG9ydDoKLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLQoKW8KgwqDCoCAwLjA3NjQ3N10gY2FsbGluZ8KgIC5pbml0X3N1bnJwYysw
eDAvMHhiYyBAIDEKW8KgwqDCoCAwLjA3Njk5Ml0gUlBDOiBSZWdpc3RlcmVkIG5hbWVkIFVOSVgg
c29ja2V0IHRyYW5zcG9ydCBtb2R1bGUuClvCoMKgwqAgMC4wNzcwMTddIFJQQzogUmVnaXN0ZXJl
ZCB1ZHAgdHJhbnNwb3J0IG1vZHVsZS4KW8KgwqDCoCAwLjA3NzA3Nl0gUlBDOiBSZWdpc3RlcmVk
IHRjcCB0cmFuc3BvcnQgbW9kdWxlLgpbwqDCoMKgIDAuMDc3Mjc3XSBSUEM6IFJlZ2lzdGVyZWQg
dGNwIE5GU3Y0LjEgYmFja2NoYW5uZWwgdHJhbnNwb3J0IG1vZHVsZS4KW8KgwqDCoCAwLjA3NzQ4
NF0gaW5pdGNhbGwgLmluaXRfc3VucnBjKzB4MC8weGJjIHJldHVybmVkIDAgYWZ0ZXIgNzg0IHVz
ZWNzClvCoMKgwqAgMC4wNzc2ODRdIGNhbGxpbmfCoCAucG9wdWxhdGVfcm9vdGZzKzB4MC8weDEy
MCBAIDEKW8KgwqDCoCAwLjA3ODEyNl0gcHMzX2hwdGVfaW5zZXJ0OnJlc3VsdD0tMTcgdnBuPTI1
MDA4Njg0ZDgwMTAxIHBhPWQ1NjcwMDAgaXg9MmVjOCB2PTI1MDA4Njg0ZDgwMDEgcj02YzAwNWQ1
NjcxOTQgcHNpemU9MCBzc2l6ZT0wIGxwYXI9NmMwMDVkNTY3MDAwClvCoMKgwqAgMC4wNzgxNjRd
IHBzM19ocHRlX2luc2VydDpyZXN1bHQ9LTE3IHZwbj0yNTAwODY4NGQ4MDEwMSBwYT1kNTY3MDAw
IGl4PTJlYzggdj0yNTAwODY4NGQ4MDAxIHI9NmMwMDVkNTY3MTk0IHBzaXplPTAgc3NpemU9MCBs
cGFyPTZjMDA1ZDU2NzAwMApbwqDCoMKgIDAuMDc4Mjg3XSAtLS0tLS0tLS0tLS1bIGN1dCBoZXJl
IF0tLS0tLS0tLS0tLS0KW8KgwqDCoCAwLjA3ODQ4Ml0gS2VybmVsIEJVRyBhdCBjMDAwMDAwMDAw
MDJjYjNjIFt2ZXJib3NlIGRlYnVnIGluZm8gdW5hdmFpbGFibGVdClvCoMKgwqAgMC4wNzg2ODZd
IE9vcHM6IEV4Y2VwdGlvbiBpbiBrZXJuZWwgbW9kZSwgc2lnOiA1IFsjMV0KW8KgwqDCoCAwLjA3
ODg4M10gU01QIE5SX0NQVVM9MiBQUzMKW8KgwqDCoCAwLjA3OTA4NF0gTW9kdWxlcyBsaW5rZWQg
aW46ClvCoMKgwqAgMC4wNzkyODddIE5JUDogYzAwMDAwMDAwMDAyY2IzYyBMUjogYzAwMDAwMDAw
MDAyY2IzOCBDVFI6IDAwMDAwMDAwMDAyZmZjMzgKW8KgwqDCoCAwLjA3OTQ4OV0gUkVHUzogYzAw
MDAwMDAwZDA0ZjBlMCBUUkFQOiAwNzAwwqDCoCBOb3QgdGFpbnRlZMKgICgzLjcuNikKW8KgwqDC
oCAwLjA3OTY4N10gTVNSOiA4MDAwMDAwMDAwMDIwMDMyIDxTRixJUixEUixSST7CoCBDUjogMjIw
MDAwMjLCoCBYRVI6IDAwMDAwMDAwClvCoMKgwqAgMC4wNzk4ODhdIFNPRlRFOiAwClvCoMKgwqAg
MC4wODAwOTBdIFRBU0sgPSBjMDAwMDAwMDBkMDQ5MDYwWzFdICdzd2FwcGVyLzEnIFRIUkVBRDog
YzAwMDAwMDAwZDA0YzAwMCBDUFU6IDEKR1BSMDA6IGMwMDAwMDAwMDAwMmNiMzggYzAwMDAwMDAw
ZDA0ZjM2MCBjMDAwMDAwMDAxMmVjOGQwIDAwMDAwMDAwMDAwMDAwODEgCkdQUjA0OiAwMDAwMDAw
MDAwMDAwMDAwIDAwMDAwMDAwMDAwMDAwMDAgMDAwMDAwMDAwMDAwMDAwMCAwMDAwMDAwMDAwMDAw
MDAwIApHUFIwODogMDAwMDAwMDAwMDAwMDAwMCBjMDAwMDAwMDAxMjRjZTEwIDAwMDAwMDAwMDAw
MDAwMDAgYzAwMDAwMDAwMDAyYmNmMCAKR1BSMTI6IDAwMDAwMDAwMjIwMDAwMjIgYzAwMDAwMDAw
N2ZmZTI4MCBjMDAwMDAwMDAwMDA4Yzk0IGMwMDAwMDAwMDA1Y2JhMDAgCgoKCkFuZCBub3cgdGFr
ZSBhIGxvb2sgYXQgJ3YnIHZhbHVlcyBpbiBib3RoIGNhc2VzLgoKV2l0aG91dCA2NFRCIHN1cHBv
cnQ6ICAgdj1mMDliODlhZjUwMDEKV2l0aCA2NFRCIHN1cHBvcnQ6IHY9MjUwMDg2ODRkODAwMQoK
TnVtYmVyIG9mIGxlYWRpbmcgemVyb3MgaW4gZjA5Yjg5YWY1MDAxIGlzIDE2LgpOdW1iZXIgb2Yg
bGVhZGluZyB6ZXJvcyBpbiAyNTAwODY4NGQ4MDAxIGlzIDE0LgoKQW5kIHRoYXQncyB3aHkgbHYx
X2luc2VydF9odGFiX2VudHJ5IGZhaWxzIHdpdGggLTE3IHdoaWNoIG1lYW5zIExWMV9JTExFR0FM
X1BBUkFNRVRFUl9WQUxVRSBiZWNhdXNlCnRoZSBIeXBlcnZpc29yIG9mIFBTMyBjaGVja3MgJ0FW
UE4nIHZhbHVlcyBmb3IgbnVtYmVyIG9mIGxlYWRpbmcgemVyb3MgYW5kIGFsbG93cyBhdCBsZWFz
dCAxNSBiaXRzIHdoaWNoIGluIGNhc2UKb2YgJ3YnIHZhbHVlIDI1MDA4Njg0ZDgwMDEgaXMgdG9v
IHNtYWxsIG9mIGNvdXJzZS4KCk5vdCBzdXJlIGhvdyB0byBmaXggaXQgaW4gY3VycmVudCBMaW51
eCBrZXJuZWwuIFlvdSBndXlzIGtub3cgaXQgYmV0dGVyIHRoYW4gbWUuCgpSZWdhcmRzCgoKCgrQ
n9C+0L3QtdC00LXQu9GM0L3QuNC6LCAxNCDRj9C90LLQsNGA0Y8gMjAxMywgMTU6MzcgLTA4OjAw
INC+0YIgR2VvZmYgTGV2YW5kIDxnZW9mZkBpbmZyYWRlYWQub3JnPjoKPkhpLAo+Cj5PbiBGcmks
IDIwMTMtMDEtMTEgYXQgMTg6MTIgLTA4MDAsIEdlb2ZmIExldmFuZCB3cm90ZToKPj4gSSBjaGVj
a2VkIHRoZXNlLCBhbmQgTWljaGFlbCdzIDQwNzgyMWEzNGZjZTg5YjRmMGIwMzFkYmFiNWNlYzdk
MDU5ZjQ2YmMKPj4gZG9lcyBpbmRlZWQgY2F1c2UgdGhlIExWMSBoeXBlcnZpc29yIHRvIHBhbmlj
IGVhcmx5LCBhbmQgaWYgdGhhdCBpcwo+PiByZXZlcnRlZCwgQW5lZXNoJ3MgMDQ4ZWUwOTkzZWM4
MzYwYWJiMGI1MWJkZjhmODcyMWU5ZWQ2MmVjNCBoaXRzIGEgQlVHLgo+Cj5KdXN0IHRvIGdpdmUg
YW4gdXBkYXRlLCBJIGRpZCBhIGxpdHRsZSBtb3JlIHdvcmsgb24gaXQgYW5kIGZvdW5kIHRoYXQK
PnRoZSBjYWxsIHRvIGx2MV9pbnNlcnRfaHRhYl9lbnRyeSgpIGluc2lkZSBwczNfaHB0ZV9pbnNl
cnQoKSBpcwo+ZmFpbGluZy4KPgo+wqDCoMKgIGh0dHA6Ly9naXQua2VybmVsLm9yZy8/cD1saW51
eC9rZXJuZWwvZ2l0L2dlb2ZmL3BzMy1saW51eC5naXQ7YT1ibG9iO2Y9YXJjaC9wb3dlcnBjL3Bs
YXRmb3Jtcy9wczMvaHRhYi5jO2hiPUhFQUQjbDcwCj4KPlRoZSB2YWx1ZXMgb2YgdGhlIHZhcmlh
YmxlcyBwcmludGVkIGFsbCBsb29rIHN0cmFuZ2UgY29tcGFyZWQgd2l0aAo+Y29tbWl0IDA0OGVl
MDk5MyByZXZlcnRlZC4gIEknbGwgdHJ5IGRvIHNvbWUgbW9yZSB3b3JrIG9uIGl0IHRoaXMKPndl
ZWsuCj4KPi1HZW9mZgo+Cj4KPgo+Cj4KPl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f
X19fX19fX19fX19fX19fCj5MaW51eHBwYy1kZXYgbWFpbGluZyBsaXN0Cj5MaW51eHBwYy1kZXZA
bGlzdHMub3psYWJzLm9yZwo+aHR0cHM6Ly9saXN0cy5vemxhYnMub3JnL2xpc3RpbmZvL2xpbnV4
cHBjLWRldgoK

^ permalink raw reply

* [PATCH RESEND] net: fec_mpc52xx: Read MAC address from device-tree
From: Stefan Roese @ 2013-02-09  9:49 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, Anatolij Gustschin

Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
(U-Boot) to write the MAC address into the ethernet controller
registers. The Linux driver should not rely on such a thing. So
lets read the MAC address from the DT as it should be done here.

This fixes a problem with a MPC5200 board that uses the SPL U-Boot
version without FEC initialization before Linux booting for
boot speedup.

Additionally a status line will now be printed upon successful
driver probing, also displaying this MAC address.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Anatolij Gustschin <agust@denx.de>
---
 drivers/net/ethernet/freescale/fec_mpc52xx.c | 30 +++++++++++++++++-----------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c
index 2933d08..f4c3897 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
@@ -110,15 +110,6 @@ static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac)
 	out_be32(&fec->paddr2, (*(u16 *)(&mac[4]) << 16) | FEC_PADDR2_TYPE);
 }
 
-static void mpc52xx_fec_get_paddr(struct net_device *dev, u8 *mac)
-{
-	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-	struct mpc52xx_fec __iomem *fec = priv->fec;
-
-	*(u32 *)(&mac[0]) = in_be32(&fec->paddr1);
-	*(u16 *)(&mac[4]) = in_be32(&fec->paddr2) >> 16;
-}
-
 static int mpc52xx_fec_set_mac_address(struct net_device *dev, void *addr)
 {
 	struct sockaddr *sock = addr;
@@ -928,10 +919,22 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 	priv->t_irq = bcom_get_task_irq(priv->tx_dmatsk);
 
 	/* MAC address init */
-	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr))
+	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr)) {
 		memcpy(ndev->dev_addr, mpc52xx_fec_mac_addr, 6);
-	else
-		mpc52xx_fec_get_paddr(ndev, ndev->dev_addr);
+	} else {
+		struct device_node *np = op->dev.of_node;
+		const void *p;
+
+		/* Read MAC-address */
+		p = of_get_property(np, "local-mac-address", NULL);
+		if (p == NULL) {
+			dev_err(&ndev->dev, "%s: Can't find local-mac-address property\n",
+				np->full_name);
+			rv = -ENXIO;
+			goto err_irq_dispose;
+		}
+		memcpy(ndev->dev_addr, p, 6);
+	}
 
 	priv->msg_enable = netif_msg_init(debug, MPC52xx_MESSAGES_DEFAULT);
 
@@ -970,11 +973,14 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 
 	/* We're done ! */
 	dev_set_drvdata(&op->dev, ndev);
+	printk(KERN_INFO "%s: %s MAC %pM\n",
+	       ndev->name, op->dev.of_node->full_name, ndev->dev_addr);
 
 	return 0;
 
 err_node:
 	of_node_put(priv->phy_node);
+err_irq_dispose:
 	irq_dispose_mapping(ndev->irq);
 err_rx_tx_dmatsk:
 	if (priv->rx_dmatsk)
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH RESEND] net: fec_mpc52xx: Read MAC address from device-tree
From: Stefan Roese @ 2013-02-09  9:48 UTC (permalink / raw)
  To: net; +Cc: linuxppc-dev, Anatolij Gustschin

Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
(U-Boot) to write the MAC address into the ethernet controller
registers. The Linux driver should not rely on such a thing. So
lets read the MAC address from the DT as it should be done here.

This fixes a problem with a MPC5200 board that uses the SPL U-Boot
version without FEC initialization before Linux booting for
boot speedup.

Additionally a status line will now be printed upon successful
driver probing, also displaying this MAC address.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Anatolij Gustschin <agust@denx.de>
---
 drivers/net/ethernet/freescale/fec_mpc52xx.c | 30 +++++++++++++++++-----------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c
index 2933d08..f4c3897 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
@@ -110,15 +110,6 @@ static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac)
 	out_be32(&fec->paddr2, (*(u16 *)(&mac[4]) << 16) | FEC_PADDR2_TYPE);
 }
 
-static void mpc52xx_fec_get_paddr(struct net_device *dev, u8 *mac)
-{
-	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-	struct mpc52xx_fec __iomem *fec = priv->fec;
-
-	*(u32 *)(&mac[0]) = in_be32(&fec->paddr1);
-	*(u16 *)(&mac[4]) = in_be32(&fec->paddr2) >> 16;
-}
-
 static int mpc52xx_fec_set_mac_address(struct net_device *dev, void *addr)
 {
 	struct sockaddr *sock = addr;
@@ -928,10 +919,22 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 	priv->t_irq = bcom_get_task_irq(priv->tx_dmatsk);
 
 	/* MAC address init */
-	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr))
+	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr)) {
 		memcpy(ndev->dev_addr, mpc52xx_fec_mac_addr, 6);
-	else
-		mpc52xx_fec_get_paddr(ndev, ndev->dev_addr);
+	} else {
+		struct device_node *np = op->dev.of_node;
+		const void *p;
+
+		/* Read MAC-address */
+		p = of_get_property(np, "local-mac-address", NULL);
+		if (p == NULL) {
+			dev_err(&ndev->dev, "%s: Can't find local-mac-address property\n",
+				np->full_name);
+			rv = -ENXIO;
+			goto err_irq_dispose;
+		}
+		memcpy(ndev->dev_addr, p, 6);
+	}
 
 	priv->msg_enable = netif_msg_init(debug, MPC52xx_MESSAGES_DEFAULT);
 
@@ -970,11 +973,14 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 
 	/* We're done ! */
 	dev_set_drvdata(&op->dev, ndev);
+	printk(KERN_INFO "%s: %s MAC %pM\n",
+	       ndev->name, op->dev.of_node->full_name, ndev->dev_addr);
 
 	return 0;
 
 err_node:
 	of_node_put(priv->phy_node);
+err_irq_dispose:
 	irq_dispose_mapping(ndev->irq);
 err_rx_tx_dmatsk:
 	if (priv->rx_dmatsk)
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH v5 14/45] rcu, CPU hotplug: Fix comment referring to stop_machine()
From: Paul E. McKenney @ 2013-02-09  0:14 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073630.13822.99359.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:06:34PM +0530, Srivatsa S. Bhat wrote:
> Don't refer to stop_machine() in the CPU hotplug path, since we are going
> to get rid of it. Also, move the comment referring to callback adoption
> to the CPU_DEAD case, because that's where it happens now.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Ouch!  That comment is indeed obsolete and must die.

I queued this to -rcu with your Signed-off-by.  However, I omitted
the added comment, as it is imcomplete -- it is easy to look at
rcu_cleanup_dead_cpu() to see what it does.

							Thanx, Paul

> ---
> 
>  kernel/rcutree.c |    9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index e441b77..ac94474 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2827,11 +2827,6 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
>  		break;
>  	case CPU_DYING:
>  	case CPU_DYING_FROZEN:
> -		/*
> -		 * The whole machine is "stopped" except this CPU, so we can
> -		 * touch any data without introducing corruption. We send the
> -		 * dying CPU's callbacks to an arbitrarily chosen online CPU.
> -		 */
>  		for_each_rcu_flavor(rsp)
>  			rcu_cleanup_dying_cpu(rsp);
>  		rcu_cleanup_after_idle(cpu);
> @@ -2840,6 +2835,10 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
>  	case CPU_DEAD_FROZEN:
>  	case CPU_UP_CANCELED:
>  	case CPU_UP_CANCELED_FROZEN:
> +		/*
> +		 * We send the dead CPU's callbacks to an arbitrarily chosen
> +		 * online CPU.
> +		 */
>  		for_each_rcu_flavor(rsp)
>  			rcu_cleanup_dead_cpu(cpu, rsp);
>  		break;
> 

^ permalink raw reply

* Re: [PATCH v5 45/45] Documentation/cpu-hotplug: Remove references to stop_machine()
From: Paul E. McKenney @ 2013-02-09  0:16 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122074543.13822.44724.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:15:48PM +0530, Srivatsa S. Bhat wrote:
> Since stop_machine() is no longer used in the CPU offline path, we cannot
> disable CPU hotplug using preempt_disable()/local_irq_disable() etc. We
> need to use the newly introduced get/put_online_cpus_atomic() APIs.
> Reflect this in the documentation.
> 
> Cc: Rob Landley <rob@landley.net>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
> 
>  Documentation/cpu-hotplug.txt |   17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
> index 9f40135..7f907ec 100644
> --- a/Documentation/cpu-hotplug.txt
> +++ b/Documentation/cpu-hotplug.txt
> @@ -113,13 +113,15 @@ Never use anything other than cpumask_t to represent bitmap of CPUs.
>  	#include <linux/cpu.h>
>  	get_online_cpus() and put_online_cpus():
> 
> -The above calls are used to inhibit cpu hotplug operations. While the
> +The above calls are used to inhibit cpu hotplug operations, when invoked from
> +non-atomic context (because the above functions can sleep). While the
>  cpu_hotplug.refcount is non zero, the cpu_online_mask will not change.
> -If you merely need to avoid cpus going away, you could also use
> -preempt_disable() and preempt_enable() for those sections.
> -Just remember the critical section cannot call any
> -function that can sleep or schedule this process away. The preempt_disable()
> -will work as long as stop_machine_run() is used to take a cpu down.
> +
> +However, if you are executing in atomic context (ie., you can't afford to
> +sleep), and you merely need to avoid cpus going offline, you can use
> +get_online_cpus_atomic() and put_online_cpus_atomic() for those sections.
> +Just remember the critical section cannot call any function that can sleep or
> +schedule this process away.
> 
>  CPU Hotplug - Frequently Asked Questions.
> 
> @@ -360,6 +362,9 @@ A: There are two ways.  If your code can be run in interrupt context, use
>  		return err;
>  	}
> 
> +   If my_func_on_cpu() itself cannot block, use get/put_online_cpus_atomic()
> +   instead of get/put_online_cpus() to prevent CPUs from going offline.
> +
>  Q: How do we determine how many CPUs are available for hotplug.
>  A: There is no clear spec defined way from ACPI that can give us that
>     information today. Based on some input from Natalie of Unisys,
> 

^ permalink raw reply

* Re: [PATCH v5 44/45] CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
From: Paul E. McKenney @ 2013-02-09  0:15 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122074518.13822.86992.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:15:22PM +0530, Srivatsa S. Bhat wrote:
> ... and also cleanup a comment that refers to CPU hotplug being dependent on
> stop_machine().
> 
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

(Hey, I thought I owed myself an easy one!)

> ---
> 
>  include/linux/stop_machine.h |    2 +-
>  init/Kconfig                 |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index 3b5e910..ce2d3c4 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -120,7 +120,7 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
>   * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
>   *
>   * Description: This is a special version of the above, which assumes cpus
> - * won't come or go while it's being called.  Used by hotplug cpu.
> + * won't come or go while it's being called.
>   */
>  int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index be8b7f5..048a0c5 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1711,7 +1711,7 @@ config INIT_ALL_POSSIBLE
>  config STOP_MACHINE
>  	bool
>  	default y
> -	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
> +	depends on (SMP && MODULE_UNLOAD)
>  	help
>  	  Need stop_machine() primitive.
> 
> 

^ permalink raw reply

* Re: [PATCH v5 09/45] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
From: Paul E. McKenney @ 2013-02-09  0:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073508.13822.12784.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:05:10PM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able to
> depend on preempt_disable() to prevent CPUs from going offline from under us.
> 
> Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
> while invoking from atomic context.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Would it make sense for get_online_cpus_atomic() to return the current
CPU number?  Looks good otherwise.

							Thanx, Paul

> ---
> 
>  kernel/smp.c |   40 ++++++++++++++++++++++++++--------------
>  1 file changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 29dd40a..f421bcc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -310,7 +310,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>  	 * prevent preemption and reschedule on another processor,
>  	 * as well as CPU removal
>  	 */
> -	this_cpu = get_cpu();
> +	get_online_cpus_atomic();
> +	this_cpu = smp_processor_id();
> 
>  	/*
>  	 * Can deadlock when called with interrupts disabled.
> @@ -342,7 +343,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>  		}
>  	}
> 
> -	put_cpu();
> +	put_online_cpus_atomic();
> 
>  	return err;
>  }
> @@ -371,8 +372,10 @@ int smp_call_function_any(const struct cpumask *mask,
>  	const struct cpumask *nodemask;
>  	int ret;
> 
> +	get_online_cpus_atomic();
>  	/* Try for same CPU (cheapest) */
> -	cpu = get_cpu();
> +	cpu = smp_processor_id();
> +
>  	if (cpumask_test_cpu(cpu, mask))
>  		goto call;
> 
> @@ -388,7 +391,7 @@ int smp_call_function_any(const struct cpumask *mask,
>  	cpu = cpumask_any_and(mask, cpu_online_mask);
>  call:
>  	ret = smp_call_function_single(cpu, func, info, wait);
> -	put_cpu();
> +	put_online_cpus_atomic();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(smp_call_function_any);
> @@ -409,25 +412,28 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
>  	unsigned int this_cpu;
>  	unsigned long flags;
> 
> -	this_cpu = get_cpu();
> +	get_online_cpus_atomic();
> +
> +	this_cpu = smp_processor_id();
> +
>  	/*
>  	 * Can deadlock when called with interrupts disabled.
>  	 * We allow cpu's that are not yet online though, as no one else can
>  	 * send smp call function interrupt to this cpu and as such deadlocks
>  	 * can't happen.
>  	 */
> -	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
> +	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
>  		     && !oops_in_progress);
> 
>  	if (cpu == this_cpu) {
>  		local_irq_save(flags);
>  		data->func(data->info);
>  		local_irq_restore(flags);
> -	} else {
> +	} else if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
>  		csd_lock(data);
>  		generic_exec_single(cpu, data, wait);
>  	}
> -	put_cpu();
> +	put_online_cpus_atomic();
>  }
> 
>  /**
> @@ -451,6 +457,8 @@ void smp_call_function_many(const struct cpumask *mask,
>  	unsigned long flags;
>  	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
> 
> +	get_online_cpus_atomic();
> +
>  	/*
>  	 * Can deadlock when called with interrupts disabled.
>  	 * We allow cpu's that are not yet online though, as no one else can
> @@ -467,17 +475,18 @@ void smp_call_function_many(const struct cpumask *mask,
> 
>  	/* No online cpus?  We're done. */
>  	if (cpu >= nr_cpu_ids)
> -		return;
> +		goto out_unlock;
> 
>  	/* Do we have another CPU which isn't us? */
>  	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
>  	if (next_cpu == this_cpu)
> -		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
> +		next_cpu = cpumask_next_and(next_cpu, mask,
> +						cpu_online_mask);
> 
>  	/* Fastpath: do that cpu by itself. */
>  	if (next_cpu >= nr_cpu_ids) {
>  		smp_call_function_single(cpu, func, info, wait);
> -		return;
> +		goto out_unlock;
>  	}
> 
>  	data = &__get_cpu_var(cfd_data);
> @@ -523,7 +532,7 @@ void smp_call_function_many(const struct cpumask *mask,
>  	/* Some callers race with other cpus changing the passed mask */
>  	if (unlikely(!refs)) {
>  		csd_unlock(&data->csd);
> -		return;
> +		goto out_unlock;
>  	}
> 
>  	raw_spin_lock_irqsave(&call_function.lock, flags);
> @@ -554,6 +563,9 @@ void smp_call_function_many(const struct cpumask *mask,
>  	/* Optionally wait for the CPUs to complete */
>  	if (wait)
>  		csd_lock_wait(&data->csd);
> +
> +out_unlock:
> +	put_online_cpus_atomic();
>  }
>  EXPORT_SYMBOL(smp_call_function_many);
> 
> @@ -574,9 +586,9 @@ EXPORT_SYMBOL(smp_call_function_many);
>   */
>  int smp_call_function(smp_call_func_t func, void *info, int wait)
>  {
> -	preempt_disable();
> +	get_online_cpus_atomic();
>  	smp_call_function_many(cpu_online_mask, func, info, wait);
> -	preempt_enable();
> +	put_online_cpus_atomic();
> 
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [PATCH v5 08/45] CPU hotplug: Convert preprocessor macros to static inline functions
From: Paul E. McKenney @ 2013-02-08 23:51 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073500.13822.11469.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:05:02PM +0530, Srivatsa S. Bhat wrote:
> On 12/05/2012 06:10 AM, Andrew Morton wrote:
> "static inline C functions would be preferred if possible.  Feel free to
> fix up the wrong crufty surrounding code as well ;-)"
> 
> Convert the macros in the CPU hotplug code to static inline C functions.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
> 
>  include/linux/cpu.h |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index cf24da1..eb79f47 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -198,10 +198,10 @@ static inline void cpu_hotplug_driver_unlock(void)
> 
>  #else		/* CONFIG_HOTPLUG_CPU */
> 
> -#define get_online_cpus()	do { } while (0)
> -#define put_online_cpus()	do { } while (0)
> -#define get_online_cpus_atomic()	do { } while (0)
> -#define put_online_cpus_atomic()	do { } while (0)
> +static inline void get_online_cpus(void) {}
> +static inline void put_online_cpus(void) {}
> +static inline void get_online_cpus_atomic(void) {}
> +static inline void put_online_cpus_atomic(void) {}
>  #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
>  /* These aren't inline functions due to a GCC bug. */
>  #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
> 

^ permalink raw reply

* Re: [PATCH v5 07/45] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Paul E. McKenney @ 2013-02-08 23:50 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073446.13822.39253.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:04:54PM +0530, Srivatsa S. Bhat wrote:
> There are places where preempt_disable() or local_irq_disable() are used
> to prevent any CPU from going offline during the critical section. Let us
> call them as "atomic hotplug readers" ("atomic" because they run in atomic,
> non-preemptible contexts).
> 
> Today, preempt_disable() or its equivalent works because the hotplug writer
> uses stop_machine() to take CPUs offline. But once stop_machine() is gone
> from the CPU hotplug offline path, the readers won't be able to prevent
> CPUs from going offline using preempt_disable().
> 
> So the intent here is to provide synchronization APIs for such atomic hotplug
> readers, to prevent (any) CPUs from going offline, without depending on
> stop_machine() at the writer-side. The new APIs will look something like
> this:  get_online_cpus_atomic() and put_online_cpus_atomic()
> 
> Some important design requirements and considerations:
> -----------------------------------------------------
> 
> 1. Scalable synchronization at the reader-side, especially in the fast-path
> 
>    Any synchronization at the atomic hotplug readers side must be highly
>    scalable - avoid global single-holder locks/counters etc. Because, these
>    paths currently use the extremely fast preempt_disable(); our replacement
>    to preempt_disable() should not become ridiculously costly and also should
>    not serialize the readers among themselves needlessly.
> 
>    At a minimum, the new APIs must be extremely fast at the reader side
>    atleast in the fast-path, when no CPU offline writers are active.
> 
> 2. preempt_disable() was recursive. The replacement should also be recursive.
> 
> 3. No (new) lock-ordering restrictions
> 
>    preempt_disable() was super-flexible. It didn't impose any ordering
>    restrictions or rules for nesting. Our replacement should also be equally
>    flexible and usable.
> 
> 4. No deadlock possibilities
> 
>    Regular per-cpu locking is not the way to go if we want to have relaxed
>    rules for lock-ordering. Because, we can end up in circular-locking
>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
> 
>    So, avoid the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic
>    counters with spin-on-contention etc) as much as possible, to avoid
>    numerous deadlock possibilities from creeping in.
> 
> 
> Implementation of the design:
> ----------------------------
> 
> We use per-CPU reader-writer locks for synchronization because:
> 
>   a. They are quite fast and scalable in the fast-path (when no writers are
>      active), since they use fast per-cpu counters in those paths.
> 
>   b. They are recursive at the reader side.
> 
>   c. They provide a good amount of safety against deadlocks; they don't
>      spring new deadlock possibilities on us from out of nowhere. As a
>      result, they have relaxed locking rules and are quite flexible, and
>      thus are best suited for replacing usages of preempt_disable() or
>      local_irq_disable() at the reader side.
> 
> Together, these satisfy all the requirements mentioned above.
> 
> I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
> suggestions and ideas, which inspired and influenced many of the decisions in
> this as well as previous designs. Thanks a lot Michael and Xiao!
> 
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Paul Mundt <lethal@linux-sh.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: uclinux-dist-devel@blackfin.uclinux.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-mips@linux-mips.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

With the change suggested by Namhyung:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
> 
>  arch/arm/Kconfig      |    1 +
>  arch/blackfin/Kconfig |    1 +
>  arch/ia64/Kconfig     |    1 +
>  arch/mips/Kconfig     |    1 +
>  arch/mn10300/Kconfig  |    1 +
>  arch/parisc/Kconfig   |    1 +
>  arch/powerpc/Kconfig  |    1 +
>  arch/s390/Kconfig     |    1 +
>  arch/sh/Kconfig       |    1 +
>  arch/sparc/Kconfig    |    1 +
>  arch/x86/Kconfig      |    1 +
>  include/linux/cpu.h   |    4 +++
>  kernel/cpu.c          |   57 ++++++++++++++++++++++++++++++++++++++++++++++---
>  13 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 67874b8..cb6b94b 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1616,6 +1616,7 @@ config NR_CPUS
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
>  	depends on SMP && HOTPLUG
> +	select PERCPU_RWLOCK
>  	help
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
> diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
> index b6f3ad5..83d9882 100644
> --- a/arch/blackfin/Kconfig
> +++ b/arch/blackfin/Kconfig
> @@ -261,6 +261,7 @@ config NR_CPUS
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
>  	depends on SMP && HOTPLUG
> +	select PERCPU_RWLOCK
>  	default y
> 
>  config BF_REV_MIN
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 3279646..c246772 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -378,6 +378,7 @@ config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
>  	depends on SMP && EXPERIMENTAL
>  	select HOTPLUG
> +	select PERCPU_RWLOCK
>  	default n
>  	---help---
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 2ac626a..f97c479 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -956,6 +956,7 @@ config SYS_HAS_EARLY_PRINTK
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
>  	depends on SMP && HOTPLUG && SYS_SUPPORTS_HOTPLUG_CPU
> +	select PERCPU_RWLOCK
>  	help
>  	  Say Y here to allow turning CPUs off and on. CPUs can be
>  	  controlled through /sys/devices/system/cpu.
> diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig
> index e70001c..a64e488 100644
> --- a/arch/mn10300/Kconfig
> +++ b/arch/mn10300/Kconfig
> @@ -60,6 +60,7 @@ config ARCH_HAS_ILOG2_U32
> 
>  config HOTPLUG_CPU
>  	def_bool n
> +	select PERCPU_RWLOCK
> 
>  source "init/Kconfig"
> 
> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index b77feff..6f55cd4 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -226,6 +226,7 @@ config HOTPLUG_CPU
>  	bool
>  	default y if SMP
>  	select HOTPLUG
> +	select PERCPU_RWLOCK
> 
>  config ARCH_SELECT_MEMORY_MODEL
>  	def_bool y
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 17903f1..56b1f15 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -336,6 +336,7 @@ config HOTPLUG_CPU
>  	bool "Support for enabling/disabling CPUs"
>  	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || \
>  	PPC_PMAC || PPC_POWERNV || (PPC_85xx && !PPC_E500MC))
> +	select PERCPU_RWLOCK
>  	---help---
>  	  Say Y here to be able to disable and re-enable individual
>  	  CPUs at runtime on SMP machines.
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index b5ea38c..a9aafb4 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -299,6 +299,7 @@ config HOTPLUG_CPU
>  	prompt "Support for hot-pluggable CPUs"
>  	depends on SMP
>  	select HOTPLUG
> +	select PERCPU_RWLOCK
>  	help
>  	  Say Y here to be able to turn CPUs off and on. CPUs
>  	  can be controlled through /sys/devices/system/cpu/cpu#.
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index babc2b8..8c92eef 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -765,6 +765,7 @@ config NR_CPUS
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
>  	depends on SMP && HOTPLUG && EXPERIMENTAL
> +	select PERCPU_RWLOCK
>  	help
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index 9f2edb5..e2bd573 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -253,6 +253,7 @@ config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
>  	depends on SPARC64 && SMP
>  	select HOTPLUG
> +	select PERCPU_RWLOCK
>  	help
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu/cpu#.
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 79795af..a225d12 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1689,6 +1689,7 @@ config PHYSICAL_ALIGN
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
>  	depends on SMP && HOTPLUG
> +	select PERCPU_RWLOCK
>  	---help---
>  	  Say Y here to allow turning CPUs off and on. CPUs can be
>  	  controlled through /sys/devices/system/cpu.
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index ce7a074..cf24da1 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
> 
>  extern void get_online_cpus(void);
>  extern void put_online_cpus(void);
> +extern void get_online_cpus_atomic(void);
> +extern void put_online_cpus_atomic(void);
>  #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
> @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
> 
>  #define get_online_cpus()	do { } while (0)
>  #define put_online_cpus()	do { } while (0)
> +#define get_online_cpus_atomic()	do { } while (0)
> +#define put_online_cpus_atomic()	do { } while (0)
>  #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
>  /* These aren't inline functions due to a GCC bug. */
>  #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 3046a50..1c84138 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1,6 +1,18 @@
>  /* CPU control.
>   * (C) 2001, 2002, 2003, 2004 Rusty Russell
>   *
> + * Rework of the CPU hotplug offline mechanism to remove its dependence on
> + * the heavy-weight stop_machine() primitive, by Srivatsa S. Bhat and
> + * Paul E. McKenney.
> + *
> + * Copyright (C) IBM Corporation, 2012-2013
> + * Authors: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> + *          Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> + *
> + * With lots of invaluable suggestions from:
> + *	    Oleg Nesterov <oleg@redhat.com>
> + *	    Tejun Heo <tj@kernel.org>
> + *
>   * This code is licenced under the GPL.
>   */
>  #include <linux/proc_fs.h>
> @@ -19,6 +31,7 @@
>  #include <linux/mutex.h>
>  #include <linux/gfp.h>
>  #include <linux/suspend.h>
> +#include <linux/percpu-rwlock.h>
> 
>  #include "smpboot.h"
> 
> @@ -133,6 +146,38 @@ static void cpu_hotplug_done(void)
>  	mutex_unlock(&cpu_hotplug.lock);
>  }
> 
> +/*
> + * Per-CPU Reader-Writer lock to synchronize between atomic hotplug
> + * readers and the CPU offline hotplug writer.
> + */
> +DEFINE_STATIC_PERCPU_RWLOCK(hotplug_pcpu_rwlock);
> +
> +/*
> + * Invoked by atomic hotplug reader (a task which wants to prevent
> + * CPU offline, but which can't afford to sleep), to prevent CPUs from
> + * going offline. So, you can call this function from atomic contexts
> + * (including interrupt handlers).
> + *
> + * Note: This does NOT prevent CPUs from coming online! It only prevents
> + * CPUs from going offline.
> + *
> + * You can call this function recursively.
> + *
> + * Returns with preemption disabled (but interrupts remain as they are;
> + * they are not disabled).
> + */
> +void get_online_cpus_atomic(void)
> +{
> +	percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
> +}
> +EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
> +
> +void put_online_cpus_atomic(void)
> +{
> +	percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
> +}
> +EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
> +
>  #else /* #if CONFIG_HOTPLUG_CPU */
>  static void cpu_hotplug_begin(void) {}
>  static void cpu_hotplug_done(void) {}
> @@ -246,15 +291,21 @@ struct take_cpu_down_param {
>  static int __ref take_cpu_down(void *_param)
>  {
>  	struct take_cpu_down_param *param = _param;
> -	int err;
> +	unsigned long flags;
> +	int err = 0;
> +
> +	percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags);
> 
>  	/* Ensure this CPU doesn't handle any more interrupts. */
>  	err = __cpu_disable();
>  	if (err < 0)
> -		return err;
> +		goto out;
> 
>  	cpu_notify(CPU_DYING | param->mod, param->hcpu);
> -	return 0;
> +
> +out:
> +	percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags);
> +	return err;
>  }
> 
>  /* Requires cpu_add_remove_lock to be held */
> 

^ permalink raw reply

* Re: [PATCH v5 06/45] percpu_rwlock: Allow writers to be readers, and add lockdep annotations
From: Paul E. McKenney @ 2013-02-08 23:47 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073416.13822.96504.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:04:23PM +0530, Srivatsa S. Bhat wrote:
> CPU hotplug (which will be the first user of per-CPU rwlocks) has a special
> requirement with respect to locking: the writer, after acquiring the per-CPU
> rwlock for write, must be allowed to take the same lock for read, without
> deadlocking and without getting complaints from lockdep. In comparison, this
> is similar to what get_online_cpus()/put_online_cpus() does today: it allows
> a hotplug writer (who holds the cpu_hotplug.lock mutex) to invoke it without
> locking issues, because it silently returns if the caller is the hotplug
> writer itself.
> 
> This can be easily achieved with per-CPU rwlocks as well (even without a
> "is this a writer?" check) by incrementing the per-CPU refcount of the writer
> immediately after taking the global rwlock for write, and then decrementing
> the per-CPU refcount before releasing the global rwlock.
> This ensures that any reader that comes along on that CPU while the writer is
> active (on that same CPU), notices the non-zero value of the nested counter
> and assumes that it is a nested read-side critical section and proceeds by
> just incrementing the refcount. Thus we prevent the reader from taking the
> global rwlock for read, which prevents the writer from deadlocking itself.
> 
> Add that support and teach lockdep about this special locking scheme so
> that it knows that this sort of usage is valid. Also add the required lockdep
> annotations to enable it to detect common locking problems with per-CPU
> rwlocks.

Very nice!  The write-side interrupt disabling ensures that the task
stays on CPU, as required.

One request: Could we please have a comment explaining the reasons for
the writer incrementing and decrementing the reader reference count?

It looked really really strange to me until I came back and read the
commit log.  ;-)

							Thanx, Paul

> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  lib/percpu-rwlock.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index a8d177a..054a50a 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -84,6 +84,10 @@ void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
> 
>  		if (likely(!writer_active(pcpu_rwlock))) {
>  			this_cpu_inc(*pcpu_rwlock->reader_refcnt);
> +
> +			/* Pretend that we take global_rwlock for lockdep */
> +			rwlock_acquire_read(&pcpu_rwlock->global_rwlock.dep_map,
> +					    0, 0, _RET_IP_);
>  		} else {
>  			/* Writer is active, so switch to global rwlock. */
> 
> @@ -108,6 +112,12 @@ void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
>  			if (!writer_active(pcpu_rwlock)) {
>  				this_cpu_inc(*pcpu_rwlock->reader_refcnt);
>  				read_unlock(&pcpu_rwlock->global_rwlock);
> +
> +				/*
> +				 * Pretend that we take global_rwlock for lockdep
> +				 */
> +				rwlock_acquire_read(&pcpu_rwlock->global_rwlock.dep_map,
> +						    0, 0, _RET_IP_);
>  			}
>  		}
>  	}
> @@ -128,6 +138,14 @@ void percpu_read_unlock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
>  	if (reader_nested_percpu(pcpu_rwlock)) {
>  		this_cpu_dec(*pcpu_rwlock->reader_refcnt);
>  		smp_wmb(); /* Paired with smp_rmb() in sync_reader() */
> +
> +		/*
> +		 * If this is the last decrement, then it is time to pretend
> +		 * to lockdep that we are releasing the read lock.
> +		 */
> +		if (!reader_nested_percpu(pcpu_rwlock))
> +			rwlock_release(&pcpu_rwlock->global_rwlock.dep_map,
> +				       1, _RET_IP_);
>  	} else {
>  		read_unlock(&pcpu_rwlock->global_rwlock);
>  	}
> @@ -205,11 +223,14 @@ void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
>  	announce_writer_active(pcpu_rwlock);
>  	sync_all_readers(pcpu_rwlock);
>  	write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
> +	this_cpu_inc(*pcpu_rwlock->reader_refcnt);
>  }
> 
>  void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
>  			 unsigned long *flags)
>  {
> +	this_cpu_dec(*pcpu_rwlock->reader_refcnt);
> +
>  	/*
>  	 * Inform all readers that we are done, so that they can switch back
>  	 * to their per-cpu refcounts. (We don't need to wait for them to
> 

^ permalink raw reply

* Re: [PATCH v5 05/45] percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally
From: Paul E. McKenney @ 2013-02-08 23:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073400.13822.52336.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:04:11PM +0530, Srivatsa S. Bhat wrote:
> If interrupt handlers can also be readers, then one of the ways to make
> per-CPU rwlocks safe, is to disable interrupts at the reader side before
> trying to acquire the per-CPU rwlock and keep it disabled throughout the
> duration of the read-side critical section.
> 
> The goal is to avoid cases such as:
> 
>   1. writer is active and it holds the global rwlock for write
> 
>   2. a regular reader comes in and marks itself as present (by incrementing
>      its per-CPU refcount) before checking whether writer is active.
> 
>   3. an interrupt hits the reader;
>      [If it had not hit, the reader would have noticed that the writer is
>       active and would have decremented its refcount and would have tried
>       to acquire the global rwlock for read].
>      Since the interrupt handler also happens to be a reader, it notices
>      the non-zero refcount (which was due to the reader who got interrupted)
>      and thinks that this is a nested read-side critical section and
>      proceeds to take the fastpath, which is wrong. The interrupt handler
>      should have noticed that the writer is active and taken the rwlock
>      for read.
> 
> So, disabling interrupts can help avoid this problem (at the cost of keeping
> the interrupts disabled for quite long).
> 
> But Oleg had a brilliant idea by which we can do much better than that:
> we can manage with disabling interrupts _just_ during the updates (writes to
> per-CPU refcounts) to safe-guard against races with interrupt handlers.
> Beyond that, we can keep the interrupts enabled and still be safe w.r.t
> interrupt handlers that can act as readers.
> 
> Basically the idea is that we differentiate between the *part* of the
> per-CPU refcount that we use for reference counting vs the part that we use
> merely to make the writer wait for us to switch over to the right
> synchronization scheme.
> 
> The scheme involves splitting the per-CPU refcounts into 2 parts:
> eg: the lower 16 bits are used to track the nesting depth of the reader
> (a "nested-counter"), and the remaining (upper) bits are used to merely mark
> the presence of the reader.
> 
> As long as the overall reader_refcnt is non-zero, the writer waits for the
> reader (assuming that the reader is still actively using per-CPU refcounts for
> synchronization).
> 
> The reader first sets one of the higher bits to mark its presence, and then
> uses the lower 16 bits to manage the nesting depth. So, an interrupt handler
> coming in as illustrated above will be able to distinguish between "this is
> a nested read-side critical section" vs "we have merely marked our presence
> to make the writer wait for us to switch" by looking at the same refcount.
> Thus, it makes it unnecessary to keep interrupts disabled throughout the
> read-side critical section, despite having the possibility of interrupt
> handlers being readers themselves.
> 
> 
> Implement this logic and rename the locking functions appropriately, to
> reflect what they do.

One nit below.  The issues called out in the previous patch still seem
to me to apply.

							Thanx, Paul

> Based-on-idea-by: Oleg Nesterov <oleg@redhat.com>
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  include/linux/percpu-rwlock.h |   15 ++++++++++-----
>  lib/percpu-rwlock.c           |   41 +++++++++++++++++++++++++++--------------
>  2 files changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
> index 6819bb8..856ba6b 100644
> --- a/include/linux/percpu-rwlock.h
> +++ b/include/linux/percpu-rwlock.h
> @@ -34,11 +34,13 @@ struct percpu_rwlock {
>  	rwlock_t		global_rwlock;
>  };
> 
> -extern void percpu_read_lock(struct percpu_rwlock *);
> -extern void percpu_read_unlock(struct percpu_rwlock *);
> +extern void percpu_read_lock_irqsafe(struct percpu_rwlock *);
> +extern void percpu_read_unlock_irqsafe(struct percpu_rwlock *);
> 
> -extern void percpu_write_lock(struct percpu_rwlock *);
> -extern void percpu_write_unlock(struct percpu_rwlock *);
> +extern void percpu_write_lock_irqsave(struct percpu_rwlock *,
> +				      unsigned long *flags);
> +extern void percpu_write_unlock_irqrestore(struct percpu_rwlock *,
> +					   unsigned long *flags);
> 
>  extern int __percpu_init_rwlock(struct percpu_rwlock *,
>  				const char *, struct lock_class_key *);
> @@ -68,11 +70,14 @@ extern void percpu_free_rwlock(struct percpu_rwlock *);
>  	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\
>  })
> 
> +#define READER_PRESENT		(1UL << 16)
> +#define READER_REFCNT_MASK	(READER_PRESENT - 1)
> +
>  #define reader_uses_percpu_refcnt(pcpu_rwlock, cpu)			\
>  		(ACCESS_ONCE(per_cpu(*((pcpu_rwlock)->reader_refcnt), cpu)))
> 
>  #define reader_nested_percpu(pcpu_rwlock)				\
> -			(__this_cpu_read(*((pcpu_rwlock)->reader_refcnt)) > 1)
> +	(__this_cpu_read(*((pcpu_rwlock)->reader_refcnt)) & READER_REFCNT_MASK)
> 
>  #define writer_active(pcpu_rwlock)					\
>  			(__this_cpu_read(*((pcpu_rwlock)->writer_signal)))
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index 992da5c..a8d177a 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -62,19 +62,19 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
>  	pcpu_rwlock->writer_signal = NULL;
>  }
> 
> -void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
> +void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
>  {
>  	preempt_disable();
> 
>  	/* First and foremost, let the writer know that a reader is active */
> -	this_cpu_inc(*pcpu_rwlock->reader_refcnt);
> +	this_cpu_add(*pcpu_rwlock->reader_refcnt, READER_PRESENT);
> 
>  	/*
>  	 * If we are already using per-cpu refcounts, it is not safe to switch
>  	 * the synchronization scheme. So continue using the refcounts.
>  	 */
>  	if (reader_nested_percpu(pcpu_rwlock)) {
> -		goto out;
> +		this_cpu_inc(*pcpu_rwlock->reader_refcnt);

Hmmm...  If the reader is nested, it -doesn't- need the memory barrier at
the end of this function.  If there is lots of nesting, it might be
worth getting rid of it.

>  	} else {
>  		/*
>  		 * The write to 'reader_refcnt' must be visible before we
> @@ -83,9 +83,19 @@ void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  		smp_mb(); /* Paired with smp_rmb() in sync_reader() */
> 
>  		if (likely(!writer_active(pcpu_rwlock))) {
> -			goto out;
> +			this_cpu_inc(*pcpu_rwlock->reader_refcnt);
>  		} else {
>  			/* Writer is active, so switch to global rwlock. */
> +
> +			/*
> +			 * While we are spinning on ->global_rwlock, an
> +			 * interrupt can hit us, and the interrupt handler
> +			 * might call this function. The distinction between
> +			 * READER_PRESENT and the refcnt helps ensure that the
> +			 * interrupt handler also takes this branch and spins
> +			 * on the ->global_rwlock, as long as the writer is
> +			 * active.
> +			 */
>  			read_lock(&pcpu_rwlock->global_rwlock);
> 
>  			/*
> @@ -95,26 +105,27 @@ void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  			 * back to per-cpu refcounts. (This also helps avoid
>  			 * heterogeneous nesting of readers).
>  			 */
> -			if (writer_active(pcpu_rwlock))
> -				this_cpu_dec(*pcpu_rwlock->reader_refcnt);
> -			else
> +			if (!writer_active(pcpu_rwlock)) {
> +				this_cpu_inc(*pcpu_rwlock->reader_refcnt);
>  				read_unlock(&pcpu_rwlock->global_rwlock);
> +			}
>  		}
>  	}
> 
> -out:
> +	this_cpu_sub(*pcpu_rwlock->reader_refcnt, READER_PRESENT);
> +
>  	/* Prevent reordering of any subsequent reads */
>  	smp_rmb();
>  }
> 
> -void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
> +void percpu_read_unlock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
>  {
>  	/*
>  	 * We never allow heterogeneous nesting of readers. So it is trivial
>  	 * to find out the kind of reader we are, and undo the operation
>  	 * done by our corresponding percpu_read_lock().
>  	 */
> -	if (__this_cpu_read(*pcpu_rwlock->reader_refcnt)) {
> +	if (reader_nested_percpu(pcpu_rwlock)) {
>  		this_cpu_dec(*pcpu_rwlock->reader_refcnt);
>  		smp_wmb(); /* Paired with smp_rmb() in sync_reader() */
>  	} else {
> @@ -184,7 +195,8 @@ static void sync_all_readers(struct percpu_rwlock *pcpu_rwlock)
>  		sync_reader(pcpu_rwlock, cpu);
>  }
> 
> -void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
> +void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
> +			       unsigned long *flags)
>  {
>  	/*
>  	 * Tell all readers that a writer is becoming active, so that they
> @@ -192,10 +204,11 @@ void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  	 */
>  	announce_writer_active(pcpu_rwlock);
>  	sync_all_readers(pcpu_rwlock);
> -	write_lock(&pcpu_rwlock->global_rwlock);
> +	write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
>  }
> 
> -void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
> +void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
> +			 unsigned long *flags)
>  {
>  	/*
>  	 * Inform all readers that we are done, so that they can switch back
> @@ -203,6 +216,6 @@ void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
>  	 * see it).
>  	 */
>  	announce_writer_inactive(pcpu_rwlock);
> -	write_unlock(&pcpu_rwlock->global_rwlock);
> +	write_unlock_irqrestore(&pcpu_rwlock->global_rwlock, *flags);
>  }
> 
> 

^ permalink raw reply

* PS3: Strange issue with kexec and FreeBSD loader
From: Phileas Fogg @ 2013-02-08 23:10 UTC (permalink / raw)
  To: linuxppc-dev

CkhpLAoKaSdtIHVzaW5nIE9wZW5XUlQgcGV0aXRib290IGJvb3Rsb2FkZXIgb24gbXkgUFMzIHRv
IGJvb3QgRnJlZUJTRCBsb2FkZXIgd2hpY2ggaXMgYSBzaW1wbGUgUFBDMzIgRUxGIGZpbGUuCkkg
aGF2ZW4ndCBoYWQgYW55IGlzc3VlcyB3aXRoIGl0IGFuZCBPcGVuV1JUIGJhc2VkIG9uIExpbnV4
IDMuMy44LgpSZWNlbnRseSBpIGJ1aWx0IGFuIE9wZW5XUlQgaW1hZ2Ugd2l0aCBMaW51eCAzLjcs
IGkgaGF2ZSBubyBpc3N1ZXMgYXQgYWxsIHdpdGgga2V4ZWMgYW5kIGFueSBMaW51eCBrZXJuZWxz
IHN0YXJ0aW5nIHdpdGggMi42IGJ1dApGcmVlQlNEIGxvYWRlciB3b24ndCBib290IGFuZCBqdXN0
IGhhbmdzLiBUaGUgc2FtZSBpc3N1ZSB3aXRoIE9wZW5XUlQgYmFzZWQgb24gTGludXggMy42IGtl
cm5lbC4KU28sIGkgc3RhcnRlZCB0byBhbmFseXplIHRoaXMgcHJvYmxlbSBhbmQgZm91bmQgb3V0
IHdoZXJlIGl0IGhhbmdzLgoKSXQgc2VlbXMgdGhhdCB0aGUgcHVyZ2F0b3J5IGNvZGUgZnJvbSBr
ZXhlYy10b29scyBsb29wcyBlbmRsZXNzbHkgaWYgU0hBMjU2IHZlcmlmaWNhdGlvbiBvZiB0aGUg
bG9hZGVkIHNlZ21lbnRzCmZhaWxzLgoKU2VlCsKgIGh0dHA6Ly9naXQua2VybmVsLm9yZy8/cD11
dGlscy9rZXJuZWwva2V4ZWMva2V4ZWMtdG9vbHMuZ2l0O2E9YmxvYl9wbGFpbjtmPXB1cmdhdG9y
eS9wdXJnYXRvcnkuYztoYj01NjZjYThhMTIxNDUxOTZiMDBhZDM3OTM5Y2ZkNThhOTdmOTZiYTg5
CgpCZWNhdXNlIHRoZSBmdW5jdGlvbiBfdmVyaWZ5X3NoYTI1Nl9kaWdlc3QgZmFpbHMsIHRoZSBm
dW5jdGlvbiBfcHVyZ2F0b3J5XyBsb29wcyBlbmRsZXNzbHkuClRoaXMgcHJvYmxlbSBvY2N1cnMg
b25seSB3aXRoIExpbnV4IDMuNiBvciBMaW51eCAzLjcgYW5kIEZyZWVCU0QgbG9hZGVyLgpJIGtp
bGxlZCB0aGUgZW5kbGVzcyBsb29wIGFuZCBjb3VsZCBib290IHRoZSBGcmVlQlNEIGxvYWRlciBv
biBMaW51eCAzLjcgdG9vLgoKQW55IGlkZWEgd2hhdCBjb3VsZCBjYXVzZSB0aGlzIHByb2JsZW0g
PwoKVGhhbmtzLgo=

^ permalink raw reply

* Re: [PATCH v5 04/45] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
From: Paul E. McKenney @ 2013-02-08 23:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev
In-Reply-To: <20130122073347.13822.85876.stgit@srivatsabhat.in.ibm.com>

On Tue, Jan 22, 2013 at 01:03:53PM +0530, Srivatsa S. Bhat wrote:
> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
> lock-ordering related problems (unlike per-cpu locks). However, global
> rwlocks lead to unnecessary cache-line bouncing even when there are no
> writers present, which can slow down the system needlessly.
> 
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
> 
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
> 
> IOW, the readers just increment/decrement their per-cpu refcounts (disabling
> interrupts during the updates, if necessary) when no writer is active.
> When a writer becomes active, he signals all readers to switch to global
> rwlocks for the duration of his activity. The readers switch over when it
> is safe for them (ie., when they are about to start a fresh, non-nested
> read-side critical section) and start using (holding) the global rwlock for
> read in their subsequent critical sections.
> 
> The writer waits for every existing reader to switch, and then acquires the
> global rwlock for write and enters his critical section. Later, the writer
> signals all readers that he is done, and that they can go back to using their
> per-cpu refcounts again.
> 
> Note that the lock-safety (despite the per-cpu scheme) comes from the fact
> that the readers can *choose* _when_ to switch to rwlocks upon the writer's
> signal. And the readers don't wait on anybody based on the per-cpu counters.
> The only true synchronization that involves waiting at the reader-side in this
> scheme, is the one arising from the global rwlock, which is safe from circular
> locking dependency issues.
> 
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.
> 
> I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
> suggestions and ideas, which inspired and influenced many of the decisions in
> this as well as previous designs. Thanks a lot Michael and Xiao!

Looks pretty close!  Some comments interspersed below.  Please either
fix the code or my confusion, as the case may be.  ;-)

							Thanx, Paul

> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  include/linux/percpu-rwlock.h |   10 +++
>  lib/percpu-rwlock.c           |  128 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 136 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
> index 8dec8fe..6819bb8 100644
> --- a/include/linux/percpu-rwlock.h
> +++ b/include/linux/percpu-rwlock.h
> @@ -68,4 +68,14 @@ extern void percpu_free_rwlock(struct percpu_rwlock *);
>  	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\
>  })
> 
> +#define reader_uses_percpu_refcnt(pcpu_rwlock, cpu)			\
> +		(ACCESS_ONCE(per_cpu(*((pcpu_rwlock)->reader_refcnt), cpu)))
> +
> +#define reader_nested_percpu(pcpu_rwlock)				\
> +			(__this_cpu_read(*((pcpu_rwlock)->reader_refcnt)) > 1)
> +
> +#define writer_active(pcpu_rwlock)					\
> +			(__this_cpu_read(*((pcpu_rwlock)->writer_signal)))
> +
>  #endif
> +
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index 80dad93..992da5c 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -64,21 +64,145 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
> 
>  void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -	read_lock(&pcpu_rwlock->global_rwlock);
> +	preempt_disable();
> +
> +	/* First and foremost, let the writer know that a reader is active */
> +	this_cpu_inc(*pcpu_rwlock->reader_refcnt);
> +
> +	/*
> +	 * If we are already using per-cpu refcounts, it is not safe to switch
> +	 * the synchronization scheme. So continue using the refcounts.
> +	 */
> +	if (reader_nested_percpu(pcpu_rwlock)) {
> +		goto out;
> +	} else {
> +		/*
> +		 * The write to 'reader_refcnt' must be visible before we
> +		 * read 'writer_signal'.
> +		 */
> +		smp_mb(); /* Paired with smp_rmb() in sync_reader() */
> +
> +		if (likely(!writer_active(pcpu_rwlock))) {
> +			goto out;
> +		} else {
> +			/* Writer is active, so switch to global rwlock. */
> +			read_lock(&pcpu_rwlock->global_rwlock);
> +
> +			/*
> +			 * We might have raced with a writer going inactive
> +			 * before we took the read-lock. So re-evaluate whether
> +			 * we still need to hold the rwlock or if we can switch
> +			 * back to per-cpu refcounts. (This also helps avoid
> +			 * heterogeneous nesting of readers).
> +			 */
> +			if (writer_active(pcpu_rwlock))

The above writer_active() can be reordered with the following this_cpu_dec(),
strange though it might seem.  But this is OK because holding the rwlock
is conservative.  But might be worth a comment.

> +				this_cpu_dec(*pcpu_rwlock->reader_refcnt);
> +			else

In contrast, no reordering can happen here because read_unlock() is
required to keep the critical section underneath the lock.

> +				read_unlock(&pcpu_rwlock->global_rwlock);
> +		}
> +	}
> +
> +out:
> +	/* Prevent reordering of any subsequent reads */
> +	smp_rmb();

This should be smp_mb().  "Readers" really can do writes.  Hence the
name lglock -- "local/global" rather than "reader/writer".

>  }
> 
>  void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -	read_unlock(&pcpu_rwlock->global_rwlock);

We need an smp_mb() here to keep the critical section ordered before the
this_cpu_dec() below.  Otherwise, if a writer shows up just after we
exit the fastpath, that writer is not guaranteed to see the effects of
our critical section.  Equivalently, the prior read-side critical section
just might see some of the writer's updates, which could be a bit of
a surprise to the reader.

> +	/*
> +	 * We never allow heterogeneous nesting of readers. So it is trivial
> +	 * to find out the kind of reader we are, and undo the operation
> +	 * done by our corresponding percpu_read_lock().
> +	 */
> +	if (__this_cpu_read(*pcpu_rwlock->reader_refcnt)) {
> +		this_cpu_dec(*pcpu_rwlock->reader_refcnt);
> +		smp_wmb(); /* Paired with smp_rmb() in sync_reader() */

Given an smp_mb() above, I don't understand the need for this smp_wmb().
Isn't the idea that if the writer sees ->reader_refcnt decremented to
zero, it also needs to see the effects of the corresponding reader's
critical section?

Or am I missing something subtle here?  In any case, if this smp_wmb()
really is needed, there should be some subsequent write that the writer
might observe.  From what I can see, there is no subsequent write from
this reader that the writer cares about.

> +	} else {
> +		read_unlock(&pcpu_rwlock->global_rwlock);
> +	}
> +
> +	preempt_enable();
> +}
> +
> +static inline void raise_writer_signal(struct percpu_rwlock *pcpu_rwlock,
> +				       unsigned int cpu)
> +{
> +	per_cpu(*pcpu_rwlock->writer_signal, cpu) = true;
> +}
> +
> +static inline void drop_writer_signal(struct percpu_rwlock *pcpu_rwlock,
> +				      unsigned int cpu)
> +{
> +	per_cpu(*pcpu_rwlock->writer_signal, cpu) = false;
> +}
> +
> +static void announce_writer_active(struct percpu_rwlock *pcpu_rwlock)
> +{
> +	unsigned int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		raise_writer_signal(pcpu_rwlock, cpu);
> +
> +	smp_mb(); /* Paired with smp_rmb() in percpu_read_[un]lock() */
> +}
> +
> +static void announce_writer_inactive(struct percpu_rwlock *pcpu_rwlock)
> +{
> +	unsigned int cpu;
> +
> +	drop_writer_signal(pcpu_rwlock, smp_processor_id());

Why do we drop ourselves twice?  More to the point, why is it important to
drop ourselves first?

> +
> +	for_each_online_cpu(cpu)
> +		drop_writer_signal(pcpu_rwlock, cpu);
> +
> +	smp_mb(); /* Paired with smp_rmb() in percpu_read_[un]lock() */
> +}
> +
> +/*
> + * Wait for the reader to see the writer's signal and switch from percpu
> + * refcounts to global rwlock.
> + *
> + * If the reader is still using percpu refcounts, wait for him to switch.
> + * Else, we can safely go ahead, because either the reader has already
> + * switched over, or the next reader that comes along on that CPU will
> + * notice the writer's signal and will switch over to the rwlock.
> + */
> +static inline void sync_reader(struct percpu_rwlock *pcpu_rwlock,
> +			       unsigned int cpu)
> +{
> +	smp_rmb(); /* Paired with smp_[w]mb() in percpu_read_[un]lock() */

As I understand it, the purpose of this memory barrier is to ensure
that the stores in drop_writer_signal() happen before the reads from
->reader_refcnt in reader_uses_percpu_refcnt(), thus preventing the
race between a new reader attempting to use the fastpath and this writer
acquiring the lock.  Unless I am confused, this must be smp_mb() rather
than smp_rmb().

Also, why not just have a single smp_mb() at the beginning of
sync_all_readers() instead of executing one barrier per CPU?

> +
> +	while (reader_uses_percpu_refcnt(pcpu_rwlock, cpu))
> +		cpu_relax();
> +}
> +
> +static void sync_all_readers(struct percpu_rwlock *pcpu_rwlock)
> +{
> +	unsigned int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		sync_reader(pcpu_rwlock, cpu);
>  }
> 
>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +	/*
> +	 * Tell all readers that a writer is becoming active, so that they
> +	 * start switching over to the global rwlock.
> +	 */
> +	announce_writer_active(pcpu_rwlock);
> +	sync_all_readers(pcpu_rwlock);
>  	write_lock(&pcpu_rwlock->global_rwlock);
>  }
> 
>  void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +	/*
> +	 * Inform all readers that we are done, so that they can switch back
> +	 * to their per-cpu refcounts. (We don't need to wait for them to
> +	 * see it).
> +	 */
> +	announce_writer_inactive(pcpu_rwlock);
>  	write_unlock(&pcpu_rwlock->global_rwlock);
>  }
> 
> 

^ permalink raw reply

* Re: [PATCH v5 04/45] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
From: Paul E. McKenney @ 2013-02-08 22:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, nikunj, linux-pm,
	rusty, rostedt, rjw, tglx, linux-arm-kernel, netdev, oleg, sbw,
	Srivatsa S. Bhat, Tejun Heo, akpm, linuxppc-dev
In-Reply-To: <87ip6gutsq.fsf@sejong.aot.lge.com>

On Tue, Jan 29, 2013 at 08:12:37PM +0900, Namhyung Kim wrote:
> On Thu, 24 Jan 2013 10:00:04 +0530, Srivatsa S. Bhat wrote:
> > On 01/24/2013 01:27 AM, Tejun Heo wrote:
> >> On Thu, Jan 24, 2013 at 01:03:52AM +0530, Srivatsa S. Bhat wrote:
> >>> CPU 0                          CPU 1
> >>>
> >>> read_lock(&rwlock)
> >>>
> >>>                               write_lock(&rwlock) //spins, because CPU 0
> >>>                               //has acquired the lock for read
> >>>
> >>> read_lock(&rwlock)
> >>>    ^^^^^
> >>> What happens here? Does CPU 0 start spinning (and hence deadlock) or will
> >>> it continue realizing that it already holds the rwlock for read?
> >> 
> >> I don't think rwlock allows nesting write lock inside read lock.
> >> read_lock(); write_lock() will always deadlock.
> >> 
> >
> > Sure, I understand that :-) My question was, what happens when *two* CPUs
> > are involved, as in, the read_lock() is invoked only on CPU 0 whereas the
> > write_lock() is invoked on CPU 1.
> >
> > For example, the same scenario shown above, but with slightly different
> > timing, will NOT result in a deadlock:
> >
> > Scenario 2:
> >   CPU 0                                CPU 1
> >
> > read_lock(&rwlock)
> >
> >
> > read_lock(&rwlock) //doesn't spin
> >
> >                                     write_lock(&rwlock) //spins, because CPU 0
> >                                     //has acquired the lock for read
> >
> >
> > So I was wondering whether the "fairness" logic of rwlocks would cause
> > the second read_lock() to spin (in the first scenario shown above) because
> > a writer is already waiting (and hence new readers should spin) and thus
> > cause a deadlock.
> 
> In my understanding, current x86 rwlock does basically this (of course,
> in an atomic fashion):
> 
> 
> #define RW_LOCK_BIAS 0x10000
> 
> rwlock_init(rwlock)
> {
> 	rwlock->lock = RW_LOCK_BIAS;
> }
> 
> arch_read_lock(rwlock)
> {
> retry:
> 	if (--rwlock->lock >= 0)
> 		return;
> 
>         rwlock->lock++;
>         while (rwlock->lock < 1)
>         	continue;
> 
>         goto retry;
> }
> 
> arch_write_lock(rwlock)
> {
> retry:
> 	if ((rwlock->lock -= RW_LOCK_BIAS) == 0)
>         	return;
> 
>         rwlock->lock += RW_LOCK_BIAS;
> 	while (rwlock->lock != RW_LOCK_BIAS)
> 		continue;
> 
>         goto retry;
> }
> 
> 
> So I can't find where the 'fairness' logic comes from..

I looked through several of the rwlock implementations, and in all of
them the writer backs off if it sees readers -- or refrains from asserting
ownership of the lock to begin with.

So it should be OK to use rwlock as shown in the underlying patch.

							Thanx, Paul

^ permalink raw reply

* macintosh/windfarm: possible circular locking dependency detected
From: Aaro Koskinen @ 2013-02-08 21:44 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, Benjamin Herrenschmidt

Hi,

On iMac G5 (the first model, windfarm_pm81 reports "model 3"), enabling
LOCKDEP results in the following warnings when overtemp condition is
encountered.

The kernel is 3.8-rc6+ / 6bacaa9.

[ 7060.058340] windfarm: Clamping CPU frequency to minimum !
[ 7060.229688] 
[ 7060.229756] ======================================================
[ 7060.229799] [ INFO: possible circular locking dependency detected ]
[ 7060.229847] 3.8.0-rc6-imac-00113-g6bacaa9 #2 Not tainted
[ 7060.229882] -------------------------------------------------------
[ 7060.229923] kwindfarm/475 is trying to acquire lock:
[ 7060.229958]  (wf_lock){+.+.+.}, at: [<c000000000447b54>] .wf_set_overtemp+0x34/0xb0
[ 7060.230070] 
[ 7060.230070] but task is already holding lock:
[ 7060.230111]  ((wf_client_list).rwsem){.+.+.+}, at: [<c0000000000851a8>] .__blocking_notifier_call_chain+0x78/0x110
[ 7060.230219] 
[ 7060.230219] which lock already depends on the new lock.
[ 7060.230219] 
[ 7060.230275] 
[ 7060.230275] the existing dependency chain (in reverse order) is:
[ 7060.230326] 
[ 7060.230326] -> #1 ((wf_client_list).rwsem){.+.+.+}:
[ 7060.230407]        [<c0000000006cbf98>] .down_read+0x58/0xa0
[ 7060.230463]        [<c0000000000851a8>] .__blocking_notifier_call_chain+0x78/0x110
[ 7060.230521]        [<c000000000447d70>] .wf_register_client+0xb0/0x1b0
[ 7060.230573]        [<c000000000449f0c>] .wf_smu_probe+0x1c/0x40
[ 7060.230621]        [<c00000000042e7d0>] .platform_drv_probe+0x30/0x50
[ 7060.230681]        [<c00000000042c868>] .driver_probe_device+0xa8/0x290
[ 7060.230735]        [<c00000000042cb5c>] .__driver_attach+0x10c/0x110
[ 7060.230787]        [<c00000000042a10c>] .bus_for_each_dev+0x7c/0xe0
[ 7060.230837]        [<c00000000042c1c8>] .driver_attach+0x28/0x40
[ 7060.230888]        [<c00000000042bba8>] .bus_add_driver+0x1f8/0x330
[ 7060.230937]        [<c00000000042d4f4>] .driver_register+0xa4/0x230
[ 7060.230989]        [<c00000000042eb10>] .platform_driver_register+0x60/0x80
[ 7060.231044]        [<c000000000949924>] .wf_smu_init+0x8c/0xa8
[ 7060.231098]        [<c000000000009aa8>] .do_one_initcall+0x168/0x1e0
[ 7060.231151]        [<c000000000927b70>] .kernel_init_freeable+0x134/0x204
[ 7060.231209]        [<c00000000000a1bc>] .kernel_init+0x1c/0x120
[ 7060.231256]        [<c000000000008aa0>] .ret_from_kernel_thread+0x64/0xc4
[ 7060.231311] 
[ 7060.231311] -> #0 (wf_lock){+.+.+.}:
[ 7060.231381]        [<c0000000000a3efc>] .lock_acquire+0x5c/0x90
[ 7060.231434]        [<c0000000006cb5b0>] .mutex_lock_nested+0x90/0x3b0
[ 7060.231488]        [<c000000000447b54>] .wf_set_overtemp+0x34/0xb0
[ 7060.231539]        [<c00000000044a844>] .wf_smu_notify+0x914/0xf90
[ 7060.231590]        [<c000000000084e2c>] .notifier_call_chain+0xcc/0x1f0
[ 7060.231643]        [<c0000000000851c4>] .__blocking_notifier_call_chain+0x94/0x110
[ 7060.231699]        [<c000000000448a18>] .wf_thread_func+0x88/0x150
[ 7060.231747]        [<c00000000007c22c>] .kthread+0xec/0x100
[ 7060.231794]        [<c000000000008aa0>] .ret_from_kernel_thread+0x64/0xc4
[ 7060.231850] 
[ 7060.231850] other info that might help us debug this:
[ 7060.231850] 
[ 7060.231905]  Possible unsafe locking scenario:
[ 7060.231905] 
[ 7060.231945]        CPU0                    CPU1
[ 7060.231976]        ----                    ----
[ 7060.232007]   lock((wf_client_list).rwsem);
[ 7060.232054]                                lock(wf_lock);
[ 7060.232104]                                lock((wf_client_list).rwsem);
[ 7060.232160]   lock(wf_lock);
[ 7060.233707] 
[ 7060.233707]  *** DEADLOCK ***
[ 7060.233707] 
[ 7060.238238] 1 lock held by kwindfarm/475:
[ 7060.239756]  #0:  ((wf_client_list).rwsem){.+.+.+}, at: [<c0000000000851a8>] .__blocking_notifier_call_chain+0x78/0x110
[ 7060.241354] 
[ 7060.241354] stack backtrace:
[ 7060.244453] Call Trace:
[ 7060.246001] [c0000000564534f0] [c000000000012978] .show_stack+0x78/0x1b0 (unreliable)
[ 7060.247582] [c0000000564535a0] [c0000000006d09c8] .print_circular_bug+0x2e8/0x320
[ 7060.249157] [c000000056453650] [c0000000000a36c0] .__lock_acquire+0x17b0/0x1a70
[ 7060.250734] [c0000000564537d0] [c0000000000a3efc] .lock_acquire+0x5c/0x90
[ 7060.252305] [c000000056453870] [c0000000006cb5b0] .mutex_lock_nested+0x90/0x3b0
[ 7060.253879] [c000000056453960] [c000000000447b54] .wf_set_overtemp+0x34/0xb0
[ 7060.255460] [c0000000564539f0] [c00000000044a844] .wf_smu_notify+0x914/0xf90
[ 7060.257019] [c000000056453ad0] [c000000000084e2c] .notifier_call_chain+0xcc/0x1f0
[ 7060.258598] [c000000056453b80] [c0000000000851c4] .__blocking_notifier_call_chain+0x94/0x110
[ 7060.260169] [c000000056453c30] [c000000000448a18] .wf_thread_func+0x88/0x150
[ 7060.261725] [c000000056453cc0] [c00000000007c22c] .kthread+0xec/0x100
[ 7060.263267] [c000000056453e30] [c000000000008aa0] .ret_from_kernel_thread+0x64/0xc4
[ 7060.270273] windfarm: Overtemp condition detected !
[ 7062.033879] windfarm: CPU frequency unclamped !
[ 7062.035508] windfarm: Overtemp condition cleared !
[ 7221.058391] windfarm: Clamping CPU frequency to minimum !
[ 7221.196337] windfarm: Overtemp condition detected !
[ 7223.030000] windfarm: CPU frequency unclamped !
[ 7223.067554] windfarm: Overtemp condition cleared !

A.

^ permalink raw reply

* [PATCH][v4] PPC: add paravirt idle loop for 64-bit book E
From: Stuart Yoder @ 2013-02-08 19:22 UTC (permalink / raw)
  To: benh; +Cc: kvm-ppc, linuxppc-dev, agraf, kvm, Stuart Yoder

From: Stuart Yoder <stuart.yoder@freescale.com>

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
---

-removed KVM prefix to patch subject, patch is not KVM specific

 arch/powerpc/kernel/epapr_hcalls.S |    2 ++
 arch/powerpc/kernel/idle_book3e.S  |   32 ++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/epapr_hcalls.S b/arch/powerpc/kernel/epapr_hcalls.S
index 62c0dc2..9f1ebf7 100644
--- a/arch/powerpc/kernel/epapr_hcalls.S
+++ b/arch/powerpc/kernel/epapr_hcalls.S
@@ -17,6 +17,7 @@
 #include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 
+#ifndef CONFIG_PPC64
 /* epapr_ev_idle() was derived from e500_idle() */
 _GLOBAL(epapr_ev_idle)
 	CURRENT_THREAD_INFO(r3, r1)
@@ -42,6 +43,7 @@ epapr_ev_idle_start:
 	 * _TLF_NAPPING.
 	 */
 	b	idle_loop
+#endif
 
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
diff --git a/arch/powerpc/kernel/idle_book3e.S b/arch/powerpc/kernel/idle_book3e.S
index 4c7cb400..bfb73cc 100644
--- a/arch/powerpc/kernel/idle_book3e.S
+++ b/arch/powerpc/kernel/idle_book3e.S
@@ -16,11 +16,13 @@
 #include <asm/ppc-opcode.h>
 #include <asm/processor.h>
 #include <asm/thread_info.h>
+#include <asm/epapr_hcalls.h>
 
 /* 64-bit version only for now */
 #ifdef CONFIG_PPC64
 
-_GLOBAL(book3e_idle)
+.macro BOOK3E_IDLE name loop
+_GLOBAL(\name)
 	/* Save LR for later */
 	mflr	r0
 	std	r0,16(r1)
@@ -67,7 +69,33 @@ _GLOBAL(book3e_idle)
 
 	/* We can now re-enable hard interrupts and go to sleep */
 	wrteei	1
-1:	PPC_WAIT(0)
+	\loop
+
+.endm
+
+.macro BOOK3E_IDLE_LOOP
+1:
+	PPC_WAIT(0)
 	b	1b
+.endm
+
+/* epapr_ev_idle_start below is patched with the proper hcall
+   opcodes during kernel initialization */
+.macro EPAPR_EV_IDLE_LOOP
+idle_loop:
+	LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+	li      r3, -1
+	nop
+	nop
+	nop
+	b       idle_loop
+.endm
+
+BOOK3E_IDLE epapr_ev_idle EPAPR_EV_IDLE_LOOP
+
+BOOK3E_IDLE book3e_idle BOOK3E_IDLE_LOOP
 
 #endif /* CONFIG_PPC64 */
-- 
1.7.9.7

^ permalink raw reply related

* Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
From: Srivatsa S. Bhat @ 2013-02-08 18:09 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, vincent.guittot, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, Rusty Russell, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, netdev, oleg, sbw, tj, akpm, linuxppc-dev
In-Reply-To: <51152B81.2050501@linux.vnet.ibm.com>

On 02/08/2013 10:14 PM, Srivatsa S. Bhat wrote:
> On 02/08/2013 09:11 PM, Russell King - ARM Linux wrote:
>> On Thu, Feb 07, 2013 at 11:41:34AM +0530, Srivatsa S. Bhat wrote:
>>> On 02/07/2013 09:44 AM, Rusty Russell wrote:
>>>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
>>>>> On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
>>>>>                  Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]
>>>>>
>>>>> # online CPUs    Mainline (with stop-m/c)       This patchset (no stop-m/c)
>>>>>
>>>>>       8                 17.04                          7.73
>>>>>
>>>>>      16                 18.05                          6.44
>>>>>
>>>>>      32                 17.31                          7.39
>>>>>
>>>>>      64                 32.40                          9.28
>>>>>
>>>>>     128                 98.23                          7.35
>>>>
>>>> Nice!
>>>
>>> Thank you :-)
>>>
>>>>  I wonder how the ARM guys feel with their quad-cpu systems...
>>>>
>>>
>>> That would be definitely interesting to know :-)
>>
>> That depends what exactly you'd like tested (and how) and whether you'd
>> like it to be a test-chip based quad core, or an OMAP dual-core SoC.
>>
> 
> The effect of stop_machine() doesn't really depend on the CPU architecture
> used underneath or the platform. It depends only on the _number_ of
> _logical_ CPUs used.
> 
> And stop_machine() has 2 noticeable drawbacks:
> 1. It makes the hotplug operation itself slow
> 2. and it causes disruptions to the workloads running on the other
> CPUs by hijacking the entire machine for significant amounts of time.
> 
> In my experiments (mentioned above), I tried to measure how my patchset
> improves (reduces) the duration of hotplug (CPU offline) itself. Which is
> also slightly indicative of the impact it has on the rest of the system.
> 
> But what would be nice to test, is a setup where the workloads running on
> the rest of the system are latency-sensitive, and measure the impact of
> CPU offline on them, with this patchset applied. That would tell us how
> far is this useful in making CPU hotplug less disruptive on the system.
> 
> Of course, it would be nice to also see whether we observe any reduction
> in hotplug duration itself (point 1 above) on ARM platforms with lot
> of CPUs. [This could potentially speed up suspend/resume, which is used
> rather heavily on ARM platforms].
> 
> The benefits from this patchset over mainline (both in terms of points
> 1 and 2 above) is expected to increase, with increasing number of CPUs in
> the system.
> 

Adding Vincent to CC, who had previously evaluated the performance and
latency implications of CPU hotplug on ARM platforms, IIRC.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] net: fec_mpc52xx: Read MAC address from device-tree
From: Anatolij Gustschin @ 2013-02-08 16:41 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linuxppc-dev
In-Reply-To: <1360339952-8724-1-git-send-email-sr@denx.de>

Hi Stefan,

On Fri,  8 Feb 2013 17:12:32 +0100
Stefan Roese <sr@denx.de> wrote:

> Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
> (U-Boot) to write the MAC address into the ethernet controller
> registers. The Linux driver should not rely on such a thing. So
> lets read the MAC address from the DT as it should be done here.

Somehow the patch didn't reach the netdev list. Could you please resend?

Thanks,
Anatolij

^ permalink raw reply

* Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
From: Srivatsa S. Bhat @ 2013-02-08 16:44 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm,
	Rusty Russell, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, oleg, sbw, tj, akpm, linuxppc-dev
In-Reply-To: <20130208154113.GV17833@n2100.arm.linux.org.uk>

On 02/08/2013 09:11 PM, Russell King - ARM Linux wrote:
> On Thu, Feb 07, 2013 at 11:41:34AM +0530, Srivatsa S. Bhat wrote:
>> On 02/07/2013 09:44 AM, Rusty Russell wrote:
>>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
>>>> On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
>>>>                  Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]
>>>>
>>>> # online CPUs    Mainline (with stop-m/c)       This patchset (no stop-m/c)
>>>>
>>>>       8                 17.04                          7.73
>>>>
>>>>      16                 18.05                          6.44
>>>>
>>>>      32                 17.31                          7.39
>>>>
>>>>      64                 32.40                          9.28
>>>>
>>>>     128                 98.23                          7.35
>>>
>>> Nice!
>>
>> Thank you :-)
>>
>>>  I wonder how the ARM guys feel with their quad-cpu systems...
>>>
>>
>> That would be definitely interesting to know :-)
> 
> That depends what exactly you'd like tested (and how) and whether you'd
> like it to be a test-chip based quad core, or an OMAP dual-core SoC.
> 

The effect of stop_machine() doesn't really depend on the CPU architecture
used underneath or the platform. It depends only on the _number_ of
_logical_ CPUs used.

And stop_machine() has 2 noticeable drawbacks:
1. It makes the hotplug operation itself slow
2. and it causes disruptions to the workloads running on the other
CPUs by hijacking the entire machine for significant amounts of time.

In my experiments (mentioned above), I tried to measure how my patchset
improves (reduces) the duration of hotplug (CPU offline) itself. Which is
also slightly indicative of the impact it has on the rest of the system.

But what would be nice to test, is a setup where the workloads running on
the rest of the system are latency-sensitive, and measure the impact of
CPU offline on them, with this patchset applied. That would tell us how
far is this useful in making CPU hotplug less disruptive on the system.

Of course, it would be nice to also see whether we observe any reduction
in hotplug duration itself (point 1 above) on ARM platforms with lot
of CPUs. [This could potentially speed up suspend/resume, which is used
rather heavily on ARM platforms].

The benefits from this patchset over mainline (both in terms of points
1 and 2 above) is expected to increase, with increasing number of CPUs in
the system.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* [PATCH] net: fec_mpc52xx: Read MAC address from device-tree
From: Stefan Roese @ 2013-02-08 16:12 UTC (permalink / raw)
  To: net; +Cc: linuxppc-dev, Anatolij Gustschin

Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
(U-Boot) to write the MAC address into the ethernet controller
registers. The Linux driver should not rely on such a thing. So
lets read the MAC address from the DT as it should be done here.

This fixes a problem with a MPC5200 board that uses the SPL U-Boot
version without FEC initialization before Linux booting for
boot speedup.

Additionally a status line will now be printed upon successful
driver probing, also displaying this MAC address.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Anatolij Gustschin <agust@denx.de>
---
 drivers/net/ethernet/freescale/fec_mpc52xx.c | 30 +++++++++++++++++-----------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c
index 2933d08..f4c3897 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
@@ -110,15 +110,6 @@ static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac)
 	out_be32(&fec->paddr2, (*(u16 *)(&mac[4]) << 16) | FEC_PADDR2_TYPE);
 }
 
-static void mpc52xx_fec_get_paddr(struct net_device *dev, u8 *mac)
-{
-	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-	struct mpc52xx_fec __iomem *fec = priv->fec;
-
-	*(u32 *)(&mac[0]) = in_be32(&fec->paddr1);
-	*(u16 *)(&mac[4]) = in_be32(&fec->paddr2) >> 16;
-}
-
 static int mpc52xx_fec_set_mac_address(struct net_device *dev, void *addr)
 {
 	struct sockaddr *sock = addr;
@@ -928,10 +919,22 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 	priv->t_irq = bcom_get_task_irq(priv->tx_dmatsk);
 
 	/* MAC address init */
-	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr))
+	if (!is_zero_ether_addr(mpc52xx_fec_mac_addr)) {
 		memcpy(ndev->dev_addr, mpc52xx_fec_mac_addr, 6);
-	else
-		mpc52xx_fec_get_paddr(ndev, ndev->dev_addr);
+	} else {
+		struct device_node *np = op->dev.of_node;
+		const void *p;
+
+		/* Read MAC-address */
+		p = of_get_property(np, "local-mac-address", NULL);
+		if (p == NULL) {
+			dev_err(&ndev->dev, "%s: Can't find local-mac-address property\n",
+				np->full_name);
+			rv = -ENXIO;
+			goto err_irq_dispose;
+		}
+		memcpy(ndev->dev_addr, p, 6);
+	}
 
 	priv->msg_enable = netif_msg_init(debug, MPC52xx_MESSAGES_DEFAULT);
 
@@ -970,11 +973,14 @@ static int __devinit mpc52xx_fec_probe(struct platform_device *op)
 
 	/* We're done ! */
 	dev_set_drvdata(&op->dev, ndev);
+	printk(KERN_INFO "%s: %s MAC %pM\n",
+	       ndev->name, op->dev.of_node->full_name, ndev->dev_addr);
 
 	return 0;
 
 err_node:
 	of_node_put(priv->phy_node);
+err_irq_dispose:
 	irq_dispose_mapping(ndev->irq);
 err_rx_tx_dmatsk:
 	if (priv->rx_dmatsk)
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
From: Russell King - ARM Linux @ 2013-02-08 15:41 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm,
	Rusty Russell, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, oleg, sbw, tj, akpm, linuxppc-dev
In-Reply-To: <51134596.4080106@linux.vnet.ibm.com>

On Thu, Feb 07, 2013 at 11:41:34AM +0530, Srivatsa S. Bhat wrote:
> On 02/07/2013 09:44 AM, Rusty Russell wrote:
> > "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
> >> On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
> >>                  Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]
> >>
> >> # online CPUs    Mainline (with stop-m/c)       This patchset (no stop-m/c)
> >>
> >>       8                 17.04                          7.73
> >>
> >>      16                 18.05                          6.44
> >>
> >>      32                 17.31                          7.39
> >>
> >>      64                 32.40                          9.28
> >>
> >>     128                 98.23                          7.35
> > 
> > Nice!
> 
> Thank you :-)
> 
> >  I wonder how the ARM guys feel with their quad-cpu systems...
> > 
> 
> That would be definitely interesting to know :-)

That depends what exactly you'd like tested (and how) and whether you'd
like it to be a test-chip based quad core, or an OMAP dual-core SoC.

^ permalink raw reply

* [tip:sched/core] cputime: Restore CPU_ACCOUNTING config defaults for PPC64
From: tip-bot for Stephen Rothwell @ 2013-02-08 15:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung.kim, zhong, sfr, peterz, linux-kernel, rostedt, fweisbec,
	paul.gortmaker, hpa, tglx, paulmck, linuxppc-dev, mingo
In-Reply-To: <20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au>

Commit-ID:  02fc8d37229d15c654876cf9ce56b5c1cf7945d7
Gitweb:     http://git.kernel.org/tip/02fc8d37229d15c654876cf9ce56b5c1cf7945d7
Author:     Stephen Rothwell <sfr@canb.auug.org.au>
AuthorDate: Fri, 8 Feb 2013 14:19:38 +1100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 8 Feb 2013 15:23:12 +0100

cputime: Restore CPU_ACCOUNTING config defaults for PPC64

Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime
accounting") inadvertantly changed the default CPU_ACCOUNTING
config for PPC64.  Repair that.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index a05f843..ccb9f8f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -332,7 +332,7 @@ config VIRT_CPU_ACCOUNTING
 choice
 	prompt "Cputime accounting"
 	default TICK_CPU_ACCOUNTING if !PPC64
-	default VIRT_CPU_ACCOUNTING if PPC64
+	default VIRT_CPU_ACCOUNTING_NATIVE if PPC64
 
 # Kind of a stub config for the pure tick based cputime accounting
 config TICK_CPU_ACCOUNTING

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox