From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v9 00/13] support "task_isolation" mode for nohz_full
To: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
 Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
 Thomas Gleixner, "Paul E. McKenney", Christoph Lameter, Viresh Kumar,
 Catalin Marinas, Will Deacon, Andy Lutomirski, Daniel Lezcano
References: <1451936091-29247-1-git-send-email-cmetcalf@ezchip.com>
From: Chris Metcalf
Message-ID: <56941B86.9090009@ezchip.com>
Date: Mon, 11 Jan 2016 16:15:50 -0500
In-Reply-To: <1451936091-29247-1-git-send-email-cmetcalf@ezchip.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Ping! There has been no substantive feedback to this version of the
patch in the week since I posted it, which optimistically suggests to
me that people may be satisfied with it. If that's true, Frederic, I
assume this would be pulled into your tree?

I have slightly updated the v9 patch series since this posting:

- Incorporated a fix to initialize cpu_isolation_mask early if no
  cpu_isolation= boot argument was given, to avoid crashing on
  CPUMASK_OFFSTACK platforms.

- Incorporated Mark Rutland's changes to convert arm64 assembly to C
  code instead of using my own version.

The updated patch series is available in the branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git dataplane

I will post a v10 with those couple of small changes if I don't hear
any other feedback, or of course feel free to pull from the git repo.

On 01/04/2016 02:34 PM, Chris Metcalf wrote:
> It has been a couple of months since the v8 version of this patch,
> since various other priorities came up at work. Since it's been
> a while I will try to summarize where I think we got to on the
> various issues that were raised with v8.
>
> 1. Andy Lutomirski raised the issue of whether it really made sense to
> only attempt to set up the conditions for task isolation, ask the kernel
> nicely for it, and then wait until it happened. He wondered if a
> SCHED_ISOLATED class might be a helpful abstraction. Steven Rostedt
> also suggested having an interface that would force everything else
> off a core to enable SCHED_ISOLATED to succeed. Frederic added
> some concerns about enforcing the test that the process was in a
> good state to enter task isolation.
>
> I tried to address the different design philosophies for what I called
> the original "polite" mode and the reviewers' suggestions for an
> "aggressive" mode in this email:
>
> https://lkml.org/lkml/2015/10/26/625
>
> As I said there, on balance I think the "polite" option is still
> better. Obviously folks are welcome to disagree and I'm happy to
> continue that conversation (or perhaps I convinced everyone).
>
> 2. Andy didn't like the idea of having a "STRICT" mode which
> delivered a signal to a process for violating the contract that it
> will promise to stay out of the kernel. Gilad Ben Yossef argued that
> it made sense to have a way for the kernel to enforce the requested
> correctness guarantee of never being interrupted. Andy pointed out
> that we should then really deliver such a signal when the kernel
> delivers an asynchronous interrupt to the core as well. In particular
> this is a concern for the application-error case of a process that
> calls munmap() on one core while a thread on another core is running
> STRICT, and thus gets an unexpected TLB flush.
>
> This patch series addresses that concern by including support for
> IRQs, IPIs, and similar asynchronous interrupts to also send the
> STRICT signal to the process. We don't try to send the signal if
> we are in an NMI, and instead just force a console backtrace like
> you would get in task_isolation_debug mode.
>
> 3. Frederic nack'ed my patch for a boot flag to disable the 1Hz
> periodic scheduler tick.
>
> I'm still hoping he's open to changing his mind about that, but in
> this patch series I have removed that boot flag.
>
> Various other changes have been introduced since v8:
>
> https://lkml.kernel.org/r/1445373372-6567-1-git-send-email-cmetcalf@ezchip.com
>
> - Rebased to Linux 4.4-rc5.
>
> - Since nohz_full and isolcpus have been separated back out again in
>   4.4, I introduced a new task_isolation=MASK boot argument that sets
>   both of them. The task isolation support now requires that this
>   boot flag has been used; it intentionally doesn't work if you've
>   just enabled nohz_full and isolcpus separately. I could be
>   convinced that doing it the other way around makes sense, though.
>
> - I folded the two STRICT mode patches together since there didn't
>   seem to be much value in having the second patch that just enabled
>   having a settable signal. I also refactored the various routines
>   that report on interrupts/exceptions/etc to make it easier to hook
>   in from the case where we are interrupted asynchronously.
>
> - For the debug support, I moved most of the functionality into
>   kernel/isolation.c and out of kernel/sched/core.c, leaving only a
>   small hook to handle mapping a remote cpu to a task struct safely.
>   In addition to implementing Andy's suggestion of signalling a task
>   when it is interrupted asynchronously, I also added a ratelimit
>   hook so we won't spam the console if (for example) a timer interrupt
>   runs amok - particularly since when this happens without ratelimit,
>   it can end up self-perpetuating the timer interrupt.
>
> - I added a task_isolation_debug_cpumask() helper function to check
>   all the cpus in a mask to see if they are being interrupted
>   inappropriately.
>
> - I made the check for irq_enter() robust to architectures that
>   have already entered user mode context_tracking before calling
>   irq_enter() by testing user_mode(get_irq_regs()) instead of
>   context_tracking_in_user(), and split out the code to a separate
>   inlined function so I could comment it better.
>
> - For arm64, I added a task_isolation_debug_cpumask() hook for
>   smp_cross_call(), which I had missed in the earlier versions.
>
> - I generalized the fix for tile to set up a clockevents hook for
>   set_state_oneshot_stopped() to also apply to the arm_arch_timer,
>   which I realized was showing the same problem. For both cases,
>   this seems to be what Viresh had in mind with commit 8fff52fd509345
>   ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state").
>
> - For tile, I adopted the arm model of doing user_exit() calls in the
>   early assembly code (a new patch in this series). I also added a
>   missing task_isolation_debug hook for tile's IPI and remote cache
>   flush code.
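
As an aside on the ratelimit hook mentioned in the debug-support item
above: the general idea can be pictured with a minimal userspace sketch
in C that allows a bounded burst of reports per time window and counts
the rest as suppressed. This is only an illustration under stated
assumptions. It is not the code in kernel/isolation.c and it does not
use the kernel's ratelimit API; struct report_limiter,
report_limiter_init() and report_allowed() are hypothetical names, and
the caller is assumed to supply the timestamps.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace illustration of rate-limited reporting.
 * None of these names come from the patch series.
 */
struct report_limiter {
	uint64_t interval_ns;   /* length of one reporting window */
	unsigned int burst;     /* reports allowed per window */
	uint64_t window_start;  /* timestamp at which the current window began */
	unsigned int printed;   /* reports emitted in the current window */
	unsigned int missed;    /* reports suppressed in the current window */
};

static void report_limiter_init(struct report_limiter *rl,
				uint64_t interval_ns, unsigned int burst)
{
	rl->interval_ns = interval_ns;
	rl->burst = burst;
	rl->window_start = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Return true if a report at time now_ns may be emitted. */
static bool report_allowed(struct report_limiter *rl, uint64_t now_ns)
{
	/* Start a fresh window once the previous one has fully elapsed. */
	if (now_ns - rl->window_start >= rl->interval_ns) {
		rl->window_start = now_ns;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget for this window: count the report and drop it. */
	rl->missed++;
	return false;
}
```

With a window of 1000 ns and a burst of 2, a third report inside the
same window is dropped, and reporting resumes once the window has
elapsed. Bounding the number of reports this way is what keeps a
misbehaving interrupt source from flooding the console.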
>
> Chris Metcalf (12):
>   vmstat: add vmstat_idle function
>   lru_add_drain_all: factor out lru_add_drain_needed
>   task_isolation: add initial support
>   task_isolation: support PR_TASK_ISOLATION_STRICT mode
>   task_isolation: add debug boot flag
>   arch/x86: enable task isolation functionality
>   arch/arm64: adopt prepare_exit_to_usermode() model from x86
>   arch/arm64: enable task isolation functionality
>   arch/tile: adopt prepare_exit_to_usermode() model from x86
>   arch/tile: move user_exit() to early kernel entry sequence
>   arch/tile: enable task isolation functionality
>   arm, tile: turn off timer tick for oneshot_stopped state
>
> Christoph Lameter (1):
>   vmstat: provide a function to quiet down the diff processing
>
>  Documentation/kernel-parameters.txt  |  16 +++
>  arch/arm64/include/asm/thread_info.h |  18 ++-
>  arch/arm64/kernel/entry.S            |   6 +-
>  arch/arm64/kernel/ptrace.c           |  12 +-
>  arch/arm64/kernel/signal.c           |  35 ++++--
>  arch/arm64/kernel/smp.c              |   2 +
>  arch/arm64/mm/fault.c                |   4 +
>  arch/tile/include/asm/processor.h    |   2 +-
>  arch/tile/include/asm/thread_info.h  |   8 +-
>  arch/tile/kernel/intvec_32.S         |  51 +++-----
>  arch/tile/kernel/intvec_64.S         |  54 +++------
>  arch/tile/kernel/process.c           |  83 +++++++------
>  arch/tile/kernel/ptrace.c            |  19 +--
>  arch/tile/kernel/single_step.c       |   8 +-
>  arch/tile/kernel/smp.c               |  26 ++--
>  arch/tile/kernel/time.c              |   1 +
>  arch/tile/kernel/traps.c             |  13 +-
>  arch/tile/kernel/unaligned.c         |  16 ++-
>  arch/tile/mm/fault.c                 |   6 +-
>  arch/tile/mm/homecache.c             |   2 +
>  arch/x86/entry/common.c              |  10 +-
>  arch/x86/kernel/traps.c              |   2 +
>  arch/x86/mm/fault.c                  |   2 +
>  drivers/clocksource/arm_arch_timer.c |   2 +
>  include/linux/isolation.h            |  80 +++++++++++++
>  include/linux/sched.h                |   3 +
>  include/linux/swap.h                 |   1 +
>  include/linux/vmstat.h               |   4 +
>  include/uapi/linux/prctl.h           |   8 ++
>  init/Kconfig                         |  20 ++++
>  kernel/Makefile                      |   1 +
>  kernel/irq_work.c                    |   5 +-
>  kernel/isolation.c                   | 225 +++++++++++++++++++++++++++++++++++
>  kernel/sched/core.c                  |  18 +++
>  kernel/signal.c                      |   5 +
>  kernel/smp.c                         |   6 +-
>  kernel/softirq.c                     |  33 +++++
>  kernel/sys.c                         |   9 ++
>  mm/swap.c                            |  13 +-
>  mm/vmstat.c                          |  24 ++++
>  40 files changed, 665 insertions(+), 188 deletions(-)
>  create mode 100644 include/linux/isolation.h
>  create mode 100644 kernel/isolation.c
>

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com