From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v7 02/11] task_isolation: add initial support
To: Thomas Gleixner
References: <1443453446-7827-1-git-send-email-cmetcalf@ezchip.com>
 <1443453446-7827-3-git-send-email-cmetcalf@ezchip.com>
 <20151001121414.GB3432@lerouge>
 <560D6725.9000609@ezchip.com>
CC: Frederic Weisbecker, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar,
 Peter Zijlstra, Andrew Morton, Rik van Riel, Tejun Heo,
 "Paul E. McKenney", Christoph Lameter, Viresh Kumar, Catalin Marinas,
 Will Deacon, Andy Lutomirski
From: Chris Metcalf
Message-ID: <560EBBC5.7000709@ezchip.com>
Date: Fri, 2 Oct 2015 13:15:49 -0400
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/01/2015 05:20 PM, Thomas Gleixner wrote:
> On Thu, 1 Oct 2015, Chris Metcalf wrote:
>> But first I want to address the question of the basic semantics
>> of the patch series.
>> I wrote up a description of why it's useful
>> in my email yesterday:
>>
>> https://lkml.kernel.org/r/560C4CF4.9090601@ezchip.com
>>
>> I haven't directly heard from you as to whether you buy the
>> basic premise of "hard isolation" in terms of protecting tasks
>> from all kernel interrupts while they execute in userspace.
> Just for the record. The first serious initiative to solve that
> problem started here in my own company when I guided Frederic through
> the endeavour of figuring out what needs to be done to achieve
> that. That was the assignment of his master thesis, which I gave him.

Thanks for that background.  I didn't know you had gotten Frederic
started down that path originally.

>> So I first want to address what is effectively the API concern that
>> you raised, namely that you're concerned that there is a wait
>> loop in the implementation.
> That wait loop is just a placeholder for the underlying more serious
> concern I have with this whole approach. And I raised that concern
> several times in the past and I'm happy to do so again.
>
> The people working on this, especially you, are just dead set to
> achieve a certain functionality by jamming half-baked mechanisms into
> the kernel and especially into the low level entry/exit code. And
> that's something which really annoys me, simply because you refuse to
> tackle the problems which have been identified as needing to be solved
> 5+ years ago when Frederic did his thesis.

I think you raise a good point. I still claim my arguments are
plausible, but you may be right that this is an instance where
forcing a different approach is better for the kernel community
as a whole.

Given that, what would you think of the following two changes to my
proposed patch series:

1. Rather than spinning in a busy loop if timers are pending, we
   reschedule if more than one task is ready to run.
   This directly targets the "architected" problem with the
   scheduler tick, rather than sweeping up the scheduler tick and
   any other timers into the one catch-all of "any timer ready to
   fire".  (We can use sched_can_stop_tick() to check the case where
   other tasks can preempt us.)  This would then provide part of the
   semantics of the task-isolation flag.  The other part is running
   whatever code can be run to avoid the various ways tasks might get
   interrupted later (lru_add_drain(), quiet_vmstat(), etc) that are
   not appropriate to run unconditionally for tasks that aren't
   trying to be isolated.

2. Remove the tie between disabling the 1 Hz max deferment and task
   isolation per se.  Instead add a boot flag (e.g. "debug_1hz_tick")
   that lets us turn off the 1 Hz tick to make it easy to experiment
   with both the negative effects of the missing tick, as well as to
   try to learn in parallel what actual timer interrupts are firing
   "on purpose" rather than just due to the 1 Hz tick to try to
   eliminate them as well.

For #1, I'm not sure if it's better to hack up the scheduler's
pick_next_task callback methods to avoid task-isolation tasks when
other tasks are also available to run, or just to observe that there
are additional tasks ready to run during exit to userspace, and yield
the cpu to allow those other tasks to run.  The advantage of doing it
at exit to userspace is that we can easily yield in a loop and pay
attention to whether we seem not to be making forward progress with
that task and generate a suitable warning; it also keeps a lot of
task-isolation stuff out of the core scheduler code, which may be
a plus.

With these changes, and booting with the "debug_1hz_tick" flag,
I'm seeing a couple of timer ticks hit my task-isolation task in
the first 20 ms or so, and then it quiesces.  I will plan to work on
figuring out what is triggering those interrupts and seeing how to
fix them.
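To make the exit-to-userspace variant of #1 a bit more concrete, here
is a minimal userspace model of the "yield in a loop and warn if there
is no forward progress" idea.  It is only a sketch and not kernel code:
struct model_cpu, model_can_stop_tick(), model_yield(),
model_isolation_exit() and the warn_after parameter are all
hypothetical names invented for illustration, and the assumption that
each yield lets exactly one competing task finish is a simplification.
model_can_stop_tick() merely stands in for the role that
sched_can_stop_tick() would play.

```c
/* Userspace model only: none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the state of one cpu's run queue. */
struct model_cpu {
	int nr_running;	/* runnable tasks, including the isolated one */
};

/*
 * Stand-in for the check sched_can_stop_tick() would provide:
 * the tick can stop only if no other task can preempt us.
 */
static bool model_can_stop_tick(const struct model_cpu *cpu)
{
	return cpu->nr_running <= 1;
}

/* Model of yielding the cpu; assumes one competing task finishes per yield. */
static void model_yield(struct model_cpu *cpu)
{
	if (cpu->nr_running > 1)
		cpu->nr_running--;
}

/*
 * Loop on the way back to userspace until the isolated task is the
 * only runnable task.  Returns the number of yields performed, or -1
 * (after printing a warning) if warn_after yields were not enough.
 */
static int model_isolation_exit(struct model_cpu *cpu, int warn_after)
{
	int yields = 0;

	assert(cpu != NULL);
	while (!model_can_stop_tick(cpu)) {
		if (yields >= warn_after) {
			fprintf(stderr,
				"task isolation: no forward progress after %d yields\n",
				yields);
			return -1;
		}
		model_yield(cpu);
		yields++;
	}
	return yields;
}
```

In this model, a cpu with three runnable tasks returns to "userspace"
after two yields, while a cpu that is still contended once warn_after
yields have been spent produces the warning instead.  The point of the
design is that all of the bookkeeping stays in the exit path rather
than in the pick_next_task callbacks.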
My hope is that in parallel with that work, other folks can be
working on how to fix problems that occur more silently with the
scheduler tick max deferment disabled; I'm also happy to work on
those problems to the extent that I understand them (and I'm always
happy to learn more).

As part of the patch series I'd extend the proposed
task_isolation_debug flag to also track timer scheduling events
against task-isolation tasks that are ready to run in userspace
(no other runnable tasks).

What do you think of this approach?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com