From: Chris Metcalf
To: Peter Zijlstra
CC: Thomas Gleixner, Will Deacon, Linus Torvalds, Oleg Nesterov, Paul McKenney, Ingo Molnar, mtk.manpages@gmail.com, dvhart@infradead.org, dave@stgolabs.net, Vineet.Gupta1@synopsys.com, ralf@linux-mips.org, ddaney@caviumnetworks.com, linux-kernel@vger.kernel.org
Subject: Re: futex atomic vs ordering constraints
Date: Wed, 2 Sep 2015 13:25:34 -0400
Message-ID: <55E7310E.7030200@ezchip.com>
In-Reply-To: <20150902170008.GU19282@twins.programming.kicks-ass.net>
References: <20150826181659.GW16853@twins.programming.kicks-ass.net> <20150901163140.GK1612@arm.com> <20150901164247.GO16853@twins.programming.kicks-ass.net> <20150902125555.GT16853@twins.programming.kicks-ass.net> <55E71F92.4000001@ezchip.com> <20150902170008.GU19282@twins.programming.kicks-ass.net>

On 09/02/2015 01:00 PM, Peter Zijlstra wrote:
> On Wed, Sep 02, 2015 at 12:10:58PM -0400, Chris Metcalf wrote:
>> On 09/02/2015 08:55 AM, Peter Zijlstra wrote:
>>> So here goes..
>>>
>>> Chris, I'm awfully sorry, but I seem to be Tile challenged.
>>>
>>> TileGX seems to define:
>>>
>>> #define smp_mb__before_atomic()	smp_mb()
>>> #define smp_mb__after_atomic()	smp_mb()
>>>
>>> However, its atomic_add_return() implementation looks like:
>>>
>>> static inline int atomic_add_return(int i, atomic_t *v)
>>> {
>>> 	int val;
>>> 	smp_mb();  /* barrier for proper semantics */
>>> 	val = __insn_fetchadd4((void *)&v->counter, i) + i;
>>> 	barrier();  /* the "+ i" above will wait on memory */
>>> 	return val;
>>> }
>>>
>>> Which leaves me confused on smp_mb__after_atomic().
>> Are you concerned about whether it has proper memory
>> barrier semantics already, i.e. full barriers before and after?
>> In fact we do have a full barrier before, but then because of the
>> "+ i" / "barrier()", we know that the only other operation since
>> the previous mb(), namely the read of v->counter, has
>> completed after the atomic operation. As a result we can
>> omit explicitly having a second barrier.
>>
>> It does seem like all the current memory-order semantics are
>> correct, unless I'm missing something!
> So I'm reading that code like:
>
>   MB
>   [RmW] ret = *val += i
>
>
> So what is stopping later memory ops like:
>
>   [R] a = *foo
>   [S] *bar = b
>
> From getting reordered with the RmW, like:
>
>   MB
>
>   [R] a = *foo
>   [S] *bar = b
>
>   [RmW] ret = *val += i
>
> Are you saying Tile does not reorder things like that? If so, why then
> is smp_mb__after_atomic() a full mb(). If it does, I don't see how your
> add_return is correct.
>
> Alternatively I'm just confused..

Tile does not do out-of-order instruction issue, but it does have an
out-of-order memory subsystem. So in addition to stores becoming
visible unpredictably without a memory barrier, a load may also read
from memory at an unpredictable time after it issues. As a result, a
later instruction that uses a register written by an earlier load may
stall instruction issue until the loaded value is available.
A memory fence instruction makes the core wait until all stores have
become visible and all load values are available.

So [R] cannot be reordered before [RmW]. The "+ i" in
atomic_add_return() consumes the fetchadd result, so instruction issue
stalls until that value has come back from memory, and since the
processor issues in order, [R] cannot issue before then. And
smp_mb__after_atomic() has to be a full mb() because that is the only
barrier we have that guarantees the load has read from memory. (If the
atomic's value were passed to smp_mb__after_atomic(), we could instead
generate a fake use of it, something like "move r1, r1", which would
stall instruction issue until the value had been read.)

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com