Date: Mon, 4 Jan 2016 05:02:31 -0800
From: Davidlohr Bueso
To: Manfred Spraul
Cc: Andrew Morton, LKML, 1vier1@web.de, Ingo Molnar, felixh@informatik.uni-bremen.de, stable@vger.kernel.org
Subject: Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
Message-ID: <20160104130231.GA3013@linux-uzut.site>
In-Reply-To: <1451736291-8115-1-git-send-email-manfred@colorfullife.com>

On Sat, 02 Jan 2016, Manfred Spraul wrote:

>Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>
>sem_lock has a fast path that allows parallel simple operations.
>There are two reasons why a simple operation cannot run in parallel:
>- a non-simple operation is ongoing (sma->sem_perm.lock held)
>- a complex operation is sleeping (sma->complex_count != 0)
>
>As both facts are stored independently, a thread can bypass the current
>checks by sleeping in the right positions. See below for more details
>(or kernel bugzilla 105651).
>
>The patch fixes that by creating one variable (complex_mode)
>that tracks both reasons why parallel operations are not possible.
>
>The patch also updates stale documentation regarding the locking.
>
>With regards to stable kernels:
>The patch is required for all kernels that include commit 6d07b68ce16a
>("ipc/sem.c: optimize sem_lock()") (3.10?)
>
>The alternative is to revert the patch that introduced the race.

I am just now catching up with this, but a quick thought is that we
probably want to keep 6d07b68ce16a, as waiting for all sem->lock
spinlocks to be released should be much worse for performance than
keeping track of the complex 'mode', especially if we have a large
array. Also, any idea what workload exposed this race?

Anyway, will take a closer look at the patch/issue.

Thanks,
Davidlohr
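
P.S. For anyone skimming the thread, below is a minimal userspace model
(C11 atomics) of the single-flag idea the commit message describes. It
is not the actual ipc/sem.c code: the names (sma_model,
simple_op_may_run, complex_op_enter) are illustrative, and the step
where a complex op waits for in-flight simple ops to drain is elided.
The point is that the fast path tests one authoritative flag instead of
two independently updated facts.

#include <assert.h>
#include <stdbool.h>
#include <stdatomic.h>

/* Toy stand-in for struct sem_array; only the merged flag is modeled. */
struct sma_model {
	atomic_bool complex_mode;	/* set before taking the global lock
					   or sleeping as a complex op */
	/* ... per-semaphore locks, complex_count, etc. ... */
};

/* Fast path: a simple op may run in parallel only while no complex
 * operation is pending or in flight.  One acquire-load replaces the
 * racy pair of independent checks (perm lock held, complex_count). */
static bool simple_op_may_run(struct sma_model *sma)
{
	return !atomic_load_explicit(&sma->complex_mode,
				     memory_order_acquire);
}

/* Slow path entry: a complex op announces itself before doing anything
 * else, closing the window where a simple op could slip past. */
static void complex_op_enter(struct sma_model *sma)
{
	atomic_store_explicit(&sma->complex_mode, true,
			      memory_order_release);
	/* ... here the real code would wait until every per-semaphore
	 * lock is free before proceeding ... */
}

int main(void)
{
	struct sma_model sma;

	atomic_init(&sma.complex_mode, false);

	assert(simple_op_may_run(&sma));	/* fast path available */
	complex_op_enter(&sma);
	assert(!simple_op_may_run(&sma));	/* simple ops now fall back */
	return 0;
}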