From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 6 Dec 2017 10:18:20 -0800
From: "Paul E. McKenney"
To: David Rientjes
Cc: linux-kernel@vger.kernel.org
Subject: Re: set_bit() + down_write()
Reply-To: paulmck@linux.vnet.ibm.com
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20171206181820.GZ7829@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 06, 2017 at 01:48:17AM -0800, David Rientjes wrote:
> Hi Paul,
>
> I have a question about whether this code in exit_mmap() could be racy:
>
> 	set_bit(MMF_OOM_SKIP, &mm->flags);
> 	if (unlikely(tsk_is_oom_victim(current))) {
> 		/*
> 		 * Wait for oom_reap_task() to stop working on this
> 		 * mm. Because MMF_OOM_SKIP is already set before
> 		 * calling down_read(), oom_reap_task() will not run
> 		 * on this "mm" post up_write().
> 		 *
> 		 * tsk_is_oom_victim() cannot be set from under us
> 		 * either because current->mm is already set to NULL
> 		 * under task_lock before calling mmput and oom_mm is
> 		 * set not NULL by the OOM killer only if current->mm
> 		 * is found not NULL while holding the task_lock.
> 		 */
> 		down_write(&mm->mmap_sem);
> 		up_write(&mm->mmap_sem);
> 	}
>
> This is supposed to serialize __oom_reap_task_mm() from operating on an mm
> with MMF_OOM_SKIP set while under the protection of
> down_read(&mm->mmap_sem).
>
> Is it possible that MMF_OOM_SKIP above actually gets set after up_write()?
>
> If so, that would explain why we see the oom reaper operating on mm's with
> MMF_OOM_SKIP set.

Well, set_bit() has no ordering semantics. up_write() does provide some
ordering, but it is not fully ordered. So it all depends on what the other
end is doing:

o	If the other end is this same task, then it will see things
	fully ordered.

o	If the other end holds ->mmap_sem, then it will see things
	fully ordered.

o	If the other end does not hold ->mmap_sem, things are more
	complicated, and it depends on exactly what the other end is doing.

As an example (not directly related to your example above), here is
something that would -not- be guaranteed to be ordered:

	Task 0:
		...
		WRITE_ONCE(x, 1);
		WRITE_ONCE(y, 1);
		up_write(&mm->mmap_sem);

	Task 1:
		down_write(&mm->mmap_sem);
		r1 = READ_ONCE(z);
		r2 = READ_ONCE(y);
		...

	Task 2:
		WRITE_ONCE(z, 1);
		smp_mb();
		r3 = READ_ONCE(x);

It really is possible on some architectures to end up with r1==0, r2==1,
and r3==0.

But what exactly was the other end doing? What architecture were you
running on? And what version of Linux were you using?

							Thanx, Paul
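
A note on the three-task example above: one way to see why the r1==0,
r2==1, r3==0 outcome is surprising is to check that no sequentially
consistent interleaving can produce it. The following is a minimal
userspace sketch in plain C. It is not kernel code and not a model of the
Linux-kernel memory model. It assumes that Task 0 already holds ->mmap_sem
for writing when the example starts, and it treats smp_mb() as a no-op
because every access is already fully ordered under sequential consistency.
The names struct state, struct tally, step(), explore(), and enumerate_sc()
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared variables, lock state, result registers, per-task program counters. */
struct state {
	int x, y, z;
	bool locked;		/* true while some task holds the write lock */
	int r1, r2, r3;
	int pc[3];
};

struct tally {
	long total;		/* complete interleavings explored */
	long weak;		/* those ending with r1==0, r2==1, r3==0 */
};

static const int nsteps[3] = { 3, 3, 2 };

/* Run the next operation of one task; return false if it cannot run now. */
static bool step(struct state *s, int task)
{
	int pc = s->pc[task];

	if (pc >= nsteps[task])
		return false;
	if (task == 0) {
		if (pc == 0)
			s->x = 1;		/* WRITE_ONCE(x, 1) */
		else if (pc == 1)
			s->y = 1;		/* WRITE_ONCE(y, 1) */
		else
			s->locked = false;	/* up_write() */
	} else if (task == 1) {
		if (pc == 0) {
			if (s->locked)
				return false;	/* down_write() would block */
			s->locked = true;
		} else if (pc == 1) {
			s->r1 = s->z;		/* r1 = READ_ONCE(z) */
		} else {
			s->r2 = s->y;		/* r2 = READ_ONCE(y) */
		}
	} else {
		if (pc == 0)
			s->z = 1;		/* WRITE_ONCE(z, 1); smp_mb() is a no-op here */
		else
			s->r3 = s->x;		/* r3 = READ_ONCE(x) */
	}
	s->pc[task]++;
	return true;
}

/* Depth-first walk over every interleaving that respects program order. */
static void explore(struct state s, struct tally *t)
{
	bool progressed = false;

	for (int task = 0; task < 3; task++) {
		struct state next = s;

		if (!step(&next, task))
			continue;
		progressed = true;
		explore(next, t);
	}
	if (progressed)
		return;
	/* Nothing can run: every task must have finished (no deadlock). */
	for (int task = 0; task < 3; task++)
		assert(s.pc[task] == nsteps[task]);
	t->total++;
	if (s.r1 == 0 && s.r2 == 1 && s.r3 == 0)
		t->weak++;
}

/* Assumption: Task 0 holds the lock for writing at the start. */
struct tally enumerate_sc(void)
{
	struct state init = { .locked = true };
	struct tally t = { 0, 0 };

	explore(init, &t);
	return t;
}
```

Calling enumerate_sc() walks all 28 complete interleavings and finds none
with the weak outcome, so that outcome can only come from hardware ordering
that is weaker than sequential consistency. Checking what the kernel
actually permits requires a memory-model tool such as herd7 rather than
this kind of enumeration.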