Date: Thu, 27 Jul 2017 16:55:59 +0200
From: Andrea Arcangeli
To: Michal Hocko
Cc: "Kirill A. Shutemov", Andrew Morton, David Rientjes, Tetsuo Handa, Oleg Nesterov, Hugh Dickins, linux-mm@kvack.org, LKML
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
Message-ID: <20170727145559.GD29716@redhat.com>
In-Reply-To: <20170727065023.GB20970@dhcp22.suse.cz>

On Thu, Jul 27, 2017 at 08:50:24AM +0200, Michal Hocko wrote:
> Yes this will work and it won't depend on the oom_lock.
> But isn't it just more ugly than simply doing
>
>     if (tsk_is_oom_victim) {
>         down_write(&mm->mmap_sem);
>         locked = true;
>     }
>     free_pgtables(...)
>     [...]
>     if (locked)
>         down_up(&mm->mmap_sem);

To me, not doing if (tsk_is_oom...) { down_write; up_write } is by
default a confusing implementation, because it's not strict, and code
that is not strict is not self-documenting: you have to think twice
about why you're doing something the way you're doing it.

Doubt about the point of holding the mmap_sem during free_pgtables is
precisely why I started digging into this issue: it didn't look
possible that you could truly benefit from holding the mmap_sem
during free_pgtables.

I also don't like having a new invariant that your solution relies
on, that is mm->mmap = NULL, when we can just set MMF_OOM_SKIP a bit
earlier than it gets set anyway and use that to control the other
side of the race.

I like strict code that uses as few invariants as possible and that
never holds a lock for any instruction more than is required (again
purely for self-documenting reasons, the CPU won't notice much one
instruction more or less).

Even with your patch the two branches are unnecessary; that may not
be measurable, but it's still wasted CPU. It's all about setting
mm->mmap before the up_write. In fact my patch should at least get an
incremental unlikely around the single branch it adds to exit_mmap.

I see Hugh's ksm_exit-like {down_write; up_write} as a strict
solution to this issue, and I wrote it specifically while researching
a way to be more strict, because from the start it didn't look like
holding the mmap_sem during free_pgtables was necessary.

I'm also fine with dropping the oom_lock, but I think it can be done
incrementally as it's a separate issue; my second patch should allow
for it with no adverse side effects.

All I care about is the exit_mmap path, because it runs too many
times not to pay deep attention to every bit of it ;).

Thanks,
Andrea