From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D526C433FE for ; Wed, 12 Oct 2022 07:55:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229829AbiJLHzx (ORCPT ); Wed, 12 Oct 2022 03:55:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229811AbiJLHzw (ORCPT ); Wed, 12 Oct 2022 03:55:52 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF8D319034 for ; Wed, 12 Oct 2022 00:55:50 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id a5-20020a17090aa50500b002008eeb040eso2963002pjq.1 for ; Wed, 12 Oct 2022 00:55:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=qiqNvIkOrZUzletpRV3bkCPEPfM+NFMYt7swiI2PMTg=; b=eWwavECkNBx7QDz3LrB5tUigO6qg+o/VFkgPbH/i4E9uOLTWrmHuji/EqOwlqmNZ/4 CHijdWChIMxdO+ZtQu5JBp58xwglXywYbjPhiebzNPDrl7xTDNW90igd/aWecF9xZsch u37EX40PeR89u8lripyec1NSK0qBIWu+4c0teNsZLb6C+x/jfW+1ZMmSi6H9M1ONW+mB yOzvpQ2j7dhknBUg4YVPqzRD/qspgPW4P8l7dBuAdNkIt/dVmD9H1faPStcHveMXCd6C j2zQzaPXtLdgDFu2PDUxM/a4EazEqNJRLSS+DIvuR36jw/zYDfCk1bU6aOjoqxQVa+vJ S5kQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=qiqNvIkOrZUzletpRV3bkCPEPfM+NFMYt7swiI2PMTg=; b=1f+Jv6JhCjo2gA/vvrS4i/TW6bmgjkHKfkgXA45IN6C8zDpMVAUW4bED2PAm6G4xKx kBPJvvtvS9AZE8zTKogwiL87KQPMzcoqlyYfzp56rjWgwWr04/Mq7llyDUm/Hi5fKxxD CK31i1nu7u4XF3eVAuhSYnks8HJJPvFvoRRGCgGegiOvH3BX5qiju5dhqDfcFyPEvljV 6sXM5tTFThOPAaoF4vRlnBEA+CNYpz6MPOF/VmgqtXKAa8hgfew362/9gixt3b4Tb1y4 6yzwj3QMAWWUeucbn7Jm2HOMqLMstxRRoICCEecfvjhtSnjp8QrxXOvf1ftore/1Ta+D X6Cw== X-Gm-Message-State: ACrzQf1UzfxJL7bjxPWgKGwzJ2nuA/YWAmWWYVnVE9UUZpfMvjO4Vwi5 gWeX72nEaOk1kPiG033zTlgu5Y3a2NUXIA== X-Google-Smtp-Source: AMsMyM6Mop5fQSWO9SSyVhxQQI6VpyWF9Lwlbp0HTZdM2K3anCfuy8BCtnUThbozq5SvynKT1CbmCw== X-Received: by 2002:a17:902:c94d:b0:17d:9ea9:9c02 with SMTP id i13-20020a170902c94d00b0017d9ea99c02mr29512846pla.68.1665561350354; Wed, 12 Oct 2022 00:55:50 -0700 (PDT) Received: from [10.68.76.92] ([139.177.225.229]) by smtp.gmail.com with ESMTPSA id o11-20020a17090ab88b00b0020af2411721sm789475pjr.34.2022.10.12.00.55.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 12 Oct 2022 00:55:49 -0700 (PDT) Message-ID: <582cf257-bc0d-c96e-e72e-9164cff4fce1@bytedance.com> Date: Wed, 12 Oct 2022 15:55:44 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.3.2 Subject: Re: [External] Re: [RFC] mm: add new syscall pidfd_set_mempolicy() To: Michal Hocko Cc: corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, wuyun.abel@bytedance.com References: <20221010094842.4123037-1-hezhongkun.hzk@bytedance.com> From: Zhongkun He In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-api@vger.kernel.org Hi michal, thanks for your reply and suggestiones. > Please add some explanation why the cpuset interface is not usable for > that usecase. OK. >> To solve the issue, this patch introduces a new syscall >> pidfd_set_mempolicy(2). it sets the NUMA memory policy of the thread >> specified in pidfd. >> >> In current process context there is no locking because only the process >> accesses its own memory policy, so task_work is used in >> pidfd_set_mempolicy() to update the mempolicy of the process specified >> in pidfd, avoid using locks and race conditions. > > Why cannot you alter kernel_set_mempolicy (and do_set_mempolicy) to > accept a task rather than operate on current? I have tried it before this patch, but I found a problem.The allocation and update of mempolicy are in the current context, so it is not protected by any lock.But when the mempolicy is modified by other processes, the race condition appears. Say something like the following pidfd_set_mempolicy target task stack alloc_pages mpol = get_task_policy; task_lock(task); old = task->mempolicy; task->mempolicy = new; task_unlock(task); mpol_put(old); page = __alloc_pages(mpol); There is a situation that when the old mempolicy is released, the target task is still using the policy.It would be better if there are suggestions on this case. > I have to really say that I dislike the task_work approach because it > detaches the syscall from the actual operation and the caller simply > doesn't know when the operation has been completed. I agree with you.This is indeed a problem. > Please also describe the security model.got it.