From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <563fd5e1-650a-e329-8aab-2fa1953a9f49@redhat.com>
Date: Tue, 6 Jun 2023 16:11:02 -0400
Subject: Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition
From: Waiman Long
To: Tejun Heo
Cc: Michal Koutný, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli,
    Valentin Schneider, Frederic Weisbecker, Mrunal Patel, Ryan Phillips,
    Brent Rowsell, Peter Hunt, Phil Auld
References: <759603dd-7538-54ad-e63d-bb827b618ae3@redhat.com>
 <405b2805-538c-790b-5bf8-e90d3660f116@redhat.com>
 <18793f4a-fd39-2e71-0b77-856afb01547b@redhat.com>

On 6/6/23 15:58, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, Jun 05, 2023 at 10:47:08PM -0400, Waiman Long wrote:
> ...
>> I had a different idea on the semantics of cpuset.cpus.exclusive at the
>> beginning. My original thinking was that it held the actual exclusive CPUs
>> that are allocated to the cgroup. Now if we treat this as a hint of what
>> exclusive CPUs should be used and it becomes valid only if the cgroup can
> I wouldn't call it a hint. It's still a hard allocation of the CPUs to the
> cgroups that own them. Setting up a partition requires exclusive CPUs and
> thus would depend on exclusive allocations set up accordingly.
>
>> become a valid partition.
>> I can see it as a value that can be hierarchically
>> set throughout the whole cpuset hierarchy.
>>
>> So a transition to a valid partition is possible iff
>>
>> 1) cpuset.cpus.exclusive is a subset of cpuset.cpus and is a subset of
>> cpuset.cpus.exclusive of all its ancestors.
> Yes.
>
>> 2) If its parent is not a partition root, none of the CPUs in
>> cpuset.cpus.exclusive are currently allocated to other partitions. This is the
> Not just that, the CPUs aren't available to cgroups which don't have them
> set in the .exclusive file. IOW, if a CPU is in cpus.exclusive of some
> cgroups, it shouldn't appear in cpus.effective of cgroups which don't have
> the CPU in their cpus.exclusive.
>
> So, .exclusive explicitly establishes exclusive ownership of CPUs, and
> partitions depend on that, with an implicit "turn CPUs exclusive" behavior
> in case the parent is a partition root, for backward compatibility.

The current CPU-exclusive behavior is limited to sibling cgroups only.
Because of the hierarchical nature of CPU distribution, the set of exclusive
CPUs has to appear in all of its ancestors. When a partition is enabled, we
do a sibling exclusivity test at that point to verify that its CPUs are
exclusive. It looks like you want to do an exclusivity test even when the
partition isn't active. I can certainly do that when the file is being
updated. However, the write will then fail whenever the exclusivity test
fails, just like with the v1 cpuset.cpu_exclusive flag, if you are OK with
that.

>
>> same remote partition concept in my v2 patch. If its parent is a partition
>> root, part of its exclusive CPUs will be distributed to this child
>> partition, like the current behavior of cpuset partitions.
> Yes, similar in a sense. Please do away with the "once .reserve is used,
> the behavior is switched" part.

That behavior is already gone in my v2 patch.
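As an aside, the two transition conditions above can be sketched in a few
lines of plain Python. This is purely illustrative, not kernel code; the
Cpuset class, its field names, and the can_become_partition() helper are all
made up for the model:

```python
# Toy model of the partition validity rules discussed above.
# Purely illustrative -- the Cpuset class and can_become_partition()
# helper are invented for this sketch, not taken from the kernel.

class Cpuset:
    def __init__(self, cpus, exclusive=(), parent=None,
                 is_partition_root=False):
        self.cpus = frozenset(cpus)            # cpuset.cpus
        self.exclusive = frozenset(exclusive)  # cpuset.cpus.exclusive
        self.parent = parent
        self.is_partition_root = is_partition_root

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def can_become_partition(cs, other_partition_cpus):
    """other_partition_cpus: CPU sets already owned by other partitions."""
    # (1) The exclusive CPUs must come from cpuset.cpus and from every
    #     ancestor's cpuset.cpus.exclusive (an empty ancestor value is
    #     treated as "unrestricted" in this model).
    if not cs.exclusive or not cs.exclusive <= cs.cpus:
        return False
    if any(a.exclusive and not cs.exclusive <= a.exclusive
           for a in cs.ancestors()):
        return False
    # (2) If the parent is not a partition root, none of the exclusive
    #     CPUs may already be allocated to another partition.
    if cs.parent is not None and not cs.parent.is_partition_root:
        if any(cs.exclusive & frozenset(owned)
               for owned in other_partition_cpus):
            return False
    return True
```

For example, with a root holding CPUs 0-7 and a child asking for exclusive
CPUs {2, 3} out of its cpus {2, 3, 4}, the transition succeeds as long as no
other partition already owns CPU 2 or 3.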
> Instead, it can be something like "if the parent is a
> partition root, cpuset implicitly tries to set all CPUs in its cpus file in
> its cpus.exclusive file" so that user-visible behavior stays unchanged
> depending on past history.

If the parent is a partition root, auto-reservation will be done and
cpus.exclusive will be set automatically, just like before. So existing
applications using partitions will not be affected.

Cheers,
Longman