Date: Fri, 8 Aug 2025 10:58:28 +0200
From: Michal Hocko
To: Zihuan Zhang
Cc: "Rafael J . Wysocki", Peter Zijlstra, Oleg Nesterov, David Hildenbrand,
 Jonathan Corbet, Ingo Molnar, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, len brown, pavel machek, Kees Cook, Andrew Morton,
 Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Catalin Marinas, Nico Pache, xu xin, wangfushuai,
 Andrii Nakryiko, Christian Brauner, Thomas Gleixner, Jeff Layton,
 Al Viro, Adrian Ratiu, linux-pm@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/9] freezer: Introduce freeze priority model to
 address process dependency issues
References: <20250807121418.139765-1-zhangzihuan@kylinos.cn>
 <4c46250f-eb0f-4e12-8951-89431c195b46@kylinos.cn>
X-Mailing-List: linux-doc@vger.kernel.org

On Fri 08-08-25 15:52:31, Zihuan Zhang wrote:
> 
> On 2025/8/8 15:00, Michal Hocko wrote:
> > On Fri 08-08-25 09:13:30, Zihuan Zhang wrote:
> > [...]
> > > However, in practice, we’ve observed cases where tasks appear stuck in
> > > uninterruptible sleep (D state) during the freeze phase, and thus cannot
> > > respond to signals or enter the refrigerator. These tasks are technically
> > > TASK_FREEZABLE, but due to the nature of their sleep state, they don’t
> > > freeze promptly, and may require multiple retry rounds, or cause the entire
> > > suspend to fail.
> > Right, but that is an inherent problem of the freezer implementation.
> > It is not really clear to me how priorities or layers improve on that.
> > Could you please elaborate on that?
> 
> Thanks for the follow-up.
> 
> From our observations, we’ve seen processes like Xorg that are in a normal
> state before freezing begins, but enter D state during the freeze window.
> Upon investigation, we found that these processes often depend on other
> user processes (e.g., I/O helpers or system services), and when those
> dependencies are frozen first, the dependent process (like Xorg) gets stuck
> and can’t be frozen itself.

OK, I see.

> This led us to treat such processes as “hard to freeze” tasks, not because
> they’re inherently unfreezable, but because they are more likely to become
> problematic if not frozen early enough.
> 
> So our model works as follows:
>     •    By default, freezer tries to freeze all freezable tasks in each
> round.
>     •    With our approach, we only attempt to freeze tasks whose
> freeze_priority is less than or equal to the current round number.
>     •    This ensures that higher-priority (i.e., harder-to-freeze) tasks
> are attempted earlier, increasing the chance that they freeze before being
> blocked by others.
> 
> Since we cannot know in advance which tasks will be difficult to freeze, we
> use heuristics:
>     •    Any task that causes freeze failure or is found in D state during
> the freeze window is treated as hard-to-freeze in the next attempt and its
> priority is increased.
>     •    Additionally, users can manually raise/reduce the freeze priority
> of known problematic tasks via an exposed sysfs interface, giving them
> fine-grained control.

This would have been very useful information for the changelog so that
we can understand what you are trying to achieve.

> This doesn’t change the fundamental logic of the freezer (it still retries
> until all tasks are frozen), but by adjusting the traversal order,
> we’ve observed significantly fewer retries and more reliable success in
> scenarios where these D state transitions occur.

OK, I believe I do understand what you are trying to achieve but I am
not convinced this is a robust way to deal with the problem. This all
seems highly timing specific: it might work in a very specific use case,
but you are essentially trying to fight tiny race windows with a very
probabilistic interface.

Also the interface seems to be really coarse grained and it can easily
turn out to be insufficient for other use cases, while it is not entirely
clear to me how it could be extended for those.

I believe it would be more useful to find the sources of those freezer
blockers and try to address those. Making more blocked tasks
__set_task_frozen compatible sounds like a general improvement in
itself.

Thanks
-- 
Michal Hocko
SUSE Labs
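
For readers following the thread, the round-based selection rule quoted
earlier in this message (a task is attempted only once the round number
reaches its freeze_priority) can be illustrated with a minimal userspace
sketch in C. This is not code from the RFC series: struct toy_task,
toy_freeze_round(), rounds starting at 0, and the assumption that every
attempted freeze succeeds are all hypothetical and chosen only to show the
ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; this is not the kernel's task_struct. */
struct toy_task {
	const char *name;
	int freeze_priority;	/* lower value = attempted in an earlier round */
	bool frozen;
};

/*
 * One freezer round under the quoted rule: only tasks whose
 * freeze_priority is <= the current round number are attempted.
 * Assumption: every attempt succeeds, which the real freezer does
 * not guarantee. Returns the number of tasks still unfrozen.
 */
static size_t toy_freeze_round(struct toy_task *tasks, size_t n, int round)
{
	size_t remaining = 0;

	assert(tasks != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].frozen && tasks[i].freeze_priority <= round)
			tasks[i].frozen = true;
		if (!tasks[i].frozen)
			remaining++;
	}
	return remaining;
}
```

Calling toy_freeze_round() with round = 0, 1, 2, ... until it returns 0
mimics the retry loop described in the quoted text. The sketch deliberately
leaves out the timing effects discussed in the reply, so it says nothing
about whether the ordering is robust in practice.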