From: SeongJae Park
To: SeongJae Park
Cc: Andrew Morton, damon@lists.linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: (sashiko review) [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
Date: Sun, 22 Mar 2026 10:40:16 -0700
Message-ID: <20260322174018.83729-1-sj@kernel.org>
In-Reply-To: <20260322170700.83123-1-sj@kernel.org>

On Sun, 22 Mar 2026 10:06:59 -0700 SeongJae Park wrote:

> Forwarding the Sashiko review so the discussion can continue via mail.
>
> # review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@kernel.org
> # start of sashiko.dev inline review
> commit 2a3be2c8b6c327ef72163cd54e6121ef72b9c932
> Author: SeongJae Park
>
> mm/damon/core: introduce damon_ctx->paused
>
> DAMON supports only starting and stopping of its execution.  When it
> is stopped, the internal data it has self-trained goes away.  It
> would be useful if the execution could be paused and resumed with the
> previous self-trained data.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..0ab2cfa848e6 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> >  		kdamond_call(ctx, false);
> >  		if (ctx->maybe_corrupted)
> >  			break;
> > +		while (ctx->pause) {
> > +			damos_walk_cancel(ctx);
>
> Is it safe to repeatedly call damos_walk_cancel() here?
>
> In damos_walk(), the walk_control_lock is dropped after setting
> ctx->walk_control to check damon_is_running(), which opens a race
> window.
>
> Concurrently, kdamond_fn() calls damos_walk_cancel(), which reads
> ctx->walk_control, drops the lock, and completes the old request.
>
> If a new caller invokes damos_walk() and successfully registers a new
> walk_control, could damos_walk_cancel() resume, reacquire the lock,
> and unconditionally set ctx->walk_control = NULL, dropping the new
> request?
>
> This might leave the new caller permanently deadlocked in
> wait_for_completion().

Nice catch.  Orthogonal to this patch, though.

The deadlock could happen in the sequence below.

    CPU0                          │ CPU1
    ──────────────────────────────┼────────────────────────────
    damos_walk()                  │
      register request            │
      wait completion             │ damos_walk_cancel()
                                  │   complete the request
      wakeup, return              │
    damos_walk()                  │
      register new request        │
                                  │   remove the new request
      wait completion             │   return
      (nobody completes it)       │

Nonetheless, kdamond_fn() is already calling damos_walk_cancel() in
several places, including this loop.  The issue hence exists regardless
of this patch.  I will work on fixing it as a separate hotfix.  The fix
below may work.

'''
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2321,7 +2321,9 @@ static void damos_walk_cancel(struct damon_ctx *ctx)
 	control->canceled = true;
 	complete(&control->completion);
 	mutex_lock(&ctx->walk_control_lock);
-	ctx->walk_control = NULL;
+	/* A new damos_walk() caller could have added a new request meanwhile */
+	if (ctx->walk_control == control)
+		ctx->walk_control = NULL;
 	mutex_unlock(&ctx->walk_control_lock);
 }
'''

> > +			kdamond_usleep(ctx->attrs.sample_interval);
>
> Could this result in a busy loop if sample_interval is 0?
>
> The DAMON API allows attrs.sample_interval to be 0 (for example, when
> tuning is disabled via goal.aggrs == 0).
>
> If sample_interval is 0, this invokes usleep_range_idle(0, 1), which
> sleeps for at most 1 microsecond using an hrtimer.
>
> Since the normally expensive monitoring operations are bypassed while
> paused, would this cause the thread to loop almost instantly and
> create an interrupt storm?

This is intended behavior for users who are really impatient, so it
should be fine.

Thanks,
SJ

[...]