From: Lance Yang
To: david@kernel.org
Cc: pfalcato@suse.de, akpm@linux-foundation.org, ljs@kernel.org,
    baolin.wang@linux.alibaba.com, liam@infradead.org, npache@redhat.com,
    ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
    lance.yang@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, stable@vger.kernel.org,
    viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
    willy@infradead.org, song@kernel.org, ehagberg@janestreet.com,
    ziy@nvidia.com, gleventhal@janestreet.com
Subject: Re: [PATCH stable] mm/khugepaged: write all dirty file folios when collapsing
Date: Fri, 3 Jul 2026 17:02:32 +0800
Message-Id: <20260703090232.26261-1-lance.yang@linux.dev>

On Fri, Jul 03, 2026 at 10:55:42AM +0200, David Hildenbrand (Arm) wrote:
>On 7/2/26 18:54, Pedro Falcato wrote:
>> As-is, khugepaged and writable-file opening exclude each other. A file
>> cannot be open writeable and have THPs (because the filesystem is not aware
>> of them). khugepaged will never collapse file pages for files that are
>> opened writeable. On an open(O_RDWR/O_WRONLY), the page cache for that
>> particular file is dropped. This is fine because nothing could've been
>> dirtied.
>>
>> However, there is an edge-case: collapse_file() might not be able to
>> coexist with concurrent writers, but it can coexist with dirty folios
>> (from previous writers). Therefore, the following can happen:
>>
>> open(file, O_RDWR)
>> write(file)
>> close(file)
>
>Okay, folios are dirty.
>
>> madvise(file_mapping, MADV_COLLAPSE, some non-dirty range)
>
>collapse_file() has
>
>    if (!is_shmem && (folio_test_dirty(folio) ||
>                      folio_test_writeback(folio))) {
>        ...
>        result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>        goto out_unlock;
>    }
>
>Making us abort collapse.
>
>What am I missing?

Hmm ... dirty folios can be outside the range being collapsed ...

For example:

  write/dirty:   [6M, 8M)
  MADV_COLLAPSE: [0M, 2M)

collapse_file() only checks the folios in the collapse range, so the
dirty/writeback check passes for [0M, 2M).

But after that, for the old READ_ONLY_THP_FOR_FS case, nr_thps gets
bumped for the mapping. Then a later writable open can hit ...

  filemap_nr_thps(mapping)
    -> truncate_inode_pages(mapping, 0)

and that drops page cache for the whole mapping, including the dirty
folios at [6M, 8M) ...

Cheers,
Lance
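
The sequence above can be modelled in userspace as a toy sketch. Everything
in it is hypothetical (struct toy_mapping, toy_collapse(), toy_open_writable(),
one slot per 2M chunk of an 8M file); it is not the kernel code, only an
illustration of why a range-limited dirty check does not protect dirty folios
outside the range once a later writable open drops the whole mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one slot per 2M chunk, covering a file of [0M, 8M). */
#define NR_CHUNKS 4

struct toy_mapping {
	bool cached[NR_CHUNKS];	/* chunk currently has page cache */
	bool dirty[NR_CHUNKS];	/* chunk holds data not yet written back */
	int nr_thps;		/* stands in for the mapping's THP count */
};

/*
 * Mimics the range-limited check: only chunks in [start, end) are
 * examined, so dirty chunks elsewhere in the file are never seen.
 */
static bool toy_collapse(struct toy_mapping *m, size_t start, size_t end)
{
	assert(start < end && end <= NR_CHUNKS);

	for (size_t i = start; i < end; i++)
		if (m->dirty[i])
			return false;	/* like SCAN_PAGE_DIRTY_OR_WRITEBACK */

	m->nr_thps++;
	return true;
}

/*
 * Mimics a later writable open: if the mapping has any THP, the page
 * cache of the whole file is dropped. Returns how many dirty chunks
 * were thrown away without being written back.
 */
static int toy_open_writable(struct toy_mapping *m)
{
	int lost = 0;

	if (m->nr_thps == 0)
		return 0;

	for (size_t i = 0; i < NR_CHUNKS; i++) {
		if (m->cached[i] && m->dirty[i])
			lost++;
		m->cached[i] = false;
		m->dirty[i] = false;
	}
	m->nr_thps = 0;
	return lost;
}
```

With chunk 3 ([6M, 8M)) dirty, toy_collapse(&m, 0, 1) succeeds because chunk 0
is clean, and a following toy_open_writable(&m) reports one dirty chunk lost.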