Date: Tue, 27 Jan 2026 04:13:31 +0800
X-Mailing-List: multikernel@lists.linux.dev
Subject: Re: [ANNOUNCE] DAXFS: A zero-copy, dmabuf-friendly filesystem for shared memory
To: Cong Wang, Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Cong Wang, multikernel@lists.linux.dev
References: <55e3d9f6-50d2-48c0-b7e3-fb1c144cf3e8@linux.alibaba.com>
From: Gao Xiang <hsiangkao@linux.alibaba.com>

On 2026/1/27 03:48, Cong Wang wrote:
> On Mon, Jan 26, 2026 at 11:16 AM Matthew Wilcox wrote:
>>
>> On Mon, Jan 26, 2026 at 09:38:23AM -0800, Cong Wang wrote:
>>> If you are interested in adding multikernel support to EROFS, here is
>>> the codebase you could start with:
>>> https://github.com/multikernel/linux. PR is always welcome.
>>
>> I think the onus is rather the other way around. Adding a new filesystem
>> to Linux has a high bar to clear because it becomes a maintenance burden
>> to the rest of us. Convince us that what you're doing here *can't*
>> be done better by modifying erofs.
>>
>> Before I saw the email from Gao Xiang, I was also going to suggest that
>> using erofs would be a better idea than supporting your own filesystem.
>> Writing a new filesystem is a lot of fun. Supporting a new filesystem
>> and making it production-quality is a whole lot of pain. It's much
>> better if you can leverage other people's work. That's why DAX is a
>> support layer for filesystems rather than its own filesystem.
>
> Great question.
>
> The core reason is that multikernel assumes little to no compatibility.
>
> Specifically for this scenario, struct inode is not compatible. This
> could rule out a lot of existing filesystems, except read-only ones.

I don't quite get the point here, assuming you are familiar with
filesystems.

> Now back to EROFS, it is still based on a block device, which
> itself can't be shared among different kernels. ramdax is actually
> a perfect example here: its label_area can't be shared among
> different kernels.
>
> Let's take one step back: even if we really could share a device
> among multiple kernels, it still could not share the memory footprint.
> With DAX + EROFS, we would still get:
> 1) Each kernel creates its own DAX mappings
> 2) And faults pages independently
>
> There is no cross-kernel page sharing accounting.
>
> I hope this makes sense.

No, the EROFS on-disk format is designed for any backend, so you could
use this format backed by:
 1) a raw block device;
 2) a plain file;
 3) a pure ramdaxfs (it's still WIP).

Why not? Because an ordinary container image user doesn't assume a
filesystem built only for a particular type of device, especially for
golden image usage. You cannot say: oh, I built an image, but you can
only use it for ramdax; or, oh, you backed it with a file on a block
device, so you have to convert it to another format first. The EROFS
on-disk format is meant to allow _all_ device backends.

At a quick glance at your code, it seems rather premature and
inefficient: subdirectories form something like a link chain, which
may be somewhat reasonable for ramdax usage, but it's still _not_
cache-friendly.

The reason why it doesn't work for you is that _multikernel_ isn't an
official upstream requirement; all upstream virtualization users
directly use virtio-pmem now.
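To make the "one image, any backend" point concrete, here is a minimal sketch (device names like /dev/sdX and /dev/pmem0 and the mountpoints are placeholders; it assumes erofs-utils is installed):

```sh
# Build a single EROFS image from a directory tree.
mkfs.erofs image.erofs rootdir/

# Backend 1: a raw block device -- copy the image onto it and mount.
dd if=image.erofs of=/dev/sdX bs=1M
mount -t erofs /dev/sdX /mnt/blk

# Backend 2: a plain file, mounted via a loop device.
mount -t erofs -o loop image.erofs /mnt/file

# Backend 3: a DAX-capable device (e.g. emulated pmem) -- the same image,
# mounted with -o dax so file reads map the backing memory directly.
mount -t erofs -o dax /dev/pmem0 /mnt/dax
```

The same image works unchanged in all three cases; no format conversion is needed.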
I think for the upstream kernels, you'd want to make multikernel an
official upstream requirement first; then there will be proper drivers
for you to do multikernel ramdax, rather than the raw usage of
1) memremap and 2) vmf_insert_mixed inside the filesystem driver. I do
think those are a _red line_ for any new filesystem driver (as opposed
to the legacy cramfs MTD XIP code).

Anyway, I really think your current use cases have already been covered
by EROFS for many years.

Thanks,
Gao Xiang

>
> Regards,
> Cong