Date: Sun, 22 Jun 2025 13:46:42 +0200
From: Adrian Reber
To: Andrei Vagin
Cc: Radostin Stoyanov, Andrei Vagin, criu@lists.linux.dev
Subject: Re: Optimizing C/R Image Format for Kubernetes
Organization: Red Hat

On Fri, Jun 20, 2025 at 12:34:22PM -0700, Andrei Vagin wrote:
> On Thu, Jun 19, 2025 at 4:06 AM Adrian Reber wrote:
> ...
> >
> > > Here's my vision for an ideal image format for C/R-ed containers:
> > > * Filesystem Delta as an Overlay Layer: The filesystem delta should be
> > > treated just like any other container image delta. This means it would
> > > be specified as one of the overlay layers when a container is mounted.
> >
> > Yes. The current format was my wrong decision as I was not familiar
> > with how those delta layers are working.
> >
> > > * Directly Accessible CRIU Images: Once an image is pulled locally, the
> > > CRIU images should not be bundled in a tar archive. Instead, they
> > > should be placed directly in a directory, allowing CRIU to use them
> > > immediately without any extra extraction steps.
> >
> > This is not actually true. The OCI image does not contain the tar
> > archive but the actual checkpoint files directly:
> >
> > # podman pull quay.io/adrianreber/checkpoint-test:tag73
> > Trying to pull quay.io/adrianreber/checkpoint-test:tag73...
> > Getting image source signatures
> > Copying blob e65839d7ec1b done
> > Copying config 27d63848a3 done
> > Writing manifest to image destination
> > Storing signatures
> > 27d63848a32d24c68b131f99880411c11af6519820ef22b989a86b7f10038c79
> > # podman image mount quay.io/adrianreber/checkpoint-test:tag73
> > /var/lib/containers/storage/overlay/98aaf3c7dc28cfb2e79893ef952380b00169dcce910be48bbea1143b07ae2a0e/merged
> > # ls -la /var/lib/containers/storage/overlay/98aaf3c7dc28cfb2e79893ef952380b00169dcce910be48bbea1143b07ae2a0e/merged
> > total 44
> > dr-xr-xr-x. 1 root root 4096 Jun 19 10:53 .
> > drwx------. 6 root root 4096 Jun 19 10:53 ..
> > -rw-------. 1 root root 1120 Feb 1 11:11 bind.mounts
> > drw-------. 2 root root 4096 Feb 1 11:11 checkpoint
> > -rw-------. 1 root root 616 Feb 1 11:11 config.dump
> > -rw-------. 1 root root 0 Feb 1 11:11 dump.log
> > -rw-r--r--. 1 root root 315 Feb 1 11:11 io.kubernetes.cri-o.LogPath
> > -rw-r--r--. 1 root root 2048 Feb 1 11:11 rootfs-diff.tar
> > -rw-------. 1 root root 11276 Feb 1 11:11 spec.dump
> > -rw-r--r--. 1 root root 49 Feb 1 11:11 stats-dump
> >
> > We currently have some metadata defined in
> > github.com/checkpoint-restore/checkpointctl which we want to use in
> > all three projects (podman, containerd and cri-o).
>
> You know, maybe there's a difference between CRI-O and containerd.
> I followed the steps from the containerd test to create an image:
> https://github.com/containerd/containerd/blob/main/contrib/checkpoint/checkpoint-restore-kubernetes-test.sh#L105
>
> root@gke-cluster-1-default-pool-595f3f31-2wft:/home/avagin# docker
> create --name test-image avagin/test-cpt:0.5 ls
> c80dbf467d99a0e3a6684d6cb36d29c212a6b12bdfc9af8abe2ffe3fcb69a5de
> root@gke-cluster-1-default-pool-595f3f31-2wft:/home/avagin# docker
> export test-image | tar -t
> .dockerenv
> blobs/
> blobs/sha256/
> blobs/sha256/5159244823d7bfa959a4249c912ffef669c5596fcf41a866264823152b6dbba9
> blobs/sha256/9178f6d56b033b8221dda746c3fd9ad98552569f05e66241365ef8a722da96be
> blobs/sha256/eca4c8bdd20acb007a5594777ace63727d2c17413a54d3a5a817e252d0390902
> dev/
> dev/console
> dev/pts/
> dev/shm/
> etc/
> etc/hostname
> etc/hosts
> etc/mtab
> etc/resolv.conf
> index.json
> oci-layout
> proc/
> sys/
> root@gke-cluster-1-default-pool-595f3f31-2wft:/home/avagin# docker
> export test-image | tar -x -C test-img/
> # tar -tf test-img/blobs/sha256/eca4c8bdd20acb007a5594777ace63727d2c17413a54d3a5a817e252d0390902
> checkpoint/
> checkpoint/cgroup.img
> checkpoint/core-1.img
> checkpoint/core-8.img
> checkpoint/descriptors.json
> checkpoint/fdinfo-2.img
> checkpoint/fdinfo-3.img
> checkpoint/files.img
> checkpoint/fs-1.img

I am a bit confused.
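In the export above, the checkpoint only shows up after listing the blobs under blobs/sha256/ one by one with tar -tf. A minimal sketch of automating that search, assuming the export has been unpacked into a directory with that OCI layout and that the checkpoint layer is a tar archive the local tar can read. find_checkpoint_blobs is a hypothetical helper for illustration, not part of podman, buildah, docker or checkpointctl:

```shell
# Hypothetical helper (not part of any existing tool): print every blob in an
# unpacked OCI layout whose tar listing has a top-level "checkpoint/" directory.
# Returns 1 if no such blob was found.
find_checkpoint_blobs() {
    layout_dir=$1
    found=1
    for blob in "$layout_dir"/blobs/sha256/*; do
        # Skip the literal pattern when the directory is empty or missing.
        [ -f "$blob" ] || continue
        # Non-tar blobs (e.g. JSON manifests) make tar fail; its errors are
        # discarded and grep simply sees no matching entry.
        if tar -tf "$blob" 2>/dev/null | grep -Eq '^(\./)?checkpoint/'; then
            echo "$blob"
            found=0
        fi
    done
    return "$found"
}
```

For the export above, find_checkpoint_blobs test-img should list the blob whose tar -tf output is shown.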
Using the following steps I see this:

# kubectl apply -f /root/sleep.yaml
pod/sleeper created
# CP=$(curl -s --insecure --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key -X POST "https://localhost:10250/checkpoint/default/sleeper/sleep" | jq -r ".items[0]")
# newcontainer=$(buildah from scratch)
# buildah add "$newcontainer" $CP /
# buildah config --annotation=org.criu.checkpoint.container.name=test "$newcontainer"
# buildah commit "$newcontainer" checkpoint-image:latest
# buildah rm "$newcontainer"
# podman image mount checkpoint-image:latest
/var/lib/containers/storage/overlay/58681367751de52d5c779da8ee826d3ba51b21c880e4051f88ee64746d02017e/merged
# ls -la /var/lib/containers/storage/overlay/58681367751de52d5c779da8ee826d3ba51b21c880e4051f88ee64746d02017e/merged
total 32
dr-xr-xr-x. 1 root root 155 Jun 22 13:27 .
drwx------. 6 root root 69 Jun 22 13:27 ..
drwx------. 2 root root 4096 Jun 22 13:27 checkpoint
-rw-------. 1 root root 555 Jun 22 13:27 config.dump
-rw-------. 1 root root 0 Jun 22 13:27 container.log
-rw-r--r--. 1 root root 202 Jun 22 13:27 rootfs-diff.tar
-rw-r--r--. 1 root root 4424 Jun 22 13:27 spec.dump
-rw-------. 1 root root 46 Jun 22 13:27 stats-dump
-rw-------. 1 root root 298 Jun 22 13:27 status
-rw-------. 1 root root 1666 Jun 22 13:27 status.dump
# cat /var/lib/containers/storage/overlay/58681367751de52d5c779da8ee826d3ba51b21c880e4051f88ee64746d02017e/merged/config.dump | jq
{
  "id": "d974adb0cc366bbb49ef83123eac019f2326b90c5af6eab18db0abb6a084c329",
  "name": "sleep_sleeper_default_250c35ee-e0a4-4bf2-a681-09d7b3faf175_1",
  "rootfsImage": "quay.io/adrianreber/sleep:alpine",
  "rootfsImageRef": "quay.io/adrianreber/sleep@sha256:d504e702fa984e59d0573ff23a16023adb16a5405abf4ba35a64a62dbc9d3a6d",
  "rootfsImageName": "quay.io/adrianreber/sleep:alpine",
  "runtime": "io.containerd.runc.v2",
  "createdTime": "2025-06-22T11:27:13.446696907Z",
  "checkpointedTime": "2025-06-22T13:27:18.835719622+02:00",
  "restoredTime": "0001-01-01T00:00:00Z",
  "restored": false
}

I guess docker export provides something different from what podman image
mount shows. But whatever we have right now, we can change it to something
better. No problem. We are the authors of all the implementations in
containerd and CRI-O (and Podman) and can change it.

Adrian
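For reference, the two layouts seen in this thread can be told apart by their top-level entries: podman image mount shows the checkpoint/ directory and config.dump directly, while the docker export output contains oci-layout, index.json and blobs/sha256/. A minimal sketch of that check; checkpoint_layout is a hypothetical helper that only looks at those entries and does not validate their contents:

```shell
# Hypothetical helper: report which of the two layouts seen in this thread a
# directory has. Prints "direct", "oci-layout" or "unknown" (returning 1).
checkpoint_layout() {
    root=$1
    if [ -d "$root/checkpoint" ] && [ -f "$root/config.dump" ]; then
        # As with "podman image mount": checkpoint files at the top level,
        # usable without any extraction step.
        echo direct
    elif [ -f "$root/oci-layout" ] && [ -f "$root/index.json" ] &&
         [ -d "$root/blobs/sha256" ]; then
        # As with the "docker export" output: an OCI image layout whose blobs
        # still have to be unpacked before the checkpoint files are reachable.
        echo oci-layout
    else
        echo unknown
        return 1
    fi
}
```

For the image built above, checkpoint_layout "$(podman image mount checkpoint-image:latest)" would be expected to print direct.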