Netdev List
From: Askar Safin <safinaskar@gmail.com>
To: avagin@gmail.com
Cc: akpm@linux-foundation.org, alexander@mihalicyn.com,
	axboe@kernel.dk, bernd@bsbernd.com, brauner@kernel.org,
	criu@lists.linux.dev, david@kernel.org, dhowells@redhat.com,
	fuse-devel@lists.linux.dev, hch@infradead.org, jack@suse.cz,
	joannelkoong@gmail.com, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, miklos@szeredi.hu, netdev@vger.kernel.org,
	patches@lists.linux.dev, pfalcato@suse.de, rostedt@goodmis.org,
	safinaskar@gmail.com, torvalds@linux-foundation.org,
	val@packett.cool, viro@zeniv.linux.org.uk, willy@infradead.org
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Tue, 23 Jun 2026 12:42:11 +0300
Message-ID: <20260623094211.1080873-1-safinaskar@gmail.com>
In-Reply-To: <CANaxB-xVCP5HSUNwphFrKPdW0Qh1pA33A6npac60WArkZMFt7w@mail.gmail.com>

Andrei Vagin <avagin@gmail.com>:
> Actually, this change introduces a performance and functional
> regression for CRIU.
> 
> Here is a brief overview of how CRIU currently dumps memory pages:
> 
> CRIU injects a parasite code blob into the target process's address
> space. The parasite invokes vmsplice() with the SPLICE_F_GIFT flag to
> pin physical pages directly inside a pipe without copying them. The main
> CRIU process then takes over from outside the target context, calling
> splice() on the other end of the pipe to stream the data directly into
> checkpoint image files or a remote network socket.
> 
> I ran a simple test that creates an anonymous mapping and touches every
> page within it:
> Without this patch, CRIU takes 9 seconds to dump the test process.
> With this patch, it takes 18 seconds...
> 
> Plus, it obviously introduces some memory overhead.
> 
> If these changes are merged, we will need to completely rework the
> memory dumping mechanism in CRIU. Using vmsplice() in this proposed form
> no longer makes any sense for our architecture...

I have just read some of the CRIU docs and found this statement:

> #### Why `splice` is Better:
> *   **Consistency via COW**: The `SPLICE_F_GIFT` flag ensures that if the process modifies a "gifted" page after resuming, the kernel performs a **Copy-on-Write (COW)**. The pipe buffer continues to hold the *original* version of the page as it existed at the moment of the `vmsplice()` call, ensuring a perfectly consistent snapshot of that page.

This is wrong, at least on released kernels. I confirmed it by testing on my current kernel (6.12.90).

See the code at the end of this message.

If you actually rely on the consistency described there, then it seems CRIU is broken.

So, in fact, my patch actually brings consistency to CRIU. :)

-- 
Askar Safin




#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

int
main (void)
{
    int p[2];
    if (pipe (p) != 0)
        abort ();
    char buf[1] = {'a'};
    struct iovec iov[] = {
        {
            .iov_base = buf,
            .iov_len = 1,
        }
    };
    // I pass "SPLICE_F_NONBLOCK | SPLICE_F_GIFT" here, because this is what CRIU passes
    if (vmsplice (p[1], iov, 1, SPLICE_F_NONBLOCK | SPLICE_F_GIFT) != 1)
        abort ();
    if (close (p[1]) != 0)
        abort ();
    // Modify the buffer after vmsplice(), but before the pipe is read
    buf[0] = 'b';
    char buf2[1];
    if (read (p[0], buf2, 1) != 1)
        abort ();
    printf ("[%c]\n", buf2[0]); // Prints "[b]" as opposed to "[a]" on Linux 6.12.90
    return 0;
}
