Re: Change soft-dirty interface?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	linux-mm@kvack.org
Subject: Re: Change soft-dirty interface?
Date: Sat, 15 Jun 2013 15:41:02 +0900	[thread overview]
Message-ID: <20130615064102.GA7470@gmail.com> (raw)
In-Reply-To: <51BB0065.7090408@parallels.com>


Hi Pavel,

Sorry for the delaying.
Maybe our timezone difference and my boys's interrupt.

On Fri, Jun 14, 2013 at 03:37:09PM +0400, Pavel Emelyanov wrote:
> On 06/14/2013 03:22 PM, Minchan Kim wrote:
> > Hello Pavel,
> > 
> > On Fri, Jun 14, 2013 at 02:01:23PM +0400, Pavel Emelyanov wrote:
> >>>>>>> If it's not allowed, another approach should be new system call.
> >>>>>>>
> >>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len);
> >>>>>>
> >>>>>> This looks like existing sys_madvise() one.
> >>>>>
> >>>>> Except pid part. It is added by your purpose, which external task
> >>>>> can control any process.
> >>
> >> In CRIU we can work with pid-less syscalls just fine :) So extending regular
> >> madvise would work.
> > 
> > I didn't know that.
> > Just out of curiosity. How can CRIU control other tasks without pid?
> 
> We use the parasite-injection technique [1]. Briefly -- we put a code into
> other task's address space using ptrace() and /proc/PID/map_files/ and make
> this code run and do what we need. Thus we can call madvise() "on" another
> task.

Interesting.

> 
> [1] http://lwn.net/Articles/454304/


> 
> >>
> >>>>>>
> >>>>>>> If we approach new system call, we don't need to maintain current
> >>>>>>> proc interface and it would be very handy to get a information
> >>>>>>> without pagemap (open/read/close) so we can add a parameter to
> >>>>>>> get a dirty information easily.
> >>>>>>>
> >>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> >>>>>>>
> >>>>>>> What do you think about it?
> >>>>>>>
> >>>>>>
> >>>>>> This is OK for me, though there's another issue with this API I'd like
> >>>>>> to mention -- consider your app is doing these tricks with soft-dirty
> >>>>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits
> >>>>>> to optimize the freeze time.
> >>>>>>
> >>>>>> In that case soft-dirty bits would be in wrong state for both -- you app
> >>>>>> and CRIU, but with the proc API we could compare the ctime-s of the 
> >>>>>> clear_refs file and find out, that someone spoiled the soft-dirty state
> >>>>>> from last time we messed with it and handle it somehow (copy all the memory
> >>>>>> in the worst case). Can we somehow handle this with your proposal?
> >>>>>
> >>>>> Good point I didn't think over that.
> >>>>> A simple idea popped from my mind is we can use read/write lock
> >>>>> so if pid is equal to calling process's one or pid is NULL,
> >>>>> we use read side lock, which can allow marking soft-dirty 
> >>>>> several vmas with parallel. And pid is not equal to calling
> >>>>> process's one, the API should try to hold write-side lock
> >>>>> then, if it's fail, the API should return EAGAIN so that CRIU
> >>>>> can progress other processes and retry it after a while.
> >>>>> Of course, it would make live-lock so that sys_softdirty might
> >>>>> need another argument like "int block".
> >>>>
> >>>> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
> >>>> and the flag will be protected by above lock. It could prevent mixed
> >>>> case by self and external.
> >>>
> >>> I realized it's not enough. Another idea is here.
> >>> The intenion is followin as,
> >>>
> >>> self softdirty VS self softdirty -> NOT exclusive
> >>> self softdirty VS external softdirty -> exclusive
> >>> external softdirty VS external softdirty-> excluisve
> >>
> >> I think it might work for us. However, I have two comments to the
> >> implementation, please see below.
> >>
> >>> struct softdirty token {
> >>>         u64 external;
> >>>         u64 internal;
> >>> };
> >>>
> >>>        int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
> >>>                                 struct softdirty *token); 
> > 
> > I should have mentioned that start and len are ignored if pid is not eqaul
> > to caller's pid.
> 
> OK
> 
> >>>        int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, 
> >>>                                 struct softdirty token, char *vec);
> >>
> >> Can you please show an example how to use these two, I don't quite get how
> >> can I do external soft-dirty tracking in atomic manner.
> > 
> > Hmm, I don't know how CRIU works but ...
> > 
> > 	while(1) {
> > 
> > 		struct softdirty token;
> > 		
> > 		sys_set_softdirty(tracked_pid, 0, 0, &token);
> > 		...
> > 		...
> > 		...
> > 		if (!sys_get_softdirty(tacked_pid, 0, 0, token, NULL))
> > 			break;
> > 	}
> > 
> > Maybe do you have a concern about live-lock?
> 
> No, I worry about potential races with which we or application can skip
> dirty page. Let me describe how CRIU uses existing soft-dirty implementation.
> 
> 1. stop the task we want to work on
> 2. read the /proc/pid/pagemap file to find out which pages to
>    read. Those with soft-dirty _cleared_ should be _skipped_
> 3. read task's memory at calculated bitmap
> 4. reset soft dirty bits on task
> 5. resume task execution

Let me try to parse as my term.

1. admin does "echo 4 > /proc/<target>/clear_refs"
2. admin stop the target
3. admin reads the /proc/<target>/pagemap and make bitmap
   with only soft-dirty marked pages so we can avoid unnecessary
   migration
4. admin reads target's dirtied pages via bitmap from 3
5. admin does "echo 4 > /proc/<target>/clear_refs" again to find
   future diry pages of the target.
6. admin resumes the target

Right?
If so, my interface is following as

1. admin does set_softdirty(target, 0, 0, &token);
   (set_softdirty clears all soft-dirty bit from target process's
   page table.
2. admin stop the target
3. admin reads the /proc/target/pagemap and make bitmap
   with only soft-dirty marked pages so we can avoid unnecessary
   migration. 
4. admins does get_softdirty(target, 0, 0, token) to confirm
   someone else spoiled since 1
4-1. If it is reports error, then admins discard the bitmap got
     from 3 and have to read all memory.
5. admin does set_softdirty(target, 0, 0, &token) again to find
   future dirty pages of the target  
5. admin resumes the target.

> 
> With the interface you propose the sequence presumably should look like
> 
> 1. stop the task we want to work on
> 2. call set_softdirty + get_softdirty to get the soft-dirty bitmap and
>    reset one. If it reports error, then the soft-dirty we did before is
>    spoiled and all memory should be read (iow -- bitmap should be filled
>    with 1-s)
> 3. read task's memory at calculated bitmap
> 4. resume task execution

> 
> Am I right with this? If yes, why do we need two calls, wouldn't it be better

I failed to parse your terms so I wrote scnario as my understanding
so please see my above sequence and if you have a comment, please ask
again.

> to merge them into one?

It's not hard part but I wanted to show my intention clearly.
If we all agree on, let's think over interface again.

Thanks!

> 
> Thanks,
> Pavel

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-06-15  6:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-13  1:53 Change soft-dirty interface? Minchan Kim
2013-06-13  9:10 ` Pavel Emelyanov
2013-06-14  0:32   ` Minchan Kim
2013-06-14  0:41     ` Minchan Kim
2013-06-14  5:07       ` Minchan Kim
2013-06-14 10:01         ` Pavel Emelyanov
2013-06-14 11:22           ` Minchan Kim
2013-06-14 11:37             ` Pavel Emelyanov
2013-06-15  6:41               ` Minchan Kim [this message]
2013-06-19  9:31                 ` Pavel Emelyanov
2013-06-21  1:41                   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130615064102.GA7470@gmail.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=kosaki.motohiro@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).