linux-mm.kvack.org archive mirror
* Change soft-dirty interface?
@ 2013-06-13  1:53 Minchan Kim
  2013-06-13  9:10 ` Pavel Emelyanov
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-13  1:53 UTC (permalink / raw)
  To: Pavel Emelyanov, Andrew Morton; +Cc: KOSAKI Motohiro, linux-mm

Hi all, 

Sorry for interrupting so late, while the patchset is being promoted to mainline.
I'd like to discuss our use case, for which I'd like to change the per-process
interface into a per-range interface.

Our use case is as follows.

An application allocates a big buffer (A), makes a backup buffer (B)
for it, and copies A into B.
Let's assume A consists of subranges (A-1, A-2, A-3, A-4).
As time goes by, the application can modify any part of A.
In this example, let's assume A-1 and A-2 are modified.
When the time comes, we compare A-1 with B-1 to make a diff of that
range (by design, we don't need a diff of every range on each iteration)
and do something with the diff. Then we'd like to re-mark only A-1 for
soft-dirty tracking, NOT all of the process's A range, so that we can track
further changes to A-1 while keeping the dirty information for the other
subranges (A-2, A-3, A-4), because we will make A-2's diff in the next iteration.

We can't do that with the existing interface.
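For reference, here is a minimal sketch of what the existing interface offers, assuming a kernel built with soft-dirty support: writing "4" to /proc/self/clear_refs clears the soft-dirty bits of the whole process, and bit 55 of a /proc/self/pagemap entry reports the soft-dirty state of one page. The helper names are made up for illustration and are not part of any kernel or library API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a 64-bit pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Pure helper: extract the soft-dirty flag from one pagemap entry. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Clear the soft-dirty bits of the WHOLE calling process (no range
 * can be given). Returns 0 on success, -1 on error. */
static int clear_soft_dirty_all(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Returns 1 or 0 for the soft-dirty state of the page containing addr,
 * or -1 if pagemap cannot be read. */
static int page_soft_dirty(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;

    assert(page_size > 0);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by page frame number. */
    off_t off = (off_t)((uintptr_t)addr / (uintptr_t)page_size * sizeof(entry));
    ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return pagemap_entry_soft_dirty(entry);
}
```

With only these two operations, re-arming tracking for A-1 via clear_soft_dirty_all() also discards the pending dirty state of A-2, A-3 and A-4, which is the limitation described above.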

So I'd like to add [addr, len] arguments to the proc interface:

    echo 4 0x100000 0x3000 > /proc/self/clear_refs

It doesn't break anything, but I'm not sure everyone will like the interface,
because I recently heard the following comment from akpm:

        https://lkml.org/lkml/2013/5/21/529

Although per-process reclaim is a separate story,
I get the feeling he dislikes doing things through the proc interface with
range parameters like the one above, taken from /proc/pid/maps.

If that's not allowed, the other approach would be a new system call.

        int sys_softdirty(pid_t pid, void *addr, size_t len);

If we go with a new system call, we don't need to maintain the current
proc interface, and it would be very handy to get the information
without pagemap (open/read/close), so we could add a parameter to
return the dirty information easily.

        int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)

What do you think about it?

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Change soft-dirty interface?
  2013-06-13  1:53 Change soft-dirty interface? Minchan Kim
@ 2013-06-13  9:10 ` Pavel Emelyanov
  2013-06-14  0:32   ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Emelyanov @ 2013-06-13  9:10 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

On 06/13/2013 05:53 AM, Minchan Kim wrote:
> Hi all, 
> 
> Sorry for late interrupting to promote patchset to the mainline.
> I'd like to discuss our usecase so I'd like to change per-process
> interface with per-range interface.
> 
> Our usecase is following as,
> 
> A application allocates a big buffer(A) and makes backup buffer(B)
> for it and copy B from A.
> Let's assume A consists of subranges (A-1, A-2, A-3, A-4).
> As time goes by, application can modify anywhere of A.
> In this example, let's assume A-1 and A-2 are modified.
> When the time happen, we compare A-1 with B-1 to make
> diff of the range(On every iteration, we don't need all range's diff by design)
> and do something with diff, then we'd like to remark only the A-1 with
> soft-dirty, NOT A's all range of the process to track the A-1's
> further difference in future while keeping dirty information (A-2, A-3, A-4)
> because we will make A-2's diff in next iteration.
> 
> We can't do it by existing interface.

So you need to track changes not in the whole range, but in sub-ranges.
OK.

> So, I'd like to add [addr, len] argument with using proc
> 
>     echo 4 0x100000 0x3000 > /proc/self/clear_refs
> 
> It doesn't break anything but not sure everyone like the interface
> because recently I heard from akpm following comment.
> 
>         https://lkml.org/lkml/2013/5/21/529
> 
> Although per-process reclaim is another story with this,
> I feel he seems to hate doing something on proc interface with
> /proc/pid/maps like above range parameter.
> 
> If it's not allowed, another approach should be new system call.
> 
>         int sys_softdirty(pid_t pid, void *addr, size_t len);

This looks like the existing sys_madvise() one.

> If we approach new system call, we don't need to maintain current
> proc interface and it would be very handy to get a information
> without pagemap (open/read/close) so we can add a parameter to
> get a dirty information easily.
> 
>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> 
> What do you think about it?
> 

This is OK for me, though there's another issue with this API I'd like
to mention: consider that your app is doing these tricks with soft-dirty
while, at the same time, the CRIU tools live-migrate it, using the soft-dirty
bits to optimize the freeze time.

In that case the soft-dirty bits would be in the wrong state for both your app
and CRIU. With the proc API, though, we could compare the ctimes of the
clear_refs file, find out that someone has spoiled the soft-dirty state
since the last time we touched it, and handle it somehow (copy all the memory
in the worst case). Can we somehow handle this with your proposal?

Thanks,
Pavel


* Re: Change soft-dirty interface?
  2013-06-13  9:10 ` Pavel Emelyanov
@ 2013-06-14  0:32   ` Minchan Kim
  2013-06-14  0:41     ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-14  0:32 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

Hello Pavel,

On Thu, Jun 13, 2013 at 01:10:50PM +0400, Pavel Emelyanov wrote:
> On 06/13/2013 05:53 AM, Minchan Kim wrote:
> > Hi all, 
> > 
> > Sorry for late interrupting to promote patchset to the mainline.
> > I'd like to discuss our usecase so I'd like to change per-process
> > interface with per-range interface.
> > 
> > Our usecase is following as,
> > 
> > A application allocates a big buffer(A) and makes backup buffer(B)
> > for it and copy B from A.
> > Let's assume A consists of subranges (A-1, A-2, A-3, A-4).
> > As time goes by, application can modify anywhere of A.
> > In this example, let's assume A-1 and A-2 are modified.
> > When the time happen, we compare A-1 with B-1 to make
> > diff of the range(On every iteration, we don't need all range's diff by design)
> > and do something with diff, then we'd like to remark only the A-1 with
> > soft-dirty, NOT A's all range of the process to track the A-1's
> > further difference in future while keeping dirty information (A-2, A-3, A-4)
> > because we will make A-2's diff in next iteration.
> > 
> > We can't do it by existing interface.
> 
> So you need to track changes not in the whole range, but in sub-ranges.
> OK.

Right.

> 
> > So, I'd like to add [addr, len] argument with using proc
> > 
> >     echo 4 0x100000 0x3000 > /proc/self/clear_refs
> > 
> > It doesn't break anything but not sure everyone like the interface
> > because recently I heard from akpm following comment.
> > 
> >         https://lkml.org/lkml/2013/5/21/529
> > 
> > Although per-process reclaim is another story with this,
> > I feel he seems to hate doing something on proc interface with
> > /proc/pid/maps like above range parameter.
> > 
> > If it's not allowed, another approach should be new system call.
> > 
> >         int sys_softdirty(pid_t pid, void *addr, size_t len);
> 
> This looks like existing sys_madvise() one.

Except for the pid part. It was added for your purpose, so that an external task
can control any process.

> 
> > If we approach new system call, we don't need to maintain current
> > proc interface and it would be very handy to get a information
> > without pagemap (open/read/close) so we can add a parameter to
> > get a dirty information easily.
> > 
> >         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> > 
> > What do you think about it?
> > 
> 
> This is OK for me, though there's another issue with this API I'd like
> to mention -- consider your app is doing these tricks with soft-dirty
> and at the same time CRIU tools live-migrate it using the soft-dirty bits
> to optimize the freeze time.
> 
> In that case soft-dirty bits would be in wrong state for both -- you app
> and CRIU, but with the proc API we could compare the ctime-s of the 
> clear_refs file and find out, that someone spoiled the soft-dirty state
> from last time we messed with it and handle it somehow (copy all the memory
> in the worst case). Can we somehow handle this with your proposal?

Good point, I hadn't thought about that.
A simple idea that comes to mind is a read/write lock:
if pid is equal to the calling process's pid, or pid is 0,
we take the read-side lock, which allows soft-dirty marking of
several vmas in parallel. If pid is not equal to the calling
process's pid, the API should try to take the write-side lock
and, if that fails, return EAGAIN so that CRIU
can make progress on other processes and retry after a while.
Of course, that could live-lock, so sys_softdirty might
need another argument like "int block".
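The locking rule above can be modelled in userspace as a small try-lock, shown here as a sketch with C11 atomics rather than kernel primitives. All names are hypothetical, and unlike the description above both sides are non-blocking in this model: a self caller also gets -EAGAIN while an external caller holds the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* state > 0: number of self (read-side) holders
 * state == 0: free
 * state == -1: one external (write-side) holder */
struct sd_rwlock {
    atomic_int state;
};

/* Self callers share the lock with each other. */
static int sd_try_read(struct sd_rwlock *l)
{
    int s = atomic_load(&l->state);

    while (s >= 0) {
        /* On failure, s is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return 0;
    }
    return -EAGAIN;
}

static void sd_read_unlock(struct sd_rwlock *l)
{
    atomic_fetch_sub(&l->state, 1);
}

/* An external caller needs the lock exclusively and never waits. */
static int sd_try_write(struct sd_rwlock *l)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&l->state, &expected, -1))
        return 0;
    return -EAGAIN;
}

static void sd_write_unlock(struct sd_rwlock *l)
{
    atomic_store(&l->state, 0);
}

/* Pick the side of the lock from who is calling. */
static int softdirty_lock(struct sd_rwlock *l, int caller_is_target)
{
    return caller_is_target ? sd_try_read(l) : sd_try_write(l);
}
```

An external tool that gets -EAGAIN would move on to other processes and retry later; since nothing bounds the number of retries, the live-lock concern mentioned above still applies to this model.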


-- 
Kind regards,
Minchan Kim


* Re: Change soft-dirty interface?
  2013-06-14  0:32   ` Minchan Kim
@ 2013-06-14  0:41     ` Minchan Kim
  2013-06-14  5:07       ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-14  0:41 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

On Fri, Jun 14, 2013 at 09:32:13AM +0900, Minchan Kim wrote:
> 
> Good point I didn't think over that.
> A simple idea popped from my mind is we can use read/write lock
> so if pid is equal to calling process's one or pid is NULL,
> we use read side lock, which can allow marking soft-dirty 
> several vmas with parallel. And pid is not equal to calling
> process's one, the API should try to hold write-side lock
> then, if it's fail, the API should return EAGAIN so that CRIU
> can progress other processes and retry it after a while.
> Of course, it would make live-lock so that sys_softdirty might
> need another argument like "int block".

And we need a flag that records SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY,
protected by the lock above. It could prevent self and external use
from being mixed.

-- 
Kind regards,
Minchan Kim


* Re: Change soft-dirty interface?
  2013-06-14  0:41     ` Minchan Kim
@ 2013-06-14  5:07       ` Minchan Kim
  2013-06-14 10:01         ` Pavel Emelyanov
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-14  5:07 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

On Fri, Jun 14, 2013 at 09:41:33AM +0900, Minchan Kim wrote:
> On Fri, Jun 14, 2013 at 09:32:13AM +0900, Minchan Kim wrote:
> > 
> > Good point I didn't think over that.
> > A simple idea popped from my mind is we can use read/write lock
> > so if pid is equal to calling process's one or pid is NULL,
> > we use read side lock, which can allow marking soft-dirty 
> > several vmas with parallel. And pid is not equal to calling
> > process's one, the API should try to hold write-side lock
> > then, if it's fail, the API should return EAGAIN so that CRIU
> > can progress other processes and retry it after a while.
> > Of course, it would make live-lock so that sys_softdirty might
> > need another argument like "int block".
> 
> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
> and the flag will be protected by above lock. It could prevent mixed
> case by self and external.

I realized that's not enough. Here is another idea.
The intention is as follows:

self softdirty VS self softdirty -> NOT exclusive
self softdirty VS external softdirty -> exclusive
external softdirty VS external softdirty -> exclusive

struct softdirty {
        u64 external;
        u64 internal;
};

       int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
                                struct softdirty *token);
       int sys_get_softdirty(pid_t pid, unsigned long start, size_t len,
                                struct softdirty token, char *vec);

SYSCALL(set_softdirty, ..., token)
{
        struct task_struct *tsk = task_from_pid(pid);
        struct mm_struct *mm = tsk->mm;

        mutex_lock(&mm->st_lock);
        if (tsk == current)
                mm->token.internal++;
        else
                mm->token.external++;
        /* hand the current counters back to the caller */
        token->external = mm->token.external;
        token->internal = mm->token.internal;
        mutex_unlock(&mm->st_lock);
        ..
        ..

}

SYSCALL(get_softdirty, ..., token, ...)
{
        struct task_struct *tsk = task_from_pid(pid);
        struct mm_struct *mm = tsk->mm;

        mutex_lock(&mm->st_lock);
        if (tsk == current) {
                /* self: only an external reset invalidates the token */
                if (mm->token.external != token.external) {
                        mutex_unlock(&mm->st_lock);
                        return -EAGAIN;
                }
        } else {
                /* external: any reset since set_softdirty invalidates it */
                if (mm->token.external != token.external ||
                    mm->token.internal != token.internal) {
                        mutex_unlock(&mm->st_lock);
                        return -EAGAIN;
                }
        }
        mutex_unlock(&mm->st_lock);
        ...
}





-- 
Kind regards,
Minchan Kim


* Re: Change soft-dirty interface?
  2013-06-14  5:07       ` Minchan Kim
@ 2013-06-14 10:01         ` Pavel Emelyanov
  2013-06-14 11:22           ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Emelyanov @ 2013-06-14 10:01 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

>>>>> If it's not allowed, another approach should be new system call.
>>>>>
>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len);
>>>>
>>>> This looks like existing sys_madvise() one.
>>>
>>> Except pid part. It is added by your purpose, which external task
>>> can control any process.

In CRIU we can work with pid-less syscalls just fine :) So extending regular
madvise would work.
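For comparison, this is how the existing madvise() call already addresses a sub-range of the caller's own address space. The sketch is only illustrative: the helper names are made up, and MADV_NORMAL is a placeholder, since no soft-dirty advice value exists in madvise().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Apply an advice value to one page-aligned sub-range of a mapping owned
 * by the caller. Returns 0 on success, -1 on error, like madvise(). */
static int advise_subrange(void *base, size_t offset, size_t len, int advice)
{
    assert(base != NULL);
    return madvise((char *)base + offset, len, advice);
}

/* Map four pages and advise only the middle two. */
static int demo_madvise_subrange(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t total = 4 * (size_t)page;
    char *buf = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1;
    /* MADV_NORMAL is a placeholder for a hypothetical soft-dirty advice. */
    int ret = advise_subrange(buf, (size_t)page, 2 * (size_t)page, MADV_NORMAL);
    munmap(buf, total);
    return ret;
}
```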

>>>>
>>>>> If we approach new system call, we don't need to maintain current
>>>>> proc interface and it would be very handy to get a information
>>>>> without pagemap (open/read/close) so we can add a parameter to
>>>>> get a dirty information easily.
>>>>>
>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
>>>>>
>>>>> What do you think about it?
>>>>>
>>>>
>>>> This is OK for me, though there's another issue with this API I'd like
>>>> to mention -- consider your app is doing these tricks with soft-dirty
>>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits
>>>> to optimize the freeze time.
>>>>
>>>> In that case soft-dirty bits would be in wrong state for both -- you app
>>>> and CRIU, but with the proc API we could compare the ctime-s of the 
>>>> clear_refs file and find out, that someone spoiled the soft-dirty state
>>>> from last time we messed with it and handle it somehow (copy all the memory
>>>> in the worst case). Can we somehow handle this with your proposal?
>>>
>>> Good point I didn't think over that.
>>> A simple idea popped from my mind is we can use read/write lock
>>> so if pid is equal to calling process's one or pid is NULL,
>>> we use read side lock, which can allow marking soft-dirty 
>>> several vmas with parallel. And pid is not equal to calling
>>> process's one, the API should try to hold write-side lock
>>> then, if it's fail, the API should return EAGAIN so that CRIU
>>> can progress other processes and retry it after a while.
>>> Of course, it would make live-lock so that sys_softdirty might
>>> need another argument like "int block".
>>
>> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
>> and the flag will be protected by above lock. It could prevent mixed
>> case by self and external.
> 
> I realized it's not enough. Another idea is here.
> The intenion is followin as,
> 
> self softdirty VS self softdirty -> NOT exclusive
> self softdirty VS external softdirty -> exclusive
> external softdirty VS external softdirty-> excluisve

I think it might work for us. However, I have two comments on the
implementation, please see below.

> struct softdirty token {
>         u64 external;
>         u64 internal;
> };
> 
>        int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
>                                 struct softdirty *token); 
>        int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, 
>                                 struct softdirty token, char *vec);

Can you please show an example of how to use these two? I don't quite get how
I can do external soft-dirty tracking in an atomic manner.

> 
> SYSCALL(set_softdirty, ..., token)
> {
>         struct task_struct *tsk = task_from_pid(pid);
>         mutex_lock(&mm->st_lock);
>         if (tsk == current)
>                 tsk->mm->token.internal++; 
>         else
>                 tsk->mm->token.external++;
>         token->external = mm->token.external;
>         token->internal = mm->token.internal;
>         mutex_unlock(&mm->st_lock);
>         ..
>         ..
> 
> }
> 
> SYSCALL(get_softdirty, ..., token, ...)
> {
>         struct task_struct *tsk = task_from_pid(pid);
>         mutex_lock(&mm->st_lock);
>         if (tsk == current) {
>                 if (tsk->mm->token.external != token.external) {
>                         mutex_unlock
>                         return -EAGAIN;
>                 }
>         } else {
>                 if (tsk->mm->token.external != token.external ||
>                     tsk->mm->token.internal != token.internal) {
>                         mutex_unlock;
>                         return -EAGAIN;
>                 }
>         }
>         mutex_unlock(&mm->st_lock);

Presumably the critical section should be longer: if the tokens match and we
release the lock and proceed to work on pagemap, a concurrent call
to set_softdirty can proceed and spoil the picture.
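A userspace sketch of that longer critical section, under the assumption that the pagemap work can be passed in as a callback: the token check and the scan run under the same lock, so a concurrent set_softdirty cannot slip in between them. The names mm_model, scan_fn and count_scan are hypothetical, and a pthread mutex stands in for st_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

struct softdirty {
    uint64_t external;
    uint64_t internal;
};

/* Stand-in for the per-mm lock and counters. */
struct mm_model {
    pthread_mutex_t st_lock;
    struct softdirty token;
};

/* Callback that does the actual pagemap work. */
typedef void (*scan_fn)(void *arg);

/* Example callback: just counts how often a scan was allowed to run. */
static void count_scan(void *arg)
{
    ++*(int *)arg;
}

/* Check the token and run the scan without dropping the lock in between. */
static int model_get_softdirty_locked(struct mm_model *mm, int is_self,
                                      struct softdirty token,
                                      scan_fn scan, void *arg)
{
    int ret = 0;

    pthread_mutex_lock(&mm->st_lock);
    if (mm->token.external != token.external ||
        (!is_self && mm->token.internal != token.internal))
        ret = -EAGAIN;
    else if (scan)
        scan(arg);      /* still under st_lock */
    pthread_mutex_unlock(&mm->st_lock);
    return ret;
}
```

The cost is that the lock is now held for the whole scan, so a set_softdirty caller would have to wait (or fail) for that long.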

>         ...
> }

* Re: Change soft-dirty interface?
  2013-06-14 10:01         ` Pavel Emelyanov
@ 2013-06-14 11:22           ` Minchan Kim
  2013-06-14 11:37             ` Pavel Emelyanov
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-14 11:22 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

Hello Pavel,

On Fri, Jun 14, 2013 at 02:01:23PM +0400, Pavel Emelyanov wrote:
> >>>>> If it's not allowed, another approach should be new system call.
> >>>>>
> >>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len);
> >>>>
> >>>> This looks like existing sys_madvise() one.
> >>>
> >>> Except pid part. It is added by your purpose, which external task
> >>> can control any process.
> 
> In CRIU we can work with pid-less syscalls just fine :) So extending regular
> madvise would work.

I didn't know that.
Just out of curiosity: how can CRIU control other tasks without a pid?

> 
> >>>>
> >>>>> If we approach new system call, we don't need to maintain current
> >>>>> proc interface and it would be very handy to get a information
> >>>>> without pagemap (open/read/close) so we can add a parameter to
> >>>>> get a dirty information easily.
> >>>>>
> >>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> >>>>>
> >>>>> What do you think about it?
> >>>>>
> >>>>
> >>>> This is OK for me, though there's another issue with this API I'd like
> >>>> to mention -- consider your app is doing these tricks with soft-dirty
> >>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits
> >>>> to optimize the freeze time.
> >>>>
> >>>> In that case soft-dirty bits would be in wrong state for both -- you app
> >>>> and CRIU, but with the proc API we could compare the ctime-s of the 
> >>>> clear_refs file and find out, that someone spoiled the soft-dirty state
> >>>> from last time we messed with it and handle it somehow (copy all the memory
> >>>> in the worst case). Can we somehow handle this with your proposal?
> >>>
> >>> Good point I didn't think over that.
> >>> A simple idea popped from my mind is we can use read/write lock
> >>> so if pid is equal to calling process's one or pid is NULL,
> >>> we use read side lock, which can allow marking soft-dirty 
> >>> several vmas with parallel. And pid is not equal to calling
> >>> process's one, the API should try to hold write-side lock
> >>> then, if it's fail, the API should return EAGAIN so that CRIU
> >>> can progress other processes and retry it after a while.
> >>> Of course, it would make live-lock so that sys_softdirty might
> >>> need another argument like "int block".
> >>
> >> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
> >> and the flag will be protected by above lock. It could prevent mixed
> >> case by self and external.
> > 
> > I realized it's not enough. Another idea is here.
> > The intenion is followin as,
> > 
> > self softdirty VS self softdirty -> NOT exclusive
> > self softdirty VS external softdirty -> exclusive
> > external softdirty VS external softdirty-> excluisve
> 
> I think it might work for us. However, I have two comments to the
> implementation, please see below.
> 
> > struct softdirty token {
> >         u64 external;
> >         u64 internal;
> > };
> > 
> >        int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
> >                                 struct softdirty *token); 

I should have mentioned that start and len are ignored if pid is not equal
to the caller's pid.

> >        int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, 
> >                                 struct softdirty token, char *vec);
> 
> Can you please show an example how to use these two, I don't quite get how
> can I do external soft-dirty tracking in atomic manner.

Hmm, I don't know how CRIU works, but something like this:

	while (1) {
		struct softdirty token;

		sys_set_softdirty(tracked_pid, 0, 0, &token);
		...
		...
		...
		if (!sys_get_softdirty(tracked_pid, 0, 0, token, NULL))
			break;
	}

Or is your concern about live-lock?

> 
> > 
> > SYSCALL(set_softdirty, ..., token)
> > {
> >         struct task_struct *tsk = task_from_pid(pid);
> >         mutex_lock(&mm->st_lock);
> >         if (tsk == current)
> >                 tsk->mm->token.internal++; 
> >         else
> >                 tsk->mm->token.external++;
> >         token->external = mm->token.external;
> >         token->internal = mm->token.internal;
> >         mutex_unlock(&mm->st_lock);
> >         ..
> >         ..
> > 
> > }
> > 
> > SYSCALL(get_softdirty, ..., token, ...)
> > {
> >         struct task_struct *tsk = task_from_pid(pid);
> >         mutex_lock(&mm->st_lock);
> >         if (tsk == current) {
> >                 if (tsk->mm->token.external != token.external) {
> >                         mutex_unlock
> >                         return -EAGAIN;
> >                 }
> >         } else {
> >                 if (tsk->mm->token.external != token.external ||
> >                     tsk->mm->token.internal != token.internal) {
> >                         mutex_unlock;
> >                         return -EAGAIN;
> >                 }
> >         }
> >         mutex_unlock(&mm->st_lock);
> 
> Presumably the critical section should be longer, as if tokens match and we
> release the lock and proceed with working on pagemap, the concurrent call
> to set_softdirty can proceed and spoil the picture.

True.
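
To make the point about the critical section concrete, here is a minimal
user-space model of the two calls. It is only a sketch under stated
assumptions: struct model_mm, the model_* functions and the per-page dirty[]
array are hypothetical stand-ins, a pthread mutex plays the role of st_lock,
and the start/len range is ignored. The token is validated and the bits are
copied out under the same lock, so a concurrent set cannot slip in between.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8	/* toy address space size, an assumption of this model */

struct softdirty_token {
	uint64_t external;
	uint64_t internal;
};

/* Toy stand-in for the per-mm state in the quoted pseudocode. */
struct model_mm {
	pthread_mutex_t st_lock;
	struct softdirty_token token;
	unsigned char dirty[NPAGES];	/* 1 = page is soft-dirty */
};

/* Clear the soft-dirty state and bump the caller's generation counter. */
void model_set_softdirty(struct model_mm *mm, int is_self,
			 struct softdirty_token *token)
{
	assert(token != NULL);
	pthread_mutex_lock(&mm->st_lock);
	if (is_self)
		mm->token.internal++;
	else
		mm->token.external++;
	memset(mm->dirty, 0, sizeof(mm->dirty));
	*token = mm->token;
	pthread_mutex_unlock(&mm->st_lock);
}

/*
 * Validate the token and copy the bits out in one critical section.
 * A self caller only cares about external resets; an external caller
 * is invalidated by any reset, self or external.
 */
int model_get_softdirty(struct model_mm *mm, int is_self,
			struct softdirty_token token, unsigned char *vec)
{
	int stale;

	pthread_mutex_lock(&mm->st_lock);
	stale = mm->token.external != token.external;
	if (!is_self)
		stale = stale || mm->token.internal != token.internal;
	if (stale) {
		pthread_mutex_unlock(&mm->st_lock);
		return -EAGAIN;
	}
	memcpy(vec, mm->dirty, sizeof(mm->dirty));
	pthread_mutex_unlock(&mm->st_lock);
	return 0;
}
```

In this model a self reset invalidates an external tracker's token, while an
external reset invalidates everyone's, which matches the exclusivity table
quoted earlier in the thread.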

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Change soft-dirty interface?
  2013-06-14 11:22           ` Minchan Kim
@ 2013-06-14 11:37             ` Pavel Emelyanov
  2013-06-15  6:41               ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Emelyanov @ 2013-06-14 11:37 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

On 06/14/2013 03:22 PM, Minchan Kim wrote:
> Hello Pavel,
> 
> On Fri, Jun 14, 2013 at 02:01:23PM +0400, Pavel Emelyanov wrote:
>>>>>>> If it's not allowed, another approach should be new system call.
>>>>>>>
>>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len);
>>>>>>
>>>>>> This looks like existing sys_madvise() one.
>>>>>
>>>>> Except pid part. It is added by your purpose, which external task
>>>>> can control any process.
>>
>> In CRIU we can work with pid-less syscalls just fine :) So extending regular
>> madvise would work.
> 
> I didn't know that.
> Just out of curiosity. How can CRIU control other tasks without pid?

We use the parasite-injection technique [1]. Briefly -- we inject code into
the other task's address space using ptrace() and /proc/PID/map_files/, then
make this code run and do what we need. Thus we can call madvise() "on"
another task.

[1] http://lwn.net/Articles/454304/

>>
>>>>>>
>>>>>>> If we approach new system call, we don't need to maintain current
>>>>>>> proc interface and it would be very handy to get a information
>>>>>>> without pagemap (open/read/close) so we can add a parameter to
>>>>>>> get a dirty information easily.
>>>>>>>
>>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
>>>>>>>
>>>>>>> What do you think about it?
>>>>>>>
>>>>>>
>>>>>> This is OK for me, though there's another issue with this API I'd like
>>>>>> to mention -- consider your app is doing these tricks with soft-dirty
>>>>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits
>>>>>> to optimize the freeze time.
>>>>>>
>>>>>> In that case soft-dirty bits would be in wrong state for both -- you app
>>>>>> and CRIU, but with the proc API we could compare the ctime-s of the 
>>>>>> clear_refs file and find out, that someone spoiled the soft-dirty state
>>>>>> from last time we messed with it and handle it somehow (copy all the memory
>>>>>> in the worst case). Can we somehow handle this with your proposal?
>>>>>
>>>>> Good point I didn't think over that.
>>>>> A simple idea popped from my mind is we can use read/write lock
>>>>> so if pid is equal to calling process's one or pid is NULL,
>>>>> we use read side lock, which can allow marking soft-dirty 
>>>>> several vmas with parallel. And pid is not equal to calling
>>>>> process's one, the API should try to hold write-side lock
>>>>> then, if it's fail, the API should return EAGAIN so that CRIU
>>>>> can progress other processes and retry it after a while.
>>>>> Of course, it would make live-lock so that sys_softdirty might
>>>>> need another argument like "int block".
>>>>
>>>> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
>>>> and the flag will be protected by above lock. It could prevent mixed
>>>> case by self and external.
>>>
>>> I realized it's not enough. Another idea is here.
>>> The intenion is followin as,
>>>
>>> self softdirty VS self softdirty -> NOT exclusive
>>> self softdirty VS external softdirty -> exclusive
>>> external softdirty VS external softdirty-> excluisve
>>
>> I think it might work for us. However, I have two comments to the
>> implementation, please see below.
>>
>>> struct softdirty token {
>>>         u64 external;
>>>         u64 internal;
>>> };
>>>
>>>        int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
>>>                                 struct softdirty *token); 
> 
> I should have mentioned that start and len are ignored if pid is not eqaul
> to caller's pid.

OK

>>>        int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, 
>>>                                 struct softdirty token, char *vec);
>>
>> Can you please show an example how to use these two, I don't quite get how
>> can I do external soft-dirty tracking in atomic manner.
> 
> Hmm, I don't know how CRIU works but ...
> 
> 	while(1) {
> 
> 		struct softdirty token;
> 		
> 		sys_set_softdirty(tracked_pid, 0, 0, &token);
> 		...
> 		...
> 		...
> 		if (!sys_get_softdirty(tacked_pid, 0, 0, token, NULL))
> 			break;
> 	}
> 
> Maybe do you have a concern about live-lock?

No, I worry about potential races in which we or the application can miss
a dirty page. Let me describe how CRIU uses the existing soft-dirty
implementation.

1. stop the task we want to work on
2. read the /proc/pid/pagemap file to find out which pages to
   read. Those with soft-dirty _cleared_ should be _skipped_
3. read the task's memory according to the calculated bitmap
4. reset the soft-dirty bits on the task
5. resume task execution
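
For reference, steps 2 and 4 of this sequence can be sketched from user space
roughly as follows. This is a minimal sketch, assuming the layout described in
the kernel's soft-dirty documentation (bit 55 of each 64-bit pagemap entry,
and writing "4" to clear_refs to reset the bits). The helper names are made up
for illustration and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: 1 if the pagemap entry has soft-dirty set. */
int pme_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Hypothetical helper: byte offset in /proc/PID/pagemap of the entry
 * describing vaddr. There is one 8-byte entry per virtual page.
 */
uint64_t pme_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Step 4: reset the soft-dirty bits of a task by writing "4" to its
 * clear_refs file. Returns 0 on success, -1 on error.
 */
int clear_soft_dirty(pid_t pid)
{
	char path[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = (fputs("4", f) >= 0);
	if (fclose(f) != 0)	/* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

A tracker would seek to pme_offset() in the pagemap file, read one entry per
page of interest, and skip pages for which pme_soft_dirty() returns 0.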

With the interface you propose the sequence presumably should look like

1. stop the task we want to work on
2. call set_softdirty + get_softdirty to get the soft-dirty bitmap and
   reset it. If it reports an error, then the soft-dirty state we set up
   before is spoiled and all memory should be read (iow -- the bitmap
   should be filled with 1-s)
3. read the task's memory according to the calculated bitmap
4. resume task execution

Am I right about this? If yes, why do we need two calls? Wouldn't it be
better to merge them into one?
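
One possible reading of such a merged call, as a single-threaded toy model:
report the bits collected since the caller's last reset, reset them, and fall
back to "everything is dirty" when somebody else has reset them in between.
All names here (struct model_task, model_fetch_and_reset, model_foreign_reset)
are hypothetical, a single generation counter stands in for the token, and
locking is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8	/* toy address space size, an assumption of this model */

/* Toy per-task state: a generation counter plus soft-dirty bits. */
struct model_task {
	uint64_t generation;		/* bumped on every reset */
	unsigned char dirty[NPAGES];	/* 1 = page is soft-dirty */
};

/*
 * Hypothetical merged call. If *generation does not match the task's
 * counter, the bits were reset by someone else, so every page is reported
 * dirty. In both cases the bits are reset and *generation is updated.
 * Returns 1 if the bitmap had to be filled with 1-s, 0 otherwise.
 */
int model_fetch_and_reset(struct model_task *t, uint64_t *generation,
			  unsigned char *vec)
{
	int spoiled;

	assert(generation != NULL && vec != NULL);
	spoiled = (*generation != t->generation);
	if (spoiled)
		memset(vec, 1, NPAGES);
	else
		memcpy(vec, t->dirty, NPAGES);
	memset(t->dirty, 0, NPAGES);
	*generation = ++t->generation;
	return spoiled;
}

/* Stand-in for any other reset of the bits, e.g. by the task itself. */
void model_foreign_reset(struct model_task *t)
{
	memset(t->dirty, 0, NPAGES);
	t->generation++;
}
```

With this shape the tracker needs only one call per round while the task is
stopped, and a spoiled state costs one full copy rather than a missed page.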

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Change soft-dirty interface?
  2013-06-14 11:37             ` Pavel Emelyanov
@ 2013-06-15  6:41               ` Minchan Kim
  2013-06-19  9:31                 ` Pavel Emelyanov
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2013-06-15  6:41 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm


Hi Pavel,

Sorry for the delay.
Blame our timezone difference and interruptions from my boys.

On Fri, Jun 14, 2013 at 03:37:09PM +0400, Pavel Emelyanov wrote:
> On 06/14/2013 03:22 PM, Minchan Kim wrote:
> > Hello Pavel,
> > 
> > On Fri, Jun 14, 2013 at 02:01:23PM +0400, Pavel Emelyanov wrote:
> >>>>>>> If it's not allowed, another approach should be new system call.
> >>>>>>>
> >>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len);
> >>>>>>
> >>>>>> This looks like existing sys_madvise() one.
> >>>>>
> >>>>> Except pid part. It is added by your purpose, which external task
> >>>>> can control any process.
> >>
> >> In CRIU we can work with pid-less syscalls just fine :) So extending regular
> >> madvise would work.
> > 
> > I didn't know that.
> > Just out of curiosity. How can CRIU control other tasks without pid?
> 
> We use the parasite-injection technique [1]. Briefly -- we put a code into
> other task's address space using ptrace() and /proc/PID/map_files/ and make
> this code run and do what we need. Thus we can call madvise() "on" another
> task.

Interesting.

> 
> [1] http://lwn.net/Articles/454304/


> 
> >>
> >>>>>>
> >>>>>>> If we approach new system call, we don't need to maintain current
> >>>>>>> proc interface and it would be very handy to get a information
> >>>>>>> without pagemap (open/read/close) so we can add a parameter to
> >>>>>>> get a dirty information easily.
> >>>>>>>
> >>>>>>>         int sys_softdirty(pid_t pid, void *addr, size_t len, unsigned char *vec)
> >>>>>>>
> >>>>>>> What do you think about it?
> >>>>>>>
> >>>>>>
> >>>>>> This is OK for me, though there's another issue with this API I'd like
> >>>>>> to mention -- consider your app is doing these tricks with soft-dirty
> >>>>>> and at the same time CRIU tools live-migrate it using the soft-dirty bits
> >>>>>> to optimize the freeze time.
> >>>>>>
> >>>>>> In that case soft-dirty bits would be in wrong state for both -- you app
> >>>>>> and CRIU, but with the proc API we could compare the ctime-s of the 
> >>>>>> clear_refs file and find out, that someone spoiled the soft-dirty state
> >>>>>> from last time we messed with it and handle it somehow (copy all the memory
> >>>>>> in the worst case). Can we somehow handle this with your proposal?
> >>>>>
> >>>>> Good point I didn't think over that.
> >>>>> A simple idea popped from my mind is we can use read/write lock
> >>>>> so if pid is equal to calling process's one or pid is NULL,
> >>>>> we use read side lock, which can allow marking soft-dirty 
> >>>>> several vmas with parallel. And pid is not equal to calling
> >>>>> process's one, the API should try to hold write-side lock
> >>>>> then, if it's fail, the API should return EAGAIN so that CRIU
> >>>>> can progress other processes and retry it after a while.
> >>>>> Of course, it would make live-lock so that sys_softdirty might
> >>>>> need another argument like "int block".
> >>>>
> >>>> And we need a flag to show SELF_SOFT_DIRTY or EXTERNAL_SOFT_DIRTY
> >>>> and the flag will be protected by above lock. It could prevent mixed
> >>>> case by self and external.
> >>>
> >>> I realized it's not enough. Another idea is here.
> >>> The intenion is followin as,
> >>>
> >>> self softdirty VS self softdirty -> NOT exclusive
> >>> self softdirty VS external softdirty -> exclusive
> >>> external softdirty VS external softdirty-> excluisve
> >>
> >> I think it might work for us. However, I have two comments to the
> >> implementation, please see below.
> >>
> >>> struct softdirty token {
> >>>         u64 external;
> >>>         u64 internal;
> >>> };
> >>>
> >>>        int sys_set_softdirty(pid_t pid, unsigned long start, size_t len,
> >>>                                 struct softdirty *token); 
> > 
> > I should have mentioned that start and len are ignored if pid is not eqaul
> > to caller's pid.
> 
> OK
> 
> >>>        int sys_get_softdirty(pid_t pid, unsigned long start, size_t len, 
> >>>                                 struct softdirty token, char *vec);
> >>
> >> Can you please show an example how to use these two, I don't quite get how
> >> can I do external soft-dirty tracking in atomic manner.
> > 
> > Hmm, I don't know how CRIU works but ...
> > 
> > 	while(1) {
> > 
> > 		struct softdirty token;
> > 		
> > 		sys_set_softdirty(tracked_pid, 0, 0, &token);
> > 		...
> > 		...
> > 		...
> > 		if (!sys_get_softdirty(tacked_pid, 0, 0, token, NULL))
> > 			break;
> > 	}
> > 
> > Maybe do you have a concern about live-lock?
> 
> No, I worry about potential races with which we or application can skip
> dirty page. Let me describe how CRIU uses existing soft-dirty implementation.
> 
> 1. stop the task we want to work on
> 2. read the /proc/pid/pagemap file to find out which pages to
>    read. Those with soft-dirty _cleared_ should be _skipped_
> 3. read task's memory at calculated bitmap
> 4. reset soft dirty bits on task
> 5. resume task execution

Let me restate that in my own terms.

1. admin does "echo 4 > /proc/<target>/clear_refs"
2. admin stops the target
3. admin reads /proc/<target>/pagemap and builds a bitmap
   of only the soft-dirty pages, so we can avoid unnecessary
   migration
4. admin reads the target's dirtied pages using the bitmap from 3
5. admin does "echo 4 > /proc/<target>/clear_refs" again to find
   the target's future dirty pages.
6. admin resumes the target

Right?
If so, my interface would be used as follows:

1. admin does set_softdirty(target, 0, 0, &token);
   (set_softdirty clears all soft-dirty bits in the target process's
   page table.)
2. admin stops the target
3. admin reads /proc/target/pagemap and builds a bitmap
   of only the soft-dirty pages, so we can avoid unnecessary
   migration.
4. admin does get_softdirty(target, 0, 0, token) to check whether
   someone else has spoiled the state since 1
4-1. If it reports an error, the admin discards the bitmap from 3
     and has to read all memory.
5. admin does set_softdirty(target, 0, 0, &token) again to find
   the target's future dirty pages
6. admin resumes the target.

> 
> With the interface you propose the sequence presumably should look like
> 
> 1. stop the task we want to work on
> 2. call set_softdirty + get_softdirty to get the soft-dirty bitmap and
>    reset one. If it reports error, then the soft-dirty we did before is
>    spoiled and all memory should be read (iow -- bitmap should be filled
>    with 1-s)
> 3. read task's memory at calculated bitmap
> 4. resume task execution

> 
> Am I right with this? If yes, why do we need two calls, wouldn't it be better

I couldn't quite parse your terms, so I wrote the scenario as I understand
it. Please see my sequence above, and if you have a comment, please ask
again.

> to merge them into one?

It's not the hard part, but I wanted to show my intention clearly.
If we all agree, let's think over the interface again.

Thanks!

> 
> Thanks,
> Pavel

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Change soft-dirty interface?
  2013-06-15  6:41               ` Minchan Kim
@ 2013-06-19  9:31                 ` Pavel Emelyanov
  2013-06-21  1:41                   ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Emelyanov @ 2013-06-19  9:31 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

>>> Maybe do you have a concern about live-lock?
>>
>> No, I worry about potential races with which we or application can skip
>> dirty page. Let me describe how CRIU uses existing soft-dirty implementation.
>>
>> 1. stop the task we want to work on
>> 2. read the /proc/pid/pagemap file to find out which pages to
>>    read. Those with soft-dirty _cleared_ should be _skipped_
>> 3. read task's memory at calculated bitmap
>> 4. reset soft dirty bits on task
>> 5. resume task execution
> 
> Let me try to parse as my term.
> 
> 1. admin does "echo 4 > /proc/<target>/clear_refs"
> 2. admin stop the target
> 3. admin reads the /proc/<target>/pagemap and make bitmap
>    with only soft-dirty marked pages so we can avoid unnecessary
>    migration
> 4. admin reads target's dirtied pages via bitmap from 3
> 5. admin does "echo 4 > /proc/<target>/clear_refs" again to find
>    future diry pages of the target.
> 6. admin resumes the target
> 
> Right?

Almost; step #1 looks excessive. We shouldn't clear the soft-dirty bits
_before_ stopping the target, otherwise we lose all the bits "collected"
before it.

> If so, my interface is following as
> 
> 1. admin does set_softdirty(target, 0, 0, &token);
>    (set_softdirty clears all soft-dirty bit from target process's
>    page table.
> 2. admin stop the target
> 3. admin reads the /proc/target/pagemap and make bitmap
>    with only soft-dirty marked pages so we can avoid unnecessary
>    migration. 
> 4. admins does get_softdirty(target, 0, 0, token) to confirm
>    someone else spoiled since 1
> 4-1. If it is reports error, then admins discard the bitmap got
>      from 3 and have to read all memory.
> 5. admin does set_softdirty(target, 0, 0, &token) again to find
>    future dirty pages of the target  
> 5. admin resumes the target.

Same here -- if we skip step #1, then we can merge steps 4 and 5 into
one system call. Can we?

>>
>> With the interface you propose the sequence presumably should look like
>>
>> 1. stop the task we want to work on
>> 2. call set_softdirty + get_softdirty to get the soft-dirty bitmap and
>>    reset one. If it reports error, then the soft-dirty we did before is
>>    spoiled and all memory should be read (iow -- bitmap should be filled
>>    with 1-s)
>> 3. read task's memory at calculated bitmap
>> 4. resume task execution
> 
>>
>> Am I right with this? If yes, why do we need two calls, wouldn't it be better
> 
> I failed to parse your terms so I wrote scnario as my understanding
> so please see my above sequence and if you have a comment, please ask
> again.
> 
>> to merge them into one?
> 
> It's not hard part but I wanted to show my intention clearly.
> If we all agree on, let's think over interface again.

For me the interface with a single syscall looks OK. If nobody else objects,
I think you can go on with the kernel patches :) Presumably you can even
use the criu project sources and tests to check how memory-change tracking
works with the new interface.

> Thanks!

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Change soft-dirty interface?
  2013-06-19  9:31                 ` Pavel Emelyanov
@ 2013-06-21  1:41                   ` Minchan Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2013-06-21  1:41 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm

On Wed, Jun 19, 2013 at 01:31:28PM +0400, Pavel Emelyanov wrote:
> >>> Maybe do you have a concern about live-lock?
> >>
> >> No, I worry about potential races with which we or application can skip
> >> dirty page. Let me describe how CRIU uses existing soft-dirty implementation.
> >>
> >> 1. stop the task we want to work on
> >> 2. read the /proc/pid/pagemap file to find out which pages to
> >>    read. Those with soft-dirty _cleared_ should be _skipped_
> >> 3. read task's memory at calculated bitmap
> >> 4. reset soft dirty bits on task
> >> 5. resume task execution
> > 
> > Let me try to parse as my term.
> > 
> > 1. admin does "echo 4 > /proc/<target>/clear_refs"
> > 2. admin stop the target
> > 3. admin reads the /proc/<target>/pagemap and make bitmap
> >    with only soft-dirty marked pages so we can avoid unnecessary
> >    migration
> > 4. admin reads target's dirtied pages via bitmap from 3
> > 5. admin does "echo 4 > /proc/<target>/clear_refs" again to find
> >    future diry pages of the target.
> > 6. admin resumes the target
> > 
> > Right?
> 
> Almost, the step #1 looks excessive. We shouldn't clear the soft dirty
> _before_ stopping the target, otherwise we lose all the bits "collected"
> before it.
> 
> > If so, my interface is following as
> > 
> > 1. admin does set_softdirty(target, 0, 0, &token);
> >    (set_softdirty clears all soft-dirty bit from target process's
> >    page table.
> > 2. admin stop the target
> > 3. admin reads the /proc/target/pagemap and make bitmap
> >    with only soft-dirty marked pages so we can avoid unnecessary
> >    migration. 
> > 4. admins does get_softdirty(target, 0, 0, token) to confirm
> >    someone else spoiled since 1
> > 4-1. If it is reports error, then admins discard the bitmap got
> >      from 3 and have to read all memory.
> > 5. admin does set_softdirty(target, 0, 0, &token) again to find
> >    future dirty pages of the target  
> > 5. admin resumes the target.
> 
> Same here -- if we skip step #1, then we can merge steps 4 and 5 into
> one system call. Can we?
> 
> >>
> >> With the interface you propose the sequence presumably should look like
> >>
> >> 1. stop the task we want to work on
> >> 2. call set_softdirty + get_softdirty to get the soft-dirty bitmap and
> >>    reset one. If it reports error, then the soft-dirty we did before is
> >>    spoiled and all memory should be read (iow -- bitmap should be filled
> >>    with 1-s)
> >> 3. read task's memory at calculated bitmap
> >> 4. resume task execution
> > 
> >>
> >> Am I right with this? If yes, why do we need two calls, wouldn't it be better
> > 
> > I failed to parse your terms so I wrote scnario as my understanding
> > so please see my above sequence and if you have a comment, please ask
> > again.
> > 
> >> to merge them into one?
> > 
> > It's not hard part but I wanted to show my intention clearly.
> > If we all agree on, let's think over interface again.
> 
> For me the interface with a single syscall looks OK. If nobody else objects,
> I think you can go on with the kernel patches :) Presumably you can even
> use the criu project sources and tests to check how memory changes tracking
> works with the new interface.

Thanks for the good discussion!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-06-21  1:41 UTC | newest]

Thread overview: 11+ messages
2013-06-13  1:53 Change soft-dirty interface? Minchan Kim
2013-06-13  9:10 ` Pavel Emelyanov
2013-06-14  0:32   ` Minchan Kim
2013-06-14  0:41     ` Minchan Kim
2013-06-14  5:07       ` Minchan Kim
2013-06-14 10:01         ` Pavel Emelyanov
2013-06-14 11:22           ` Minchan Kim
2013-06-14 11:37             ` Pavel Emelyanov
2013-06-15  6:41               ` Minchan Kim
2013-06-19  9:31                 ` Pavel Emelyanov
2013-06-21  1:41                   ` Minchan Kim
