RFC: Workload Sampling Using Perf Events and CRIU

linux-perf-users.vger.kernel.org archive mirror

* RFC: Workload Sampling Using Perf Events and CRIU
@ 2013-09-03 13:21 Christopher Covington
  2013-09-03 14:21 ` [CRIU] " Pavel Emelyanov
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Covington @ 2013-09-03 13:21 UTC (permalink / raw)
  To: criu, linux-perf-users

Hi,

Seeing some overlapping themes in the cr_service CRIU patches, I wanted to
bring up a use case that I've been working on--periodic sampling of a workload
using perf events and CRIU. My goal is to enable sampling of workload
(benchmark) execution on models that are too slow to run the whole thing in a
reasonable amount of time. The proposed workflow is to profile the workload on
a fast system, post-process the data with a tool like SimPoint to figure out
when to take checkpoints, dump checkpoints from the fast system, then restore
them on the slow model and get some representative results assuming everything
works as intended.

Unfortunately I don't have code to share immediately. Should everything go
smoothly, I may be able to send stuff out in a few weeks. I was hoping it
might be useful to talk general architecture in the meantime.

My current prototype adds a new command to criu: "sample", requiring an
intervals argument and a workload to execute. When given this command, criu
forks a child process which opens a perf event that is set up to signal the
parent criu process when the first interval has elapsed (I'm measuring
instructions, but it could be any perf event). With the counter set up, the
child executes the workload. The criu parent process then waits for a SIGIO
signal from the perf event. When it comes, it dumps the child process, which
has modified logic to not dump the perf event file descriptor but instead
reset it to the next interval. I tried opening the perf event before the fork,
but the start-on-exec flag didn't seem to work in that configuration and I
don't really want to include criu's instructions anyway. There's a bit of skid
in this setup but I'm hoping it's not significant (the intervals I'm
interested in are on the order of hundreds of millions of instructions). If
exact precision was important, perhaps the kernel could stop the workload when
the count expires and criu could be augmented to be able to dump it, but I
figured I'd only try to tackle that if it's needed.
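
To make the counter setup described above concrete, here is a minimal sketch,
not the prototype's actual code. It assumes the child opens the counter on
itself (pid 0) shortly before exec. The helper names make_interval_attr and
open_interval_counter are hypothetical, and excluding kernel and hypervisor
instructions is an assumption; the description only says that instructions
are measured.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe a counter that overflows once every `interval` retired
 * instructions. Hypothetical helper, not CRIU code. */
struct perf_event_attr make_interval_attr(uint64_t interval)
{
    struct perf_event_attr attr;

    assert(interval > 0);
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.sample_period = interval;  /* overflow after this many events */
    attr.wakeup_events = 1;         /* notify on every overflow */
    attr.disabled = 1;              /* stay idle until ... */
    attr.enable_on_exec = 1;        /* ... the workload is exec'd */
    attr.exclude_kernel = 1;        /* assumption: user space only */
    attr.exclude_hv = 1;
    return attr;
}

/* glibc has no wrapper for perf_event_open(2), so use syscall(2). */
int open_interval_counter(pid_t pid, uint64_t interval)
{
    struct perf_event_attr attr = make_interval_attr(interval);

    /* cpu = -1: follow the task on any CPU; group_fd = -1: no group */
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}
```

In this sketch the child would call open_interval_counter(0, interval) and
then exec the workload. With enable_on_exec set, instructions executed before
the exec are not counted.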

The differences between this and the conventional dump are that criu knows the
PID from running fork rather than having it passed on the command line, there
are multiple dumps in a run, and there is some extra complexity around dumping
file descriptors. I had to use the close-on-exec flag when duplicating file
descriptors to keep the child process untainted by criu.
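
One way to get that behaviour, sketched under the assumption that the
descriptors are duplicated with fcntl (the helper name dup_private is
hypothetical), is F_DUPFD_CLOEXEC, which sets the close-on-exec flag on the
copy at the moment it is created:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate fd so that the copy is closed automatically across exec.
 * Returns the new descriptor, or -1 with errno set. */
int dup_private(int fd)
{
    assert(fd >= 0);
    /* 0 = lowest acceptable number for the new descriptor */
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}
```

The original descriptor keeps whatever flags it had; only the copy is hidden
from the exec'd workload.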

What do people think of this approach? Would it make more sense to add
something that depends on CRIU to perf tools? Should I look more closely at a
library-based approach? Could potential library users make use of this sort of
fork+exec+signal approach instead of making function calls?

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


* Re: [CRIU] RFC: Workload Sampling Using Perf Events and CRIU
  2013-09-03 13:21 RFC: Workload Sampling Using Perf Events and CRIU Christopher Covington
@ 2013-09-03 14:21 ` Pavel Emelyanov
  2013-09-03 15:19   ` Christopher Covington
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Emelyanov @ 2013-09-03 14:21 UTC (permalink / raw)
  To: Christopher Covington; +Cc: criu, linux-perf-users

On 09/03/2013 05:21 PM, Christopher Covington wrote:
> Hi,
> 
> Seeing some overlapping themes in the cr_service CRIU patches, I wanted to
> bring up a use case that I've been working on--periodic sampling of a workload
> using perf events and CRIU. My goal is to enable sampling of workload
> (benchmark) execution on models that are too slow to run the whole thing in a
> reasonable amount of time. The proposed workflow is to profile the workload on
> a fast system, post-process the data with a tool like SimPoint to figure out
> when to take checkpoints, dump checkpoints from the fast system, then restore
> them on the slow model and get some representative results assuming everything
> works as intended.
> 
> Unfortunately I don't have code to share immediately. Should everything go
> smoothly, I may be able to send stuff out in a few weeks. I was hoping it
> might be useful to talk general architecture in the meantime.
> 
> My current prototype adds a new command to criu: "sample", requiring an
> intervals argument and a workload to execute. When given this command, criu
> forks a child process which opens a perf event that is set up to signal the
> parent criu process when the first interval has elapsed (I'm measuring
> instructions, but it could be any perf event).

I'm not familiar with the internals of perf; can you shed more light on this,
please? What does "opens a perf event" involve? Is it an eventfd descriptor
with the respective setup or something else?

> With the counter set up, the
> child executes the workload. The criu parent process then waits for a SIGIO
> signal from the perf event. When it comes, it dumps the child process, which
> has modified logic to not dump the perf event file descriptor but instead
> reset it to the next interval. I tried opening the perf event before the fork,
> but the start-on-exec flag didn't seem to work in that configuration and I
> don't really want to include criu's instructions anyway. There's a bit of skid
> in this setup but I'm hoping it's not significant (the intervals I'm
> interested in are on the order of hundreds of millions of instructions). If
> exact precision was important, perhaps the kernel could stop the workload when
> the count expires and criu could be augmented to be able to dump it, but I
> figured I'd only try to tackle that if it's needed.
> 
> The differences between this and the conventional dump are that criu knows the
> PID from running fork rather than having it passed on the command line, there
> are multiple dumps in a run, and there is some extra complexity around dumping
> file descriptors. I had to use the close-on-exec flag when duplicating file
> descriptors to keep the child process untainted by criu.
> 
> What do people think of this approach? Would it make more sense to add
> something that depends on CRIU to perf tools? Should I look more closely at a
> library-based approach? Could potential library users make use of this sort of
> fork+exec+signal approach instead of making function calls?

For me, the scenario you propose fits naturally into the "service" thing being
developed. The part that is missing for your case is that, for now, the
"service" is supposed to serve only one "dump-me" request per connection.

Can we somehow configure perf events in one process so that they are delivered
to another process? If yes, then we can make your case look like this:

1. criu service starts
2. a process with your workload starts and
  a) opens perf event
  b) connects to criu service
  c) delegates the perf event to service
  d) sends the "dump me request", with "use delegated event" flag set
3. your workload starts

After this, once the perf event occurs, it's caught by the criu service, which
in turn dumps the process.

So is it possible to make this "perf event delegation to other process"?

Thanks,
Pavel


* Re: [CRIU] RFC: Workload Sampling Using Perf Events and CRIU
  2013-09-03 14:21 ` [CRIU] " Pavel Emelyanov
@ 2013-09-03 15:19   ` Christopher Covington
  2013-09-03 18:46     ` Pavel Emelyanov
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Covington @ 2013-09-03 15:19 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: criu, linux-perf-users

On 09/03/2013 10:21 AM, Pavel Emelyanov wrote:
> On 09/03/2013 05:21 PM, Christopher Covington wrote:
>> Hi,
>>
>> Seeing some overlapping themes in the cr_service CRIU patches, I wanted to
>> bring up a use case that I've been working on--periodic sampling of a workload
>> using perf events and CRIU. My goal is to enable sampling of workload
>> (benchmark) execution on models that are too slow to run the whole thing in a
>> reasonable amount of time. The proposed workflow is to profile the workload on
>> a fast system, post-process the data with a tool like SimPoint to figure out
>> when to take checkpoints, dump checkpoints from the fast system, then restore
>> them on the slow model and get some representative results assuming everything
>> works as intended.
>>
>> Unfortunately I don't have code to share immediately. Should everything go
>> smoothly, I may be able to send stuff out in a few weeks. I was hoping it
>> might be useful to talk general architecture in the meantime.
>>
>> My current prototype adds a new command to criu: "sample", requiring an
>> intervals argument and a workload to execute. When given this command, criu
>> forks a child process which opens a perf event that is set up to signal the
>> parent criu process when the first interval has elapsed (I'm measuring
>> instructions, but it could be any perf event).
> 
> I'm not familiar with the internals of perf; can you shed more light on this,
> please? What does "opens a perf event" involve? Is it an eventfd descriptor
> with the respective setup or something else?

http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html

I'm passing initial settings as an argument to the perf event open system
call, which returns a file descriptor. With the file descriptor in hand I can
then use fcntl and ioctl to do the last part of the setup like setting the
asynchronous flag and making the parent process the owner so that it gets the
wakeup signal.
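
The fcntl half of that setup might look roughly like the sketch below. The
helper name route_sigio_to is hypothetical, and the sketch leaves out the
ioctl calls; with only these two settings the owner receives the default
SIGIO signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to send SIGIO to `owner` when `fd` signals readiness.
 * Returns 0 on success, -1 (with errno set) on failure. */
int route_sigio_to(int fd, pid_t owner)
{
    int flags;

    assert(owner > 0);                      /* one process, not a group */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETOWN, owner) == -1)   /* who receives the signal */
        return -1;
    /* turn on asynchronous notification, keeping the existing flags */
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```

In the workflow described here, the child would pass its parent's PID as
owner, for example route_sigio_to(perf_fd, getppid()), before executing the
workload.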

>> With the counter set up, the
>> child executes the workload. The criu parent process then waits for a SIGIO
>> signal from the perf event. When it comes, it dumps the child process, which
>> has modified logic to not dump the perf event file descriptor but instead
>> reset it to the next interval. I tried opening the perf event before the fork,
>> but the start-on-exec flag didn't seem to work in that configuration and I
>> don't really want to include criu's instructions anyway. There's a bit of skid
>> in this setup but I'm hoping it's not significant (the intervals I'm
>> interested in are on the order of hundreds of millions of instructions). If
>> exact precision was important, perhaps the kernel could stop the workload when
>> the count expires and criu could be augmented to be able to dump it, but I
>> figured I'd only try to tackle that if it's needed.
>>
>> The differences between this and the conventional dump are that criu knows the
>> PID from running fork rather than having it passed on the command line, there
>> are multiple dumps in a run, and there is some extra complexity around dumping
>> file descriptors. I had to use the close-on-exec flag when duplicating file
>> descriptors to keep the child process untainted by criu.
>>
>> What do people think of this approach? Would it make more sense to add
>> something that depends on CRIU to perf tools? Should I look more closely at a
>> library-based approach? Could potential library users make use of this sort of
>> fork+exec+signal approach instead of making function calls?
> 
> For me, the scenario you propose fits naturally into the "service" thing being
> developed. The part that is missing for your case is that, for now, the
> "service" is supposed to serve only one "dump-me" request per connection.
> 
> Can we somehow configure perf events in one process so that they are delivered
> to another process? If yes, then we can make your case look like this:
> 
> 1. criu service starts
> 2. a process with your workload starts and
>   a) opens perf event
>   b) connects to criu service
>   c) delegates the perf event to service
>   d) sends the "dump me request", with "use delegated event" flag set
> 3. your workload starts
> 
> After this, once the perf event occurs, it's caught by the criu service, which
> in turn dumps the process.
> 
> So is it possible to make this "perf event delegation to other process"?

There are two things to be delegated. The first is who gets the wakeup signal.
As long as the process identifier for the service is known, it should be
trivial to make a file control ownership call on the perf event file
descriptor before the workload is executed. The other resource is the file
descriptor itself, which one must re-program and reset to capture multiple
checkpoints. The service should have access to the file descriptor once the
first dump is taken, which is the earliest it would need to perform any
operations on it anyhow.
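
A rough sketch of that re-programming step follows. It assumes the
PERF_EVENT_IOC_PERIOD and PERF_EVENT_IOC_RESET ioctls are used for it; the
thread does not say which calls the prototype makes, and rearm_interval is a
hypothetical name.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Re-program an already open counter for the next interval.
 * Returns 0 on success, -1 (with errno set) on failure. */
int rearm_interval(int perf_fd, uint64_t next_interval)
{
    assert(next_interval > 0);
    /* set the number of events until the next overflow */
    if (ioctl(perf_fd, PERF_EVENT_IOC_PERIOD, &next_interval) == -1)
        return -1;
    /* zero the reported count so reads cover the new interval only */
    return ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
}
```

The perf_event_open man page notes that on older kernels a new period only
took effect after the next overflow, so the behaviour is worth checking on
the system that takes the dumps.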

I think this still leaves the specifics of multiple checkpoint dumps in
sequence somewhat unresolved. I think I'll try to switch over to the service
workflow and play around with it a little to get a better idea of what the
options might be.

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


* Re: [CRIU] RFC: Workload Sampling Using Perf Events and CRIU
  2013-09-03 15:19   ` Christopher Covington
@ 2013-09-03 18:46     ` Pavel Emelyanov
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Emelyanov @ 2013-09-03 18:46 UTC (permalink / raw)
  To: Christopher Covington; +Cc: criu, linux-perf-users

On 09/03/2013 07:19 PM, Christopher Covington wrote:
>>> My current prototype adds a new command to criu: "sample", requiring an
>>> intervals argument and a workload to execute. When given this command, criu
>>> forks a child process which opens a perf event that is set up to signal the
>>> parent criu process when the first interval has elapsed (I'm measuring
>>> instructions, but it could be any perf event).
>>
>> I'm not familiar with the internals of perf; can you shed more light on this,
>> please? What does "opens a perf event" involve? Is it an eventfd descriptor
>> with the respective setup or something else?
> 
> http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
> 
> I'm passing initial settings as an argument to the perf event open system
> call, which returns a file descriptor. With the file descriptor in hand I can
> then use fcntl and ioctl to do the last part of the setup like setting the
> asynchronous flag and making the parent process the owner so that it gets the
> wakeup signal.

Why do you configure signal delivery for event notification? Isn't it more
convenient just to poll() the perf event descriptor? I'm reading the kernel's
sys_perf_event_open() stuff and see that it's perfectly poll-able.
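
For comparison, the poll()-based alternative could be sketched as below.
wait_for_event is a hypothetical helper and the code is generic. For a perf
event descriptor, the man page ties POLLIN to samples being written to the
mmap'd ring buffer, which is assumed here to be set up elsewhere.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hang-up. */
int wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;               /* 0: timed out, -1: poll failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller that installs signal handlers would also want to retry when poll
fails with EINTR; the sketch leaves that out.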

>>> What do people think of this approach? Would it make more sense to add
>>> something that depends on CRIU to perf tools? Should I look more closely at a
>>> library-based approach? Could potential library users make use of this sort of
>>> fork+exec+signal approach instead of making function calls?
>>
>> For me, the scenario you propose fits naturally into the "service" thing being
>> developed. The part that is missing for your case is that, for now, the
>> "service" is supposed to serve only one "dump-me" request per connection.
>>
>> Can we somehow configure perf events in one process so that they are delivered
>> to another process? If yes, then we can make your case look like this:
>>
>> 1. criu service starts
>> 2. a process with your workload starts and
>>   a) opens perf event
>>   b) connects to criu service
>>   c) delegates the perf event to service
>>   d) sends the "dump me request", with "use delegated event" flag set
>> 3. your workload starts
>>
>> After this, once the perf event occurs, it's caught by the criu service, which
>> in turn dumps the process.
>>
>> So is it possible to make this "perf event delegation to other process"?
> 
> There are two things to be delegated. The first is who gets the wakeup signal.
> As long as the process identifier for the service is known, it should be
> trivial to make a file control ownership call on the perf event file
> descriptor before the workload is executed. The other resource is the file
> descriptor itself, which one must re-program and reset to capture multiple
> checkpoints. The service should have access to the file descriptor once the
> first dump is taken, which is the earliest it would need to perform any
> operations on it anyhow.

Ah, I see. Well, this fits fine into the existing service API -- the "dump-me"
requester would have to pass an fd for the directory where the image files
should go, so we can just extend that and pass the fd with events as well.
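
As an illustration of passing such a descriptor, here is a sketch that
assumes the requester talks to the service over a Unix domain socket and
uses an SCM_RIGHTS control message. send_fd and recv_fd are hypothetical
helpers, not part of the service API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd_to_pass over the connected Unix socket. Returns 0 or -1. */
int send_fd(int sock, int fd_to_pass)
{
    char byte = 0;                  /* one data byte carries the message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd_to_pass >= 0);
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from the socket. Returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number referring to the same open file,
so the service could operate on a perf event that the workload process
opened.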

> I think this still leaves the specifics of multiple checkpoint dumps in
> sequence somewhat unresolved.

The service is going to look like a "classical" server -- it listens for
connections, then spawns a child and lets it handle the request. After
doing so it's ready to listen for more requests.

> I think I'll try to switch over to the service workflow and play around with
> it a little to get a better idea of what the options might be.

That's great!

Thanks,
Pavel
