From: Jesse Barnes
Subject: Re: Design of a GPU profiling debug interface
Date: Tue, 9 Nov 2010 09:15:54 -0800
Message-ID: <20101109091554.39680c72@jbarnes-desktop>
In-Reply-To: <1288443851.21112.30.camel@pcjc2lap>
References: <1288443851.21112.30.camel@pcjc2lap>
To: Peter Clifton
Cc: intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Sat, 30 Oct 2010 14:04:11 +0100
Peter Clifton wrote:

> I think I'll need some help with this. I'm by no means a kernel
> programmer, so I'm feeling my way in the dark here.
>
> I want to design an interface so I can synchronise my GPU idle-flag
> polling with batchbuffer execution. At a high level, I'm imagining
> doing something like this in my application (or mesa)
> (hand-wavy pseudocode):
>
> expose_event_handler ()
> {
>     static bool one_shot_trace = true;
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (TRUE);
>
>     /* RENDERING COMMANDS IN HERE */
>     SwapBuffers ();
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (FALSE);
>
>     one_shot_trace = false;
> }
>
> I was imagining adding a flag to the EXECBUFFER2 IOCTL, or perhaps
> adding a new EXECBUFFER3 IOCTL (which I'm playing with locally now).
> Basically I just want to flag the execbuffers I'm interested in seeing
> profiling data for.
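In outline, flagging an execbuffer for profiling might look like the sketch below. Note this is purely illustrative: I915_EXEC_TRACE_IDLE and the struct layout here are hypothetical stand-ins, not the real drm_i915_gem_execbuffer2 UAPI, and a real patch would OR the new bit into the existing flags field.

```c
/* Hypothetical sketch of flagging a batch for idle-trace profiling.
 * The flag bit and the simplified struct below are stand-ins for the
 * real i915 execbuffer2 UAPI, used here only to show the idea. */
#include <stdint.h>

#define I915_EXEC_TRACE_IDLE (1u << 15)  /* hypothetical flag bit */

struct fake_execbuffer2 {        /* simplified stand-in struct */
    uint64_t buffers_ptr;
    uint32_t buffer_count;
    uint64_t flags;
};

/* Mark this batch so the kernel starts logging idle data for it. */
void request_idle_trace(struct fake_execbuffer2 *eb)
{
    eb->flags |= I915_EXEC_TRACE_IDLE;
}
```

Userspace (mesa) would set the bit only around the frame of interest, matching the one-shot pattern in the pseudocode above.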
> In order to get really high-resolution profiling, it would be
> advantageous to confine it to the time period of interest; otherwise
> the data rate is too high. I guesstimated about 10 MB/s for a binary
> representation of the data I'm currently polling in user space. More
> spatial resolution would be nice too, so this could increase.

Would be very cool to be able to correlate the data...

> I think I have a vague idea how to do the GPU and logging parts, even
> if I end up having to start the polling before the batchbuffer starts
> executing.
>
> What I've got little to no clue how to do is manage allocation of the
> memory to store the results in.
>
> Should userspace (mesa?) be passing buffers for the kernel to return
> profiling data in, then retrieving them somehow when it "knows" the
> batchbuffer is finished? This would probably require over-allocating,
> with a guesstimate of the memory needed to log the given batchbuffer.
>
> What about exporting via debugfs? Assuming the above code fragment, we
> could leave the last "frame" of polled data available, with the data
> being overwritten when the next request to start logging comes in.
> (That would perhaps require some kind of sequence number if we have
> multiple batches which come under the same request... or a separate
> IOCTL to turn logging on and off.)

There's also relayfs, which is made for high-bandwidth kernel->user
communication. I'm not sure whether it will make this any easier, but I
think there's some documentation about it in the kernel tree.

A ring buffer with the last N timestamps might also be a good way of
exposing things. Having more than one entry available means that if
userspace didn't get scheduled at the right time, it would still have a
good chance of getting all the data it missed since the last read.

> Also... I'm not sure how the locking would work if userspace is
> reading out the debugfs file whilst another frame is being executed.
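The last-N ring buffer idea might be sketched like this. All the names (idle_ring, idle_sample, ring_push, ring_read) are illustrative, and a real kernel version would need locking and a copy_to_user() path; this userspace sketch just shows how a late reader can still catch up on everything since its last read, or resync to the oldest surviving sample if it fell too far behind.

```c
/* Sketch of a last-N-samples ring buffer for idle-flag data.
 * Illustrative only -- not kernel code. */
#include <stdint.h>

#define RING_SIZE 8   /* N entries; oldest is overwritten when full */

struct idle_sample {
    uint64_t timestamp;
    uint32_t idle_flags;
};

struct idle_ring {
    struct idle_sample buf[RING_SIZE];
    unsigned head;    /* total samples ever written */
};

/* Logger side: append a sample, overwriting the oldest when full. */
void ring_push(struct idle_ring *r, uint64_t ts, uint32_t flags)
{
    struct idle_sample *s = &r->buf[r->head % RING_SIZE];
    s->timestamp = ts;
    s->idle_flags = flags;
    r->head++;
}

/* Reader side: copy out everything written since *reader_pos;
 * returns the number of samples copied. If the reader fell more
 * than RING_SIZE behind, resync to the oldest sample still held
 * (some samples were lost to overwriting). */
unsigned ring_read(struct idle_ring *r, unsigned *reader_pos,
                   struct idle_sample *out, unsigned max)
{
    unsigned n = 0;
    if (r->head - *reader_pos > RING_SIZE)
        *reader_pos = r->head - RING_SIZE;  /* dropped samples */
    while (*reader_pos < r->head && n < max)
        out[n++] = r->buf[(*reader_pos)++ % RING_SIZE];
    return n;
}
```

The per-reader position also gives a natural way to detect loss: the reader can compare how far its position was advanced against how many samples it actually received.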
> (We'd probably need a secondary logging buffer allocated in that
> case.)

The kernel implementation of the read() side of the file could do some
locking to prevent new data from corrupting a read in progress.

-- 
Jesse Barnes, Intel Open Source Technology Center
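That read-side locking might look like the following sketch, with a userspace pthread mutex standing in for a kernel mutex and the struct and function names purely illustrative: the analogue of the read() handler snapshots the log under the lock, so a concurrent append can't corrupt a copy in progress.

```c
/* Sketch of locked read-out of a log that is concurrently appended to.
 * pthreads stand in for kernel locking primitives here. */
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define LOG_CAP 64

struct trace_log {
    pthread_mutex_t lock;
    uint64_t samples[LOG_CAP];
    unsigned count;
};

/* Logger side: append one sample under the lock. */
void log_append(struct trace_log *log, uint64_t sample)
{
    pthread_mutex_lock(&log->lock);
    if (log->count < LOG_CAP)
        log->samples[log->count++] = sample;
    pthread_mutex_unlock(&log->lock);
}

/* Analogue of the debugfs read() handler: copy out a consistent
 * snapshot under the lock; returns the number of samples copied. */
unsigned log_read(struct trace_log *log, uint64_t *out, unsigned max)
{
    pthread_mutex_lock(&log->lock);
    unsigned n = log->count < max ? log->count : max;
    memcpy(out, log->samples, n * sizeof(*out));
    pthread_mutex_unlock(&log->lock);
    return n;
}
```

Holding the lock across the whole copy is the simplest scheme; Peter's secondary-buffer idea would instead swap buffers under the lock and copy the retired one out lock-free, keeping the logger's critical section short.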