* Question regarding forcewake in i915
From: 유재용 @ 2014-12-22 12:26 UTC
To: intel-gfx

Hello intel-gfx,

I'm reading the i915 GPU driver and finding the forcewake concept quite
hard to understand.

I understand that it has something to do with energy efficiency, so I
assume it is related to ACPI. It also looks like forcewake works as a
pair (get and put).
In the "get" part, the first thing it does is wait on the
FORCEWAKE_ACK_HSW register (in the case of Haswell).
Then it writes something to the FORCEWAKE_MT register and reads from
ECOBUS.
And then it waits on FORCEWAKE_ACK_HSW again!
It becomes more confusing when it comes to put.
In the "put" part, it writes to the FORCEWAKE_MT register and reads
from ECOBUS.

I tried to find some good reading material about forcewake, but all I
found was a series of patches on this mailing list (which are quite hard
to follow from the beginning).
Could you explain the concept of FORCEWAKE, and possibly the magic
tricks in get and put?

Thanks,
Jaeyong
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: Question regarding forcewake in i915
From: Dave Gordon @ 2015-01-06 15:19 UTC
To: jaeyong.yoo
Cc: intel-gfx

On 22/12/14 12:26, 유재용 wrote:
> Hello intel-gfx,
>
> I'm reading the i915 GPU driver and finding the forcewake concept quite
> hard to understand.
>
> I understand that it has something to do with energy efficiency, so I
> assume it is related to ACPI. It also looks like forcewake works as a
> pair (get and put).
> In the "get" part, the first thing it does is wait on the
> FORCEWAKE_ACK_HSW register (in the case of Haswell).
> Then it writes something to the FORCEWAKE_MT register and reads from
> ECOBUS.
> And then it waits on FORCEWAKE_ACK_HSW again!
> It becomes more confusing when it comes to put.
> In the "put" part, it writes to the FORCEWAKE_MT register and reads
> from ECOBUS.
>
> I tried to find some good reading material about forcewake, but all I
> found was a series of patches on this mailing list (which are quite
> hard to follow from the beginning).
> Could you explain the concept of FORCEWAKE, and possibly the magic
> tricks in get and put?
>
> Thanks,
> Jaeyong

Hi Jaeyong,

FORCEWAKE details vary a little from one chip to another, so this is
only a general description, but essentially setting one or more bits in
the FORCEWAKE register(s) prevents some or all of the power domains from
going into the deeper idle (sleep) states (and forces them out of the
sleep state if they're already asleep). Clearing the bit(s) allows the
affected parts to go to sleep again.

The FORCEWAKE_ACK register(s) contain one or more bits which reflect the
internal state, and so acknowledge that the most recent write to the
corresponding FORCEWAKE register has been accepted and acted upon. It
can take a while for a portion of the chip to wake up, so after setting
a FORCEWAKE bit we have to spin-wait until it's taken effect.

So, the general algorithm for accessing some part of the chip that may
be asleep is:
    1) set the relevant bit of (a) FORCEWAKE register
    2) poll (matching) FORCEWAKE_ACK until the write is acknowledged
    3) access the chip (this can encompass several reads and writes)
    4) clear the FORCEWAKE bit that we set earlier
    5) poll FORCEWAKE_ACK again until this write is acknowledged

Now for extra confusion, there are a few more details:
  * because reads and writes can in some cases be reordered, we
    need to force the write to FORCEWAKE to complete before the
    busy-polling of FORCEWAKE_ACK. This is the sole purpose of the
    read of the ECOBUS register, which is used just because it
    happens to lie in the same cacheline as FORCEWAKE.

  * we can choose not to poll for FORCEWAKE_ACK clear in step (5).
    Instead, we can just leave the chip to go back to sleep while
    we get on with other things. But in that case, we might come
    back and try to wake the chip again before it's finished
    responding to the write in step (4). So if we don't poll at
    the end of the sequence, we have to poll at the beginning
    instead; in other words, move step (5) to before step (1).

IIRC, gen6 had a single FORCEWAKE register containing a single effective
bit, gen7 has a single register containing multiple bits (so that they
can be controlled by different agents) which are OR-ed together to
produce a combined wakeup signal (this also applies to HSW, although the
FORCEWAKE_ACK is in a different place from earlier chips); and VLV has
multiple registers for different power domains (e.g. MEDIA vs RENDER).

Hope this helps!
Dave
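
For readers who want to see the five steps above in code, the following is a minimal sketch in C against a simulated register file. It is not the i915 implementation: every name in it (sim_regs, sim_read, sim_write, forcewake_get, forcewake_put, gt_read) is hypothetical, the three-poll acknowledgement delay is invented, and only a single FORCEWAKE bit is modelled. The ECOBUS read is shown purely as the posting read described in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated register file. These names are hypothetical, not i915's. */
enum sim_reg { SIM_FORCEWAKE, SIM_FORCEWAKE_ACK, SIM_ECOBUS, SIM_GT_REG, SIM_NREGS };

static uint32_t sim_regs[SIM_NREGS] = { [SIM_GT_REG] = 0xC0FFEE };
static int sim_ack_delay; /* ACK reads left before ACK mirrors FORCEWAKE */

static void sim_write(enum sim_reg reg, uint32_t val)
{
    sim_regs[reg] = val;
    if (reg == SIM_FORCEWAKE)
        sim_ack_delay = 3; /* pretend the hardware needs a moment */
}

static uint32_t sim_read(enum sim_reg reg)
{
    if (reg == SIM_FORCEWAKE_ACK && sim_ack_delay > 0 && --sim_ack_delay == 0)
        sim_regs[SIM_FORCEWAKE_ACK] = sim_regs[SIM_FORCEWAKE];
    if (reg == SIM_GT_REG) /* touching a sleeping domain is a driver bug */
        assert(sim_regs[SIM_FORCEWAKE_ACK] & 1u);
    return sim_regs[reg];
}

/* Bounded busy-poll until the ACK bit equals `want` (0 or 1). */
static bool wait_for_ack(uint32_t want)
{
    for (int tries = 0; tries < 1000; tries++)
        if ((sim_read(SIM_FORCEWAKE_ACK) & 1u) == want)
            return true;
    return false;
}

static bool forcewake_get(void)
{
    sim_write(SIM_FORCEWAKE, 1);   /* step 1: set the bit */
    (void)sim_read(SIM_ECOBUS);    /* posting read: flush the write first */
    return wait_for_ack(1);        /* step 2: wait for the acknowledgement */
}

static bool forcewake_put(void)
{
    sim_write(SIM_FORCEWAKE, 0);   /* step 4: clear the bit */
    (void)sim_read(SIM_ECOBUS);    /* posting read again */
    return wait_for_ack(0);        /* step 5: wait for the acknowledgement */
}

/* Steps 1-5 wrapped around a single register read (step 3). */
static bool gt_read(enum sim_reg reg, uint32_t *val)
{
    if (!forcewake_get())
        return false;
    *val = sim_read(reg);
    return forcewake_put();
}
```

Calling gt_read(SIM_GT_REG, &v) walks through all five steps once. The bounded loop in wait_for_ack() is a design choice of this sketch, so that a simulated device that never acknowledges cannot hang the caller.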

* Re: Question regarding forcewake in i915
From: Jaeyong Yoo @ 2015-01-07 8:13 UTC
To: 'Dave Gordon'
Cc: intel-gfx

Thanks a lot. It is very helpful. A couple of follow-up questions below.

> -----Original Message-----
> From: Dave Gordon [mailto:david.s.gordon@intel.com]
> Sent: Wednesday, January 07, 2015 12:19 AM
> To: jaeyong.yoo@samsung.com
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] Question regarding forcewake in i915
>
> [...]
>
> So, the general algorithm for accessing some part of the chip that may
> be asleep is:
>     1) set the relevant bit of (a) FORCEWAKE register
>     2) poll (matching) FORCEWAKE_ACK until the write is acknowledged
>     3) access the chip (this can encompass several reads and writes)
>     4) clear the FORCEWAKE bit that we set earlier
>     5) poll FORCEWAKE_ACK again until this write is acknowledged
>
> Now for extra confusion, there are a few more details:
> [...]
>   * we can choose not to poll for FORCEWAKE_ACK clear in step (5).
>     Instead, we can just leave the chip to go back to sleep while
>     we get on with other things. But in that case, we might come
>     back and try to wake the chip again before it's finished
>     responding to the write in step (4). So if we don't poll at
>     the end of the sequence, we have to poll at the beginning
>     instead; in other words, move step (5) to before step (1).

I see that we can move step (5) before step (1), but I don't understand
why we would have to do this. For instance, if we put step (5) right
after step (4), does the chip have to wake up to process the polling in
step (5)?

Additionally, I saw a call to "__gen6_gt_wait_for_thread_c0" after
step (2).
Does that mean that after FORCEWAKE_ACK is acknowledged, the hardware
(possibly ACPI) sets the thread to the C0 state?
And is that noticeable via GEN6_GT_THREAD_STATUS_REG (0x13805c)?

Thanks,
Jaeyong

* Re: Question regarding forcewake in i915
From: Dave Gordon @ 2015-01-12 17:47 UTC
To: Jaeyong Yoo
Cc: intel-gfx@lists.freedesktop.org

On 07/01/15 08:13, Jaeyong Yoo wrote:
> Thanks a lot. It is very helpful. A couple of follow-up questions below.
>
>> [...]
>>   * we can choose not to poll for FORCEWAKE_ACK clear in step (5).
>>     Instead, we can just leave the chip to go back to sleep while
>>     we get on with other things. But in that case, we might come
>>     back and try to wake the chip again before it's finished
>>     responding to the write in step (4). So if we don't poll at
>>     the end of the sequence, we have to poll at the beginning
>>     instead; in other words, move step (5) to before step (1).
>
> I see that we can move step (5) before step (1), but I don't understand
> why we would have to do this. For instance, if we put step (5) right
> after step (4), does the chip have to wake up to process the polling in
> step (5)?

No, polling these registers doesn't affect the wake state; they're in a
power domain that's not itself controlled by FORCEWAKE.

Either sequence (1-2-3-4-5 or 5-1-2-3-4) is valid. But if we use the
former sequence, the CPU will be busy polling in step 5 for however long
it takes the GPU to finish whatever else it's doing internally and then
acknowledge the write to FORCEWAKE, which could take a while.

By moving step 5 to the beginning of the sequence, the CPU can get on
with unrelated tasks during this time, so in general by the time it gets
round to needing to access FORCEWAKE again the previous write will have
completed and the CPU will see the ACK on the first read. So the
modified sequence allows greater parallelism between GPU and CPU.

[Aside]: if the driver needs to make a whole sequence of accesses, then
it's better to turn on FORCEWAKE once and hold it across the whole
sequence and then release it at the end, rather than setting it around
each access individually. See execlists_elsp_write() in intel_lrc.c for
an example. [/Aside]

> Additionally, I saw a call to "__gen6_gt_wait_for_thread_c0" after
> step (2).
> Does that mean that after FORCEWAKE_ACK is acknowledged, the hardware
> (possibly ACPI) sets the thread to the C0 state?
> And is that noticeable via GEN6_GT_THREAD_STATUS_REG (0x13805c)?
>
> Thanks,
> Jaeyong

I'm not the expert on that, but it /looks/ like the intent is for the
driver to wait when waking up the GPU not only for the write to the
FORCEWAKE register to be acknowledged, but also for the GT unit to be
fully active (which might take longer). It polls the register you
mention, and the comment suggests that if we don't wait here, other
registers read /via/ the GT unit might appear to be zero when they
aren't really.

I note that it's described as a workaround on SNB/IVB/HSW only, so it
may be that the original expectation was that seeing FORCEWAKE_ACK set
meant that the chip was ready for access, but it turned out that in some
circumstances the GT unit took longer than expected to become ready and
so it had to be polled separately.

.Dave.
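
To make the difference between the 1-2-3-4-5 and 5-1-2-3-4 orderings concrete, here is a small C model of the deferred-poll variant. It is only a sketch under stated assumptions: the hardware is simulated, all names (fw, fw_ack, pending, sim_tick, forcewake_get_deferred, forcewake_put_deferred) are hypothetical rather than i915 functions, the three-tick acknowledgement latency is made up, and the ECOBUS posting read is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated hardware state; all names here are hypothetical. */
static uint32_t fw, fw_ack; /* FORCEWAKE and FORCEWAKE_ACK stand-ins */
static int pending;         /* ticks until fw_ack catches up with fw */

/* Let simulated time pass, e.g. while the CPU does unrelated work. */
static void sim_tick(int n)
{
    while (n-- > 0 && pending > 0)
        if (--pending == 0)
            fw_ack = fw;
}

static void sim_write_fw(uint32_t v) { fw = v; pending = 3; }
static uint32_t sim_read_ack(void)   { sim_tick(1); return fw_ack; }

/* Poll until ACK equals `want`; returns reads used, or -1 on timeout. */
static int wait_ack(uint32_t want)
{
    for (int n = 1; n <= 1000; n++)
        if (sim_read_ack() == want)
            return n;
    return -1;
}

/* Order 5-1-2: returns how many reads the leading poll (step 5) took. */
static int forcewake_get_deferred(void)
{
    int lead = wait_ack(0);             /* step 5, moved to the front */
    if (lead < 0)
        return -1;
    sim_write_fw(1);                    /* step 1 */
    return wait_ack(1) < 0 ? -1 : lead; /* step 2 */
}

/* Step 4 only: clear the bit and return without polling. */
static void forcewake_put_deferred(void)
{
    assert(fw == 1); /* a put must pair with an earlier get */
    sim_write_fw(0);
}
```

In this model, a get issued immediately after a put spends three reads in the leading poll, because the release has not been acknowledged yet. If sim_tick(10) runs in between, standing in for unrelated CPU work, the leading poll succeeds on its first read, which is the parallelism gain described in the message.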

* Debugging registers (INSTPS, CSCMDOP, CSCMDVLD, ...)
From: Jaeyong Yoo @ 2015-01-15 7:38 UTC
To: daniel.vetter
Cc: intel-gfx

Hello Daniel and other maintainers,

While working on a DRM memory allocator on my own, I've encountered a
render ring hang.
I noticed that I can diagnose the command streamer's status with the
following registers:

INSTPS: 0x2070
CSCMDOP: 0x220c
CSCMDVLD: 0x2210
INSTDONE_1: 0x206c

I can see a general description of these registers in the following PDF:
http://www.x.org/docs/intel/VOL_1_graphics_core.pdf

But, sadly, this document does not provide field descriptions for
implementation-specific registers.
I would appreciate it if you could point me to such information for the
Haswell architecture.

Best regards,
JY

* Re: Debugging registers (INSTPS, CSCMDOP, CSCMDVLD, ...)
From: Dave Gordon @ 2015-01-19 12:52 UTC
To: Jaeyong Yoo, daniel.vetter
Cc: intel-gfx

On 15/01/15 07:38, Jaeyong Yoo wrote:
> Hello Daniel and other maintainers,
>
> While working on a DRM memory allocator on my own, I've encountered a
> render ring hang.
> I noticed that I can diagnose the command streamer's status with the
> following registers:
>
> INSTPS: 0x2070
> CSCMDOP: 0x220c
> CSCMDVLD: 0x2210
> INSTDONE_1: 0x206c
>
> [...]

Hi,

have you looked at intel-gpu-tools, available from
http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ ?

It includes tools to decode specific registers (such as INSTDONE) and
describe what the various bits mean (have a look at lib/instdone.c).
There's also a tool to decode an error dump, which is a commonly used
means of tracking down a GPU hang.

Hope this helps,
.Dave.

* Re: Debugging registers (INSTPS, CSCMDOP, CSCMDVLD, ...)
From: Dave Gordon @ 2015-01-19 13:31 UTC
To: Jaeyong Yoo
Cc: daniel.vetter, intel-gfx

On 15/01/15 07:38, Jaeyong Yoo wrote:
> [...]
> I can see a general description of these registers in the following PDF:
> http://www.x.org/docs/intel/VOL_1_graphics_core.pdf
>
> But, sadly, this document does not provide field descriptions for
> implementation-specific registers.
> I would appreciate it if you could point me to such information for the
> Haswell architecture.
>
> Best regards,
> JY

Hi,

that link you provided looks pretty ancient -- the document is dated
Jan 2008! There's a lot of much more recent documentation at
  * https://01.org/linuxgraphics/documentation
including some Haswell-specific files at:
  * https://01.org/linuxgraphics/documentation/2013-intel-core-processor-family

If you are trying to diagnose a GPU hang, these may be useful:
  * https://01.org/linuxgraphics/documentation/how-get-gpu-error-state
  * https://01.org/linuxgraphics/documentation/intel-gpu-dump-tool-guide

Hope this helps,
.Dave.