* more intel drm issues (was Re: [git pull] drm intel only fixes)
@ 2011-01-20 2:39 Linus Torvalds
2011-01-20 4:55 ` Jeff Chua
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 2:39 UTC (permalink / raw)
To: Chris Wilson; +Cc: Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list
Ok, so I have a new issue that I'm currently bisecting but that people
may be able to figure out even befor emy bisect finishes.
On my slow Atom netbook (that I'm planning on using as my traveling
companion for LCA), suspend-to-RAM takes a long time with current git.
It's quite noticeable - it used to be pretty much instant, now it
takes three seconds. And it's all i915 graphics (although I haven't
bisected it down fully, I've bisected it down to the drm merge).
In the good case (like plain 2.6.37), a suspend event will look
something like this:
...
PM: suspend of devices complete after 147.646 msecs
...
but the i915 driver at some point made it take 3s:
...
PM: suspend of devices complete after 3059.656 msecs
...
which is definitely long enough to be worth fixing.
Maybe the person responsible will go "oh, that's obviously due to
xyz", and just fix it. But I'll continue to bisect in case nobody
steps up to admit to wasting time..
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 2:39 more intel drm issues (was Re: [git pull] drm intel only fixes) Linus Torvalds @ 2011-01-20 4:55 ` Jeff Chua 2011-01-20 6:22 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Jeff Chua @ 2011-01-20 4:55 UTC (permalink / raw) To: Linus Torvalds, Len Brown, Rafael J. Wysocki Cc: Chris Wilson, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list [-- Attachment #1: Type: text/plain, Size: 1355 bytes --] On Thu, Jan 20, 2011 at 10:39 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > Ok, so I have a new issue that I'm currently bisecting but that people > may be able to figure out even befor emy bisect finishes. > > On my slow Atom netbook (that I'm planning on using as my traveling > companion for LCA), suspend-to-RAM takes a long time with current git. > It's quite noticeable - it used to be pretty much instant, now it > takes three seconds. And it's all i915 graphics (although I haven't > bisected it down fully, I've bisected it down to the drm merge). > > In the good case (like plain 2.6.37), a suspend event will look > something like this:... > PM: suspend of devices complete after 147.646 msecs > but the i915 driver at some point made it take 3s: > PM: suspend of devices complete after 3059.656 msecs > which is definitely long enough to be worth fixing. > > Maybe the person responsible will go "oh, that's obviously due to > xyz", and just fix it. But I'll continue to bisect in case nobody > steps up to admit to wasting time.. Rafael send out two patches earlier. Could be related. I was facing issue during resume. Attached are the two patches. You'll need to apply both. Len has suggested before to try to booting with "acpi_sleep=nonvs" which works as well for my case. Thanks, Jeff [-- Attachment #2: patch-resume1 --] [-- Type: application/octet-stream, Size: 7134 bytes --] Delivered-To: jeff.chua.linux@gmail.com Received: by 10.42.164.7 with SMTP id e7cs211926icy; Wed, 19 Jan 2011 13:29:53 -0800 (PST) Received: by 10.216.19.139 with SMTP id n11mr1144343wen.78.1295472592165; Wed, 19 Jan 2011 13:29:52 -0800 (PST) Return-Path: <rjw@sisk.pl> Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by mx.google.com with ESMTP id h49si8956988wer.34.2011.01.19.13.29.51; Wed, 19 Jan 2011 13:29:52 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) client-ip=217.79.144.158; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) smtp.mail=rjw@sisk.pl Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 725D51A26AE; Wed, 19 Jan 2011 22:14:03 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 13137-05; Wed, 19 Jan 2011 22:13:36 +0100 (CET) Received: from ferrari.rjw.lan (220-bem-13.acn.waw.pl [82.210.184.220]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id 9F2D41A2701; Wed, 19 Jan 2011 22:13:36 +0100 (CET) From: "Rafael J. Wysocki" <rjw@sisk.pl> To: Jeff Chua <jeff.chua.linux@gmail.com> Subject: [1/2] ACPI: Introduce acpi_os_ioremap() Date: Wed, 19 Jan 2011 22:27:14 +0100 User-Agent: KMail/1.13.5 (Linux/2.6.37+; KDE/4.4.4; x86_64; ; ) Cc: Len Brown <len.brown@intel.com>, linux-pm@lists.linux-foundation.org, Jack Steiner <steiner@sgi.com>, Linus Torvalds <torvalds@linux-foundation.org>, ACPI Devel Maling List <linux-acpi@vger.kernel.org> References: <AANLkTimBEnKTuR5aQyXvpZX8grQPw1eCaVgP-k9yBXhv@mail.gmail.com> <AANLkTimf3B=z1DCA2q+yi-ccyAJwmjkB_g8EP=ZTehmn@mail.gmail.com> <201101192219.07569.rjw@sisk.pl> In-Reply-To: <201101192219.07569.rjw@sisk.pl> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201101192227.14928.rjw@sisk.pl> X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux From: Rafael J. Wysocki <rjw@sisk.pl> Commit ca9b600 (ACPI / PM: Make suspend_nvs_save() use acpi_os_map_memory()) attempted to prevent the code in osl.c and nvs.c from using different ioremap() variants by making the latter use acpi_os_map_memory() for mapping the NVS pages. However, that also requires acpi_os_unmap_memory() to be used for unmapping them, which causes synchronize_rcu() to be executed many times in a row unnecessarily and introduces substantial delays during resume on some systems. Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c introduce acpi_os_ioremap() calling ioremap_cache() and make the code in both osl.c and nvs.c use it. Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> --- drivers/acpi/nvs.c | 7 ++++--- drivers/acpi/osl.c | 12 +++++++----- include/linux/acpi.h | 3 --- include/linux/acpi_io.h | 16 ++++++++++++++++ 4 files changed, 27 insertions(+), 11 deletions(-) Index: linux-2.6/drivers/acpi/osl.c =================================================================== --- linux-2.6.orig/drivers/acpi/osl.c +++ linux-2.6/drivers/acpi/osl.c @@ -38,6 +38,7 @@ #include <linux/workqueue.h> #include <linux/nmi.h> #include <linux/acpi.h> +#include <linux/acpi_io.h> #include <linux/efi.h> #include <linux/ioport.h> #include <linux/list.h> @@ -302,9 +303,10 @@ void __iomem *__init_refok acpi_os_map_memory(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map, *tmp_map; - unsigned long flags, pg_sz; + unsigned long flags; void __iomem *virt; - phys_addr_t pg_off; + acpi_physical_address pg_off; + acpi_size pg_sz; if (phys > ULONG_MAX) { printk(KERN_ERR PREFIX "Cannot map memory that high\n"); @@ -320,7 +322,7 @@ acpi_os_map_memory(acpi_physical_address pg_off = round_down(phys, PAGE_SIZE); pg_sz = round_up(phys + size, PAGE_SIZE) - pg_off; - virt = ioremap_cache(pg_off, pg_sz); + virt = acpi_os_ioremap(pg_off, pg_sz); if (!virt) { kfree(map); return NULL; @@ -642,7 +644,7 @@ acpi_os_read_memory(acpi_physical_addres virt_addr = acpi_map_vaddr_lookup(phys_addr, size); rcu_read_unlock(); if (!virt_addr) { - virt_addr = ioremap_cache(phys_addr, size); + virt_addr = acpi_os_ioremap(phys_addr, size); unmap = 1; } if (!value) @@ -678,7 +680,7 @@ acpi_os_write_memory(acpi_physical_addre virt_addr = acpi_map_vaddr_lookup(phys_addr, size); rcu_read_unlock(); if (!virt_addr) { - virt_addr = ioremap_cache(phys_addr, size); + virt_addr = acpi_os_ioremap(phys_addr, size); unmap = 1; } Index: linux-2.6/drivers/acpi/nvs.c =================================================================== --- linux-2.6.orig/drivers/acpi/nvs.c +++ linux-2.6/drivers/acpi/nvs.c @@ -12,6 +12,7 @@ #include <linux/mm.h> #include <linux/slab.h> #include <linux/acpi.h> +#include <linux/acpi_io.h> #include <acpi/acpiosxf.h> /* @@ -80,7 +81,7 @@ void suspend_nvs_free(void) free_page((unsigned long)entry->data); entry->data = NULL; if (entry->kaddr) { - acpi_os_unmap_memory(entry->kaddr, entry->size); + iounmap(entry->kaddr); entry->kaddr = NULL; } } @@ -114,8 +115,8 @@ int suspend_nvs_save(void) list_for_each_entry(entry, &nvs_list, node) if (entry->data) { - entry->kaddr = acpi_os_map_memory(entry->phys_start, - entry->size); + entry->kaddr = acpi_os_ioremap(entry->phys_start, + entry->size); if (!entry->kaddr) { suspend_nvs_free(); return -ENOMEM; Index: linux-2.6/include/linux/acpi.h =================================================================== --- linux-2.6.orig/include/linux/acpi.h +++ linux-2.6/include/linux/acpi.h @@ -306,9 +306,6 @@ extern acpi_status acpi_pci_osc_control_ u32 *mask, u32 req); extern void acpi_early_init(void); -int acpi_os_map_generic_address(struct acpi_generic_address *addr); -void acpi_os_unmap_generic_address(struct acpi_generic_address *addr); - #else /* !CONFIG_ACPI */ #define acpi_disabled 1 Index: linux-2.6/include/linux/acpi_io.h =================================================================== --- /dev/null +++ linux-2.6/include/linux/acpi_io.h @@ -0,0 +1,16 @@ +#ifndef _ACPI_IO_H_ +#define _ACPI_IO_H_ + +#include <linux/io.h> +#include <acpi/acpi.h> + +static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys, + acpi_size size) +{ + return ioremap_cache(phys, size); +} + +int acpi_os_map_generic_address(struct acpi_generic_address *addr); +void acpi_os_unmap_generic_address(struct acpi_generic_address *addr); + +#endif [-- Attachment #3: patch-resume2 --] [-- Type: application/octet-stream, Size: 3639 bytes --] Delivered-To: jeff.chua.linux@gmail.com Received: by 10.42.164.7 with SMTP id e7cs211925icy; Wed, 19 Jan 2011 13:29:51 -0800 (PST) Received: by 10.227.151.204 with SMTP id d12mr1432539wbw.113.1295472590379; Wed, 19 Jan 2011 13:29:50 -0800 (PST) Return-Path: <rjw@sisk.pl> Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by mx.google.com with ESMTP id i10si11530447wej.187.2011.01.19.13.29.49; Wed, 19 Jan 2011 13:29:50 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) client-ip=217.79.144.158; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) smtp.mail=rjw@sisk.pl Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 884BB1A26AC; Wed, 19 Jan 2011 22:14:02 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12814-05; Wed, 19 Jan 2011 22:13:37 +0100 (CET) Received: from ferrari.rjw.lan (220-bem-13.acn.waw.pl [82.210.184.220]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id D47561A2738; Wed, 19 Jan 2011 22:13:36 +0100 (CET) From: "Rafael J. Wysocki" <rjw@sisk.pl> To: Jeff Chua <jeff.chua.linux@gmail.com> Subject: [2/2] ACPI / PM: Call suspend_nvs_free() earlier during resume Date: Wed, 19 Jan 2011 22:27:55 +0100 User-Agent: KMail/1.13.5 (Linux/2.6.37+; KDE/4.4.4; x86_64; ; ) Cc: Len Brown <len.brown@intel.com>, linux-pm@lists.linux-foundation.org, Jack Steiner <steiner@sgi.com>, Linus Torvalds <torvalds@linux-foundation.org>, ACPI Devel Maling List <linux-acpi@vger.kernel.org> References: <AANLkTimBEnKTuR5aQyXvpZX8grQPw1eCaVgP-k9yBXhv@mail.gmail.com> <AANLkTimf3B=z1DCA2q+yi-ccyAJwmjkB_g8EP=ZTehmn@mail.gmail.com> <201101192219.07569.rjw@sisk.pl> In-Reply-To: <201101192219.07569.rjw@sisk.pl> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201101192227.55298.rjw@sisk.pl> X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux From: Rafael J. Wysocki <rjw@sisk.pl> It turns out that some device drivers map pages from the ACPI NVS region during resume using ioremap(), which conflicts with ioremap_cache() used for mapping those pages by the NVS save/restore code in nvs.c. Make the NVS pages mapped by the code in nvs.c be unmapped before device drivers' resume routines run. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> --- drivers/acpi/sleep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/drivers/acpi/sleep.c =================================================================== --- linux-2.6.orig/drivers/acpi/sleep.c +++ linux-2.6/drivers/acpi/sleep.c @@ -166,6 +166,7 @@ static void acpi_pm_finish(void) u32 acpi_state = acpi_target_sleep_state; acpi_ec_unblock_transactions(); + suspend_nvs_free(); if (acpi_state == ACPI_STATE_S0) return; @@ -186,7 +187,6 @@ static void acpi_pm_finish(void) */ static void acpi_pm_end(void) { - suspend_nvs_free(); /* * This is necessary in case acpi_pm_finish() is not called during a * failing transition to a sleep state. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 4:55 ` Jeff Chua @ 2011-01-20 6:22 ` Linus Torvalds 2011-01-20 10:17 ` Jeff Chua 2011-01-20 10:25 ` Chris Wilson 0 siblings, 2 replies; 13+ messages in thread From: Linus Torvalds @ 2011-01-20 6:22 UTC (permalink / raw) To: Jeff Chua Cc: Len Brown, Rafael J. Wysocki, Chris Wilson, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote: > > Rafael send out two patches earlier. Could be related. I was facing > issue during resume. No, I'm aware of the rcu-synchronize thing, this isn't it. This is really at the suspend stage, and I had bisected it down to the drm changes. In fact, by now I have bisected it down to a single commit. It's another merge commit, which makes me a bit nervous (I bisected another issue today, and it turned out to simply not be repeatable). But this time the merge commit actually has a real conflict that got fixed up in the merge, and the code around the conflict waits for three seconds, and three seconds is also exactly how long the delay at suspend time is. So I get the feeling that this time it's a real issue, and what happened was that the merge may have been a mismerge. Chris: as of commit 8d5203ca6253 ("Merge branch 'drm-intel-fixes' into drm-intel-next") I'm getting that 3-second delay at suspend time. And the merge diff looks like this: + struct drm_device *dev = ring->dev; + struct drm_i915_private *dev_priv = dev->dev_private; unsigned long end; - drm_i915_private_t *dev_priv = dev->dev_private; u32 head; - head = intel_read_status_page(ring, 4); - if (head) { - ring->head = head & HEAD_ADDR; - ring->space = ring->head - (ring->tail + 8); - if (ring->space < 0) - ring->space += ring->size; - if (ring->space >= n) - return 0; - } - trace_i915_ring_wait_begin (dev); end = jiffies + 3 * HZ; do { and that whole do-loop with a 3-second timeout makes me *very* suspicious. It used to have (in _one_ of the parent branches) that code before it to return early if there was space in the ring, now it doesn't any more - and that merge co-incides with my suspend suddenly taking 3 seconds. The same check that is deleted does exist inside the loop too, but there it has some extra code it in (compare to "actual_head" and so on), so I wonder if the fast-case was somehow hiding this issue. But I don't know the code. I just see that whole "PM: suspend of devices complete after x.xxx msecs" issue, and I can see the machine taking too long to suspend. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 6:22 ` Linus Torvalds @ 2011-01-20 10:17 ` Jeff Chua 2011-01-20 10:25 ` Chris Wilson 1 sibling, 0 replies; 13+ messages in thread From: Jeff Chua @ 2011-01-20 10:17 UTC (permalink / raw) To: Linus Torvalds Cc: Len Brown, Rafael J. Wysocki, Chris Wilson, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Thu, Jan 20, 2011 at 2:22 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote: >> >> Rafael send out two patches earlier. Could be related. I was facing >> issue during resume. > > No, I'm aware of the rcu-synchronize thing, this isn't it. This is > really at the suspend stage, and I had bisected it down to the drm > changes. > > In fact, by now I have bisected it down to a single commit. It's > another merge commit, which makes me a bit nervous (I bisected another > issue today, and it turned out to simply not be repeatable). > > But this time the merge commit actually has a real conflict that got > fixed up in the merge, and the code around the conflict waits for > three seconds, and three seconds is also exactly how long the delay at > suspend time is. So I get the feeling that this time it's a real > issue, and what happened was that the merge may have been a mismerge. I did see that once during suspend. But as you mentioned, 3 seconds, and it wasn't repeatable. It was at the first suspend right after reboot. Jeff ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 6:22 ` Linus Torvalds 2011-01-20 10:17 ` Jeff Chua @ 2011-01-20 10:25 ` Chris Wilson 2011-01-20 16:07 ` Linus Torvalds 1 sibling, 1 reply; 13+ messages in thread From: Chris Wilson @ 2011-01-20 10:25 UTC (permalink / raw) To: Linus Torvalds, Jeff Chua Cc: Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Wed, 19 Jan 2011 22:22:48 -0800, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote: > > > > Rafael send out two patches earlier. Could be related. I was facing > > issue during resume. > > No, I'm aware of the rcu-synchronize thing, this isn't it. This is > really at the suspend stage, and I had bisected it down to the drm > changes. > > In fact, by now I have bisected it down to a single commit. It's > another merge commit, which makes me a bit nervous (I bisected another > issue today, and it turned out to simply not be repeatable). > > But this time the merge commit actually has a real conflict that got > fixed up in the merge, and the code around the conflict waits for > three seconds, and three seconds is also exactly how long the delay at > suspend time is. So I get the feeling that this time it's a real > issue, and what happened was that the merge may have been a mismerge. > > Chris: as of commit 8d5203ca6253 ("Merge branch 'drm-intel-fixes' into > drm-intel-next") I'm getting that 3-second delay at suspend time. And > the merge diff looks like this: > > + struct drm_device *dev = ring->dev; > + struct drm_i915_private *dev_priv = dev->dev_private; > unsigned long end; > - drm_i915_private_t *dev_priv = dev->dev_private; > u32 head; > > - head = intel_read_status_page(ring, 4); > - if (head) { > - ring->head = head & HEAD_ADDR; > - ring->space = ring->head - (ring->tail + 8); > - if (ring->space < 0) > - ring->space += ring->size; > - if (ring->space >= n) > - return 0; > - } > - > trace_i915_ring_wait_begin (dev); > end = jiffies + 3 * HZ; > do { > > and that whole do-loop with a 3-second timeout makes me *very* > suspicious. It used to have (in _one_ of the parent branches) that > code before it to return early if there was space in the ring, now it > doesn't any more - and that merge co-incides with my suspend suddenly > taking 3 seconds. > > The same check that is deleted does exist inside the loop too, but > there it has some extra code it in (compare to "actual_head" and so > on), so I wonder if the fast-case was somehow hiding this issue. Right, the autoreported HEAD may have been already reset to 0 and so hit the wraparound bug which caused it to exit early without actually quiescing the ringbuffer. Another possibility is that I added a 3s timeout waiting for a request if IRQs were suspended: commit b5ba177d8d71f011c23b1cabec99fdaddae65c4d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Dec 14 12:17:15 2010 +0000 drm/i915: Poll for seqno completion if IRQ is disabled Both of those I think are symptoms of another problem, that perhaps during suspend we are shutting down parts of the chip before idling? -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 10:25 ` Chris Wilson @ 2011-01-20 16:07 ` Linus Torvalds 2011-01-20 17:38 ` Chris Wilson 2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson 0 siblings, 2 replies; 13+ messages in thread From: Linus Torvalds @ 2011-01-20 16:07 UTC (permalink / raw) To: Chris Wilson Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Right, the autoreported HEAD may have been already reset to 0 and so hit > the wraparound bug which caused it to exit early without actually > quiescing the ringbuffer. Yeah, that would explain the issue. > Another possibility is that I added a 3s timeout waiting for a request if > IRQs were suspended: No, if IRQ's are actually suspended here, then that codepath is totally buggy and would blow up (msleep() doesn't work, and jiffies wouldn't advance on UP). So that's not it. > Both of those I think are symptoms of another problem, that perhaps during > suspend we are shutting down parts of the chip before idling? That could be, but looking at the code, one thing strikes me: the _normal_ case (of just waiting for "enough space" in the ring buffer) doesn't need to use the exact case, but the "wait for ring buffer to be totally empty" does. Which means that the use of the "fast-but-inaccurate" 'head' sounds wrong for the "wait for idle" case. So can you explain the difference between intel_read_status_page(ring, 4); vs I915_READ_HEAD(ring); because from looking at the code, I get the notion that "intel_read_status_page()" may not be exact. But what happens if that inexact value matches our cached ring->actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty. No? Also, isn't that "head < ring->actual_head" buggy? What about the overflow case? Not that we care, because afaik, 'actual_head' is not actually used anywhere, so it should be called 'pointless_head'? That code looks suspiciously bogus. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 16:07 ` Linus Torvalds @ 2011-01-20 17:38 ` Chris Wilson 2011-01-20 17:51 ` Linus Torvalds 2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson 1 sibling, 1 reply; 13+ messages in thread From: Chris Wilson @ 2011-01-20 17:38 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Thu, 20 Jan 2011 08:07:02 -0800, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > Right, the autoreported HEAD may have been already reset to 0 and so hit > > the wraparound bug which caused it to exit early without actually > > quiescing the ringbuffer. > > Yeah, that would explain the issue. > > > Another possibility is that I added a 3s timeout waiting for a request if > > IRQs were suspended: > > No, if IRQ's are actually suspended here, then that codepath is > totally buggy and would blow up (msleep() doesn't work, and jiffies > wouldn't advance on UP). So that's not it. > > > Both of those I think are symptoms of another problem, that perhaps during > > suspend we are shutting down parts of the chip before idling? > > That could be, but looking at the code, one thing strikes me: the > _normal_ case (of just waiting for "enough space" in the ring buffer) > doesn't need to use the exact case, but the "wait for ring buffer to > be totally empty" does. > > Which means that the use of the "fast-but-inaccurate" 'head' sounds > wrong for the "wait for idle" case. > > So can you explain the difference between > > intel_read_status_page(ring, 4); > > vs > > I915_READ_HEAD(ring); For I915_READ_HEAD, we need to wake up the GT power well, perform an uncached read from the register, and then power down. This takes on the order of a 100 microseconds (less if the GT is already powered up, etc). Instead a read from the status page is from cached memory. The caveat here is that value is only updated by the gfx engine when its HEAD crosses every 64k boundary. So quite rarely. > because from looking at the code, I get the notion that > "intel_read_status_page()" may not be exact. But what happens if that > inexact value matches our cached ring->actual_head, so we never even > try to read the exact case? Does it _stay_ inexact for arbitrarily > long times? If so, we might wait for the ring to empty forever (well, > until the timeout - the behavior I see), even though the ring really > _is_ empty. No? Ah. Your analysis is spot on and this will cause a hang whilst polling if we enter the loop with the last known head the same as the reported value. > Also, isn't that "head < ring->actual_head" buggy? What about the > overflow case? Not that we care, because afaik, 'actual_head' is not > actually used anywhere, so it should be called 'pointless_head'? This is the one case that I think is handled correctly, ignoring all the other bugs. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 17:38 ` Chris Wilson @ 2011-01-20 17:51 ` Linus Torvalds 2011-01-20 20:44 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2011-01-20 17:51 UTC (permalink / raw) To: Chris Wilson Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list On Thu, Jan 20, 2011 at 9:38 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > >> because from looking at the code, I get the notion that >> "intel_read_status_page()" may not be exact. But what happens if that >> inexact value matches our cached ring->actual_head, so we never even >> try to read the exact case? Does it _stay_ inexact for arbitrarily >> long times? If so, we might wait for the ring to empty forever (well, >> until the timeout - the behavior I see), even though the ring really >> _is_ empty. No? > > Ah. Your analysis is spot on and this will cause a hang whilst polling if > we enter the loop with the last known head the same as the reported value. So how about just doing this in the loop? It will mean that the _first_ read uses the fast cached one (the common case, hopefully), but then if we loop, we'll use the slow exact one. (cut-and-paste, so whitespace isn't good): diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..11bbfb5 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) msleep(1); if (atomic_read(&dev_priv->mm.wedged)) return -EAGAIN; + /* Force a re-read. FIXME: what if read_status_page says 0 too */ + ring->actual_head = 0; } while (!time_after(jiffies, end)); trace_i915_ring_wait_end (dev); return -EBUSY; but to get rid of the FIXME you should probably get rid of "actual_head" entirely, and just use a local variable as a simple boolean flag instead. Hmm? Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes) 2011-01-20 17:51 ` Linus Torvalds @ 2011-01-20 20:44 ` Linus Torvalds 0 siblings, 0 replies; 13+ messages in thread From: Linus Torvalds @ 2011-01-20 20:44 UTC (permalink / raw) To: Chris Wilson Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list [-- Attachment #1: Type: text/plain, Size: 1475 bytes --] On Thu, Jan 20, 2011 at 9:51 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So how about just doing this in the loop? It will mean that the > _first_ read uses the fast cached one (the common case, hopefully), > but then if we loop, we'll use the slow exact one. > > (cut-and-paste, so whitespace isn't good): > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 03e3370..11bbfb5 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct > intel_ring_buffer *ring, int n) > msleep(1); > if (atomic_read(&dev_priv->mm.wedged)) > return -EAGAIN; > + /* Force a re-read. FIXME: what if read_status_page > says 0 too */ > + ring->actual_head = 0; > } while (!time_after(jiffies, end)); > trace_i915_ring_wait_end (dev); > return -EBUSY; This makes no difference. And the reason is exactly that we get the zero case that I had in the comment. But THIS attached patch actually seems to fix the slow suspend for me. I removed the accesses to "actual_head", because that whole field seems to not be used. So it seems like the intel_read_status_page() thing returns zero forever when suspending. Maybe you can explain why. Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 1716 bytes --] drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 - 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..f6b9baa 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -928,6 +928,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring) int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) { + int reread = 0; struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; unsigned long end; @@ -940,9 +941,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) * fallback to the slow and accurate path. */ head = intel_read_status_page(ring, 4); - if (head < ring->actual_head) + if (reread) head = I915_READ_HEAD(ring); - ring->actual_head = head; ring->head = head & HEAD_ADDR; ring->space = ring->head - (ring->tail + 8); if (ring->space < 0) @@ -961,6 +961,7 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) msleep(1); if (atomic_read(&dev_priv->mm.wedged)) return -EAGAIN; + reread = 1; } while (!time_after(jiffies, end)); trace_i915_ring_wait_end (dev); return -EBUSY; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index be9087e..5b0abfa 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -47,7 +47,6 @@ struct intel_ring_buffer { struct drm_device *dev; struct drm_i915_gem_object *obj; - u32 actual_head; u32 head; u32 tail; int space; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space 2011-01-20 16:07 ` Linus Torvalds 2011-01-20 17:38 ` Chris Wilson @ 2011-01-20 17:41 ` Chris Wilson 2011-01-21 8:35 ` Jiri Slaby 1 sibling, 1 reply; 13+ messages in thread From: Chris Wilson @ 2011-01-20 17:41 UTC (permalink / raw) Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel, Chris Wilson During suspend, Linus found that his machine would hang for 3 seconds, and identified that intel_ring_buffer_wait() was the culprit: "Because from looking at the code, I get the notion that "intel_read_status_page()" may not be exact. But what happens if that inexact value matches our cached ring->actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty." As the reported HEAD position is only updated every time it crosses a 64k boundary, whilst draining the ring it is indeed likely to remain one value. If that value matches the last known HEAD position, we never read the true value from the register and so trigger a timeout. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_ringbuffer.c | 40 ++++++++++++++++++------------ drivers/gpu/drm/i915/intel_ringbuffer.h | 1 - 2 files changed, 24 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 51fbc5e..6218fa9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -34,6 +34,14 @@ #include "i915_trace.h" #include "intel_drv.h" +static inline int ring_space(struct intel_ring_buffer *ring) +{ + int space = (ring->head & HEAD_ADDR) - (ring->tail + 8); + if (space < 0) + space += ring->size; + return space; +} + static u32 i915_gem_get_seqno(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private; @@ -204,11 +212,9 @@ static int init_ring_common(struct intel_ring_buffer *ring) if (!drm_core_check_feature(ring->dev, DRIVER_MODESET)) i915_kernel_lost_context(ring->dev); else { - ring->head = I915_READ_HEAD(ring) & HEAD_ADDR; + ring->head = I915_READ_HEAD(ring); ring->tail = I915_READ_TAIL(ring) & TAIL_ADDR; - ring->space = ring->head - (ring->tail + 8); - if (ring->space < 0) - ring->space += ring->size; + ring->space = ring_space(ring); } return 0; @@ -921,7 +927,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring) } ring->tail = 0; - ring->space = ring->head - 8; + ring->space = ring_space(ring); return 0; } @@ -933,20 +939,22 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) unsigned long end; u32 head; + /* If the reported head position has wrapped or hasn't advanced, + * fallback to the slow and accurate path. + */ + head = intel_read_status_page(ring, 4); + if (head > ring->head) { + ring->head = head; + ring->space = ring_space(ring); + if (ring->space >= n) + return 0; + } + trace_i915_ring_wait_begin (dev); end = jiffies + 3 * HZ; do { - /* If the reported head position has wrapped or hasn't advanced, - * fallback to the slow and accurate path. - */ - head = intel_read_status_page(ring, 4); - if (head < ring->actual_head) - head = I915_READ_HEAD(ring); - ring->actual_head = head; - ring->head = head & HEAD_ADDR; - ring->space = ring->head - (ring->tail + 8); - if (ring->space < 0) - ring->space += ring->size; + ring->head = I915_READ_HEAD(ring); + ring->space = ring_space(ring); if (ring->space >= n) { trace_i915_ring_wait_end(dev); return 0; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 61d5220..6d6fde8 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -47,7 +47,6 @@ struct intel_ring_buffer { struct drm_device *dev; struct drm_i915_gem_object *obj; - u32 actual_head; u32 head; u32 tail; int space; -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space 2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson @ 2011-01-21 8:35 ` Jiri Slaby 2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson 0 siblings, 1 reply; 13+ messages in thread From: Jiri Slaby @ 2011-01-21 8:35 UTC (permalink / raw) To: Chris Wilson; +Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel On 01/20/2011 06:41 PM, Chris Wilson wrote: > During suspend, Linus found that his machine would hang for 3 seconds, > and identified that intel_ring_buffer_wait() was the culprit: FWIW works for me. From 4s I'm back to: PM: suspend of devices complete after 1087.670 msecs BTW I did this change as there are loops running out of bounds of this array added in 1ec14ad3. --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -52,7 +52,7 @@ struct intel_ring_buffer { u32 irq_seqno; /* last seq seem at irq time */ u32 waiting_seqno; - u32 sync_seqno[I915_NUM_RINGS-1]; + u32 sync_seqno[I915_NUM_RINGS]; atomic_t irq_refcount; bool __must_check (*irq_get)(struct intel_ring_buffer *ring); void (*irq_put)(struct intel_ring_buffer *ring); thanks, -- js ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno 2011-01-21 8:35 ` Jiri Slaby @ 2011-01-21 10:11 ` Chris Wilson 2011-01-22 22:24 ` Jiri Slaby 0 siblings, 1 reply; 13+ messages in thread From: Chris Wilson @ 2011-01-21 10:11 UTC (permalink / raw) To: Jiri Slaby Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel, Chris Wilson There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- Jiri, does this catch the invalid array access? For 3 rings, there should only be 2 sync counters... --- drivers/gpu/drm/i915/i915_gem.c | 2 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 3dfc848..812b97b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1857,7 +1857,7 @@ i915_gem_retire_requests_ring(struct drm_device *dev, seqno = ring->get_seqno(ring); - for (i = 0; i < I915_NUM_RINGS; i++) + for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++) if (seqno >= ring->sync_seqno[i]) ring->sync_seqno[i] = 0; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index dcfdf41..d2f445e 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1175,7 +1175,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, goto err; seqno = i915_gem_next_request_seqno(dev, ring); - for (i = 0; i < I915_NUM_RINGS-1; i++) { + for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++) { if (seqno < ring->sync_seqno[i]) { /* The GPU can not handle its semaphore value wrapping, * so every billion or so execbuffers, we need to stall -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno 2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson @ 2011-01-22 22:24 ` Jiri Slaby 0 siblings, 0 replies; 13+ messages in thread From: Jiri Slaby @ 2011-01-22 22:24 UTC (permalink / raw) To: Chris Wilson; +Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel On 01/21/2011 11:11 AM, Chris Wilson wrote: > There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we > were clearing I915_NUM_RINGS of them. Oops. > > Reported-by: Jiri Slaby <jirislaby@gmail.com> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > > Jiri, does this catch the invalid array access? For 3 rings, there should > only be 2 sync counters... Yup, it's gone. thanks, -- js ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-01-22 22:24 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-20 2:39 more intel drm issues (was Re: [git pull] drm intel only fixes) Linus Torvalds 2011-01-20 4:55 ` Jeff Chua 2011-01-20 6:22 ` Linus Torvalds 2011-01-20 10:17 ` Jeff Chua 2011-01-20 10:25 ` Chris Wilson 2011-01-20 16:07 ` Linus Torvalds 2011-01-20 17:38 ` Chris Wilson 2011-01-20 17:51 ` Linus Torvalds 2011-01-20 20:44 ` Linus Torvalds 2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson 2011-01-21 8:35 ` Jiri Slaby 2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson 2011-01-22 22:24 ` Jiri Slaby
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox