* i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
@ 2012-03-16 16:11 Linus Torvalds
2012-03-16 16:47 ` Dave Airlie
2012-03-16 22:52 ` Keith Packard
0 siblings, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2012-03-16 16:11 UTC (permalink / raw)
To: Keith Packard, Dave Airlie, DRI
Guys,
I don't know if these kinds of things have been forwarded to you, but
there's apparently been several things like this going on - with the
finger pointing to the i915 driver apparently clearing random memory.
Often the end result seems to be list corruption or a NULL pointer
dereference in the filesystem layer.
Linus
---------- Forwarded message ----------
From: Jiri Kosina <jkosina@suse.cz>
Date: Fri, Mar 16, 2012 at 8:25 AM
Subject: Re: Oops in ext3_block_to_path.isra.40+0x26/0x11b
To: George Spelvin <linux@horizon.com>
Cc: jack@suse.cz, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org
On Fri, 16 Mar 2012, George Spelvin wrote:
> > I am not aware of anything, but I have a question -- George, did the
> > machine get suspended/resumed before this happened?
>
> I'm pretty sure it didn't. It's a desktop machine, and I don't ever
> suspend it. It had been up for a while, and /sys/power/state exists, and
> *maybe* I played with it sometime since boot, but the backup runs nightly
> and I *definitely* did not suspend it the day (or even several days)
> before the oops.
>
> (I tried to preserve the machine state, but processes started getting
> stuck in the kernel a few hours after the report, so I had to reboot it.)
>
> Jan Kara asked:
> > And by any chance, do you use i915 driver? Because that one seems to
> > cause corruption - see: https://lkml.org/lkml/2012/3/9/217. I believe
> > Jiri's corruption is likely caused by that...
>
> Yes! lspci -nn and abbreviated .config attached. But, as mentioned, the machine
> hasn't been suspended...
So it might be the culprit. As the reason of the corruption is not yet
understood, it might be that suspend/resume cycle is not necessary
pre-requisite for this to trigger, it might just make it more likely.
And the corruption is observed to be indeed several writes of 0x00000000,
so it could easily lead to null pointer dereferences all over the place.
Are you able to reproduce the problem if you turn kernel modesetting off?
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 16:11 i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b) Linus Torvalds
@ 2012-03-16 16:47 ` Dave Airlie
2012-03-16 23:01 ` Keith Packard
2012-03-16 22:52 ` Keith Packard
1 sibling, 1 reply; 22+ messages in thread
From: Dave Airlie @ 2012-03-16 16:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: DRI
> Guys,
> I don't know if these kinds of things have been forwarded to you, but
> there's apparently been several things like this going on - with the
> finger pointing to the i915 driver apparently clearing random memory.
> Often the end result seems to be list corruption or a NULL pointer
> dereference in the filesystem layer.
>
I think the hibernate problem isn't this problem.
The hibernate issue is known and I've been hoping someone from Intel
would run with debugging it, they have a big enough team that I don't
feel I can expend the personal time to look into it.
Maybe Keith can push someone or maybe I just refuse pull requests
until one with a fix appears.
The latest thinking on the hibernate issues is kernel one sets up an
fbcon, hibernate restores the old memory and the GTT still points at
the pages,
then something writes to the console and overwrites real memory., just
a working theory, nobody has proven it yet.
Dave.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 16:47 ` Dave Airlie
@ 2012-03-16 23:01 ` Keith Packard
2012-03-17 7:41 ` Dave Airlie
0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2012-03-16 23:01 UTC (permalink / raw)
To: Dave Airlie, Linus Torvalds; +Cc: DRI
<#part sign=pgpmime>
On Fri, 16 Mar 2012 16:47:46 +0000, Dave Airlie <airlied@gmail.com> wrote:
> The hibernate issue is known and I've been hoping someone from Intel
> would run with debugging it, they have a big enough team that I don't
> feel I can expend the personal time to look into it.
Yeah, I've been chatting with a couple of intel folks; we should have a
test patch ready shortly, but we haven't been able to reproduce anything
like this...
> Maybe Keith can push someone or maybe I just refuse pull requests
> until one with a fix appears.
>From what I've seen, this is a problem only on Ironlake machines, which
makes me afraid of some weird GTT flushing issue, given the adventures
we had with VT-d on that hardware where we idle the gpu before any GTT
updates. I wonder what would happen if we idled the GPU before any GTT
updates even when VT-d wasn't running...
> The latest thinking on the hibernate issues is kernel one sets up an
> fbcon, hibernate restores the old memory and the GTT still points at
> the pages,
> then something writes to the console and overwrites real memory., just
> a working theory, nobody has proven it yet.
Presumably the resumed kernel will not be able to write to the console
until the i915 driver is running again, at which point it will have
updated all of the GTT entries. And, presumably the booted kernel won't
be writing to the console after it has loaded the resumed kernel memory?
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 23:01 ` Keith Packard
@ 2012-03-17 7:41 ` Dave Airlie
2012-03-17 18:59 ` Keith Packard
0 siblings, 1 reply; 22+ messages in thread
From: Dave Airlie @ 2012-03-17 7:41 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, DRI
On Fri, Mar 16, 2012 at 11:01 PM, Keith Packard <keithp@keithp.com> wrote:
> <#part sign=pgpmime>
> On Fri, 16 Mar 2012 16:47:46 +0000, Dave Airlie <airlied@gmail.com> wrote:
>
>> The hibernate issue is known and I've been hoping someone from Intel
>> would run with debugging it, they have a big enough team that I don't
>> feel I can expend the personal time to look into it.
>
> Yeah, I've been chatting with a couple of intel folks; we should have a
> test patch ready shortly, but we haven't been able to reproduce anything
> like this...
>
>> Maybe Keith can push someone or maybe I just refuse pull requests
>> until one with a fix appears.
>
> From what I've seen, this is a problem only on Ironlake machines, which
> makes me afraid of some weird GTT flushing issue, given the adventures
> we had with VT-d on that hardware where we idle the gpu before any GTT
> updates. I wonder what would happen if we idled the GPU before any GTT
> updates even when VT-d wasn't running...
We've had reports on 965GM in Fedora, maybe davej has the specific bug.
Dave.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 7:41 ` Dave Airlie
@ 2012-03-17 18:59 ` Keith Packard
2012-03-17 19:20 ` Dave Airlie
0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2012-03-17 18:59 UTC (permalink / raw)
To: Dave Airlie; +Cc: Linus Torvalds, DRI
<#part sign=pgpmime>
On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie <airlied@gmail.com> wrote:
> We've had reports on 965GM in Fedora, maybe davej has the specific
> bug.
Are these reports relatively recent, leading to a possible software bug
introduced in the last couple of releases?
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 18:59 ` Keith Packard
@ 2012-03-17 19:20 ` Dave Airlie
2012-03-17 20:23 ` Dave Airlie
0 siblings, 1 reply; 22+ messages in thread
From: Dave Airlie @ 2012-03-17 19:20 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, DRI
On Sat, Mar 17, 2012 at 6:59 PM, Keith Packard <keithp@keithp.com> wrote:
> <#part sign=pgpmime>
> On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie <airlied@gmail.com> wrote:
>
>> We've had reports on 965GM in Fedora, maybe davej has the specific
>> bug.
>
> Are these reports relatively recent, leading to a possible software bug
> introduced in the last couple of releases?
No I think they are since 0-day KMS reports.
but its St Patricks day so really not a reliable info source for a
couple of days :-)
Dave.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 19:20 ` Dave Airlie
@ 2012-03-17 20:23 ` Dave Airlie
2012-03-17 20:27 ` Dave Airlie
2012-03-17 21:14 ` Linus Torvalds
0 siblings, 2 replies; 22+ messages in thread
From: Dave Airlie @ 2012-03-17 20:23 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, DRI
On Sat, Mar 17, 2012 at 7:20 PM, Dave Airlie <airlied@gmail.com> wrote:
> On Sat, Mar 17, 2012 at 6:59 PM, Keith Packard <keithp@keithp.com> wrote:
>> <#part sign=pgpmime>
>> On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie <airlied@gmail.com> wrote:
>>
>>> We've had reports on 965GM in Fedora, maybe davej has the specific
>>> bug.
>>
>> Are these reports relatively recent, leading to a possible software bug
>> introduced in the last couple of releases?
>
> No I think they are since 0-day KMS reports.
>
> but its St Patricks day so really not a reliable info source for a
> couple of days :-)
I did however get a flashback in google to this:
http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
Linus don't think we ever did work out why that worked, I wonder if we
lost something after that.
Dave.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 20:23 ` Dave Airlie
@ 2012-03-17 20:27 ` Dave Airlie
2012-03-17 21:14 ` Linus Torvalds
1 sibling, 0 replies; 22+ messages in thread
From: Dave Airlie @ 2012-03-17 20:27 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, DRI
Okay found the 965GM report
https://bugzilla.kernel.org/show_bug.cgi?id=37142
Dave.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 20:23 ` Dave Airlie
2012-03-17 20:27 ` Dave Airlie
@ 2012-03-17 21:14 ` Linus Torvalds
2012-03-17 22:09 ` Hugh Dickins
2012-03-19 16:16 ` Jerome Glisse
1 sibling, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2012-03-17 21:14 UTC (permalink / raw)
To: Dave Airlie, Al Viro, Hugh Dickins; +Cc: DRI
On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie <airlied@gmail.com> wrote:
>
> I did however get a flashback in google to this:
>
> http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
>
> Linus don't think we ever did work out why that worked, I wonder if we
> lost something after that.
Hmm. Maybe we should stop making the default gfp_mask be
GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and
instead add the _MOVABLE flag only for mappings that use
"generic_file_mmap()".
It does sound a bit scary to just make default mappings use MOVABLE
pages, when we know that can be incorrect for some cases.
But that would require that filesystems like ext4 etc that don't just
use the generic_file_mmap() function would have add do that MOVABLE
thing on their own. And the *normal* case is certainly that pages are
movable and putting them in themovable zone should be ok. It's just
*not* ok if you also map them into some hardware GTT thing or just
otherwise keep track of the pages directly, like DRI does.
The other possibility is to just make this a shm thing, since it's
generally that layer that has the case of "pages allocated with the
mapping gfp masks". Hugh Dickins clearly tried to make sure that the
DRM initialization of the gfp mask was honored, but maybe there is
some case where it is missed. Hugh added a mapping_set_gfp_mask() to
i915_gem_alloc_object(), but maybe we have i915 shmem mappings that
get set up through some other path.
What about the ttm swap storage thing, for example? That also does
shmem_file_setup(), without limiting the pages. But I don't think i915
uses ttm, right?
Al? Hugh? Opinions? Right now anybody who uses the
shmem_read_mapping_page() function is in danger of getting a MOVABLE
page by default. Which may or may not be right.
That said, I don't see how i915 would get a movable page. It *seems*
to do the right setup for the gfp flags of the mapping.
Linus
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 21:14 ` Linus Torvalds
@ 2012-03-17 22:09 ` Hugh Dickins
2012-03-17 22:52 ` Linus Torvalds
2012-03-19 16:16 ` Jerome Glisse
1 sibling, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-03-17 22:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hugh Dickins, DRI, Al Viro
On Sat, 17 Mar 2012, Linus Torvalds wrote:
> On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie <airlied@gmail.com> wrote:
> >
> > I did however get a flashback in google to this:
> >
> > http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
> >
> > Linus don't think we ever did work out why that worked, I wonder if we
> > lost something after that.
>
> Hmm. Maybe we should stop making the default gfp_mask be
> GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and
> instead add the _MOVABLE flag only for mappings that use
> "generic_file_mmap()".
>
> It does sound a bit scary to just make default mappings use MOVABLE
> pages, when we know that can be incorrect for some cases.
>
> But that would require that filesystems like ext4 etc that don't just
> use the generic_file_mmap() function would have add do that MOVABLE
> thing on their own. And the *normal* case is certainly that pages are
> movable and putting them in themovable zone should be ok. It's just
> *not* ok if you also map them into some hardware GTT thing or just
> otherwise keep track of the pages directly, like DRI does.
>
> The other possibility is to just make this a shm thing, since it's
> generally that layer that has the case of "pages allocated with the
> mapping gfp masks". Hugh Dickins clearly tried to make sure that the
> DRM initialization of the gfp mask was honored, but maybe there is
> some case where it is missed. Hugh added a mapping_set_gfp_mask() to
> i915_gem_alloc_object(), but maybe we have i915 shmem mappings that
> get set up through some other path.
>
> What about the ttm swap storage thing, for example? That also does
> shmem_file_setup(), without limiting the pages. But I don't think i915
> uses ttm, right?
>
> Al? Hugh? Opinions? Right now anybody who uses the
> shmem_read_mapping_page() function is in danger of getting a MOVABLE
> page by default. Which may or may not be right.
>
> That said, I don't see how i915 would get a movable page. It *seems*
> to do the right setup for the gfp flags of the mapping.
Yes, i915_gem_alloc_object() does its own mapping_set_gfp_mask(mapping,
GFP_HIGHUSER | __GFP_RECLAIMABLE), which should be as good as ever.
I've got to go out for an hour: I'll digest more and think more about
this when I get back. If someone could explain the original problem
with _MOVABLE, that would help me: I didn't see an explanation in the
patch or in the bugzilla. I would expect using _MOVABLE for i915_gem
would render a block of otherwise movable pages immovable because of
the GEM pages pinned, but I wouldn't expect them to move in mysterious
ways - or are references held to the pages without them being pinned??.
A factor which might turn out to be relevant: swapin_readahead() (or
swapoff) brings in pages from swap before it knows who they belong to,
so the swapped-in swapcache pages don't necessarily have the limitations
mapping_set_gfp_mask() has asked for. It's been discussed with Alan,
we know it will need fixing for gma500 when eventually that comes to
be used on machines with more than 4GB, but for now I wasn't aware of
a problem.
Hugh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 22:09 ` Hugh Dickins
@ 2012-03-17 22:52 ` Linus Torvalds
2012-03-18 0:43 ` Keith Packard
2012-03-19 17:51 ` Hugh Dickins
0 siblings, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2012-03-17 22:52 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Al Viro, DRI
On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins <hughd@google.com> wrote:
>
> I've got to go out for an hour: I'll digest more and think more about
> this when I get back. If someone could explain the original problem
> with _MOVABLE, that would help me:
I do not believe we actually ever uncovered the original problem with
_MOVABLE: the problem was bisected down to the stable-backported
version of commit 4bdadb978569 ("drm/i915: Selectively enable
self-reclaim"), and I looked at the changes and decided that one of
the main ones was the removal of the mapping_set_gfp_mask() - which
resulted in __GFP_MOVABLE being on for the mapping.
So we just tried re-introducing that, and it fixed the problem.
Exactly *why* movable pages are a problem was never all that clear.
The assumption was (and I guess still is) that there are physical
pointers to the pages in the DRM pages themselves. Quoting from the
original thread:
"I could easily see that something would get very unhappy and corrupt
memory if the suspend-to-disk phase ends up compacting memory and
moving the pages around from under the i915 driver."
but I didn't actually see why the i915 page pinning would be defeated
by __GFP_MOVABLE. The code does get a reference to them afaik.
So no, we don't have a real first cause. We just have a band-aid for
the symptom. It might be a pure bug in hibernation that just assumes
that any pages in MOVABLE zones can always be moved without even
checking any refcounts, so not using __GFP_MOVABLE might just be
working around problems elsewhere.
Or it's entirely possible that the DRM layer does *not* increment the
refcounts of the pages properly, and that hibernate with movable pages
is just a very efficient way of showing the problems that results in.
> A factor which might turn out to be relevant: swapin_readahead() (or
> swapoff) brings in pages from swap before it knows who they belong to,
> so the swapped-in swapcache pages don't necessarily have the limitations
> mapping_set_gfp_mask() has asked for. It's been discussed with Alan,
> we know it will need fixing for gma500 when eventually that comes to
> be used on machines with more than 4GB, but for now I wasn't aware of
> a problem.
So i915 shouldn't much care about highmem vs non-highmem, it's just
that it does hold on to pages and take their physical address. But as
far as I can tell, whenever the physical address is in use, it does
have a refcount to the page.
So for example, i915_gem_object_get_pages_gtt() will use
shmem_read_mapping_page_gfp() which will increment the page count for
the page it gets, so all the obj->pages[] entries should have properly
incremented page counts. And they get released by
i915_gem_object_put_pages_gtt(), but maybe that is called too early
while the pages are still in use by the GFX unit...
If there is a refcounting issue, hibernate + __GFP_MOVABLE might just
be a great way to see it.
I don't really know the code, though.
Linus
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 22:52 ` Linus Torvalds
@ 2012-03-18 0:43 ` Keith Packard
2012-03-18 1:44 ` Hugh Dickins
2012-03-19 17:51 ` Hugh Dickins
1 sibling, 1 reply; 22+ messages in thread
From: Keith Packard @ 2012-03-18 0:43 UTC (permalink / raw)
To: Linus Torvalds, Hugh Dickins; +Cc: Al Viro, DRI
<#part sign=pgpmime>
On Sat, 17 Mar 2012 15:52:15 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> I do not believe we actually ever uncovered the original problem with
> _MOVABLE: the problem was bisected down to the stable-backported
> version of commit 4bdadb978569 ("drm/i915: Selectively enable
> self-reclaim"), and I looked at the changes and decided that one of
> the main ones was the removal of the mapping_set_gfp_mask() - which
> resulted in __GFP_MOVABLE being on for the mapping.
Can anyone explain what __GFP_MOVABLE even does? I can't understand what
this flag would be for; if the page is locked (with get_page), then the
page cannot move. If it isn't locked, then it's subject to swapping, in
which case the page will almost certainly move when it returns from
disk. Is it that the page won't move if it isn't swapped? That doesn't
seem all that useful to me.
> but I didn't actually see why the i915 page pinning would be defeated
> by __GFP_MOVABLE. The code does get a reference to them afaik.
GTT mapping and page locking are done in lock-step in the driver:
i915_gem_object_bind_to_gtt
i915_gem_object_get_pages_gtt
pins the pages
i915_gem_gtt_bind_object
maps to GTT
i915_gem_object_unbind
i915_gem_gtt_unbind_object
unmaps from GTT
i915_gem_object_put_pages_gtt
unpins the pages.
There are no other code paths to unmapping objects from the GTT or
unpinning the pages that I can find.
> So for example, i915_gem_object_get_pages_gtt() will use
> shmem_read_mapping_page_gfp() which will increment the page count for
> the page it gets, so all the obj->pages[] entries should have properly
> incremented page counts. And they get released by
> i915_gem_object_put_pages_gtt(), but maybe that is called too early
> while the pages are still in use by the GFX unit...
This seems the most likely problem -- there are so many caches and
buffers involved. However, we're seeing troubles on hibernate resume, at
which point there isn't any acceleration going on, just two fbdev
drivers poking the hardware. Which really reduces the complexity quite a
bit; it's just CPU reads/writes through the GTT aperture created for the
two console frame buffers. That makes this an interesting place to look
for trouble; we can ignore vast areas within the driver that deal with
acceleration, at least for this case.
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-18 0:43 ` Keith Packard
@ 2012-03-18 1:44 ` Hugh Dickins
2012-03-18 3:42 ` Keith Packard
0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-03-18 1:44 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, Al Viro, DRI
On Sat, 17 Mar 2012, Keith Packard wrote:
> On Sat, 17 Mar 2012 15:52:15 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > I do not believe we actually ever uncovered the original problem with
> > _MOVABLE: the problem was bisected down to the stable-backported
> > version of commit 4bdadb978569 ("drm/i915: Selectively enable
> > self-reclaim"), and I looked at the changes and decided that one of
> > the main ones was the removal of the mapping_set_gfp_mask() - which
> > resulted in __GFP_MOVABLE being on for the mapping.
>
> Can anyone explain what __GFP_MOVABLE even does? I can't understand what
> this flag would be for; if the page is locked (with get_page), then the
> page cannot move. If it isn't locked, then it's subject to swapping, in
> which case the page will almost certainly move when it returns from
> disk. Is it that the page won't move if it isn't swapped? That doesn't
> seem all that useful to me.
__GFP_MOVABLE is a hint to page allocation that there's a good likelihood
that this logical page can be migrated elsewhere in physical memory later
on if mm wants, so it's a good idea to allocate it from a physical area of
similarly MOVABLE pages; then if later on someone wants a large contiguous
area for something (or wants to hot-unplug that memory), it should be easy
to clear the whole area out, moving existing pages elsewhere. (I think
that's right: several questions come to me as I write it, but now is not
the time to research all those details.) Page migration can only be done
later if it can account for all of page_count(page).
>
> > but I didn't actually see why the i915 page pinning would be defeated
> > by __GFP_MOVABLE. The code does get a reference to them afaik.
>
> GTT mapping and page locking are done in lock-step in the driver:
>
> i915_gem_object_bind_to_gtt
> i915_gem_object_get_pages_gtt
> pins the pages
> i915_gem_gtt_bind_object
> maps to GTT
>
> i915_gem_object_unbind
> i915_gem_gtt_unbind_object
> unmaps from GTT
> i915_gem_object_put_pages_gtt
> unpins the pages.
>
> There are no other code paths to unmapping objects from the GTT or
> unpinning the pages that I can find.
>
> > So for example, i915_gem_object_get_pages_gtt() will use
> > shmem_read_mapping_page_gfp() which will increment the page count for
> > the page it gets, so all the obj->pages[] entries should have properly
> > incremented page counts. And they get released by
> > i915_gem_object_put_pages_gtt(), but maybe that is called too early
> > while the pages are still in use by the GFX unit...
>
> This seems the most likely problem -- there are so many caches and
> buffers involved. However, we're seeing troubles on hibernate resume, at
> which point there isn't any acceleration going on, just two fbdev
> drivers poking the hardware. Which really reduces the complexity quite a
> bit; it's just CPU reads/writes through the GTT aperture created for the
> two console frame buffers. That makes this an interesting place to look
> for trouble; we can ignore vast areas within the driver that deal with
> acceleration, at least for this case.
I keep worrying about the sequence when the machine is powered on again
after hibernation: can i915 get up to anything before it is resumed from
the hibernation image? Get to use certain pages at that stage, then
continue to poke at them after the hibernation image is restored (which
changes the story of what pages are free and what are used for what):
lacking some kind of flush?
Hugh
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-18 1:44 ` Hugh Dickins
@ 2012-03-18 3:42 ` Keith Packard
2012-03-18 5:43 ` Hugh Dickins
0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2012-03-18 3:42 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Linus Torvalds, Al Viro, DRI
<#part sign=pgpmime>
On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins <hughd@google.com> wrote:
> __GFP_MOVABLE is a hint to page allocation that there's a good likelihood
> that this logical page can be migrated elsewhere in physical memory later
> on if mm wants, so it's a good idea to allocate it from a physical area of
> similarly MOVABLE pages;
So, allocating things with __GFP_MOVABLE may just change where in memory
the graphics pages get allocated, moving whatever GPU-inspired damage to
less sensitive bits of the kernel data. Sounds fabulous!
> I keep worrying about the sequence when the machine is powered on again
> after hibernation: can i915 get up to anything before it is resumed from
> the hibernation image?
Well, the frame buffer is presumably still using whatever mapping it had
before suspend occurred; is there any way it could be writing through
that before the graphics driver was resumed?
What I don't understand is the relationship between the boot kernel and
the resumed kernel; when does the boot kernel stop writing to the
console, and how does it hand off control of the frame buffer at that
time.
It would be great if we could separate out the boot kernel access to the
graphics system from the resumed system -- if the boot kernel was run
without the i915 driver loaded at all, and just used VGA text mode, then
any damage as a result of resume wouldn't be caused by the boot kernel
GTT mappings getting used at the wrong time.
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-18 3:42 ` Keith Packard
@ 2012-03-18 5:43 ` Hugh Dickins
2012-03-18 11:55 ` Rafael J. Wysocki
0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-03-18 5:43 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, DRI, Rafael J. Wysocki, Al Viro
Added Rafael to the Cc: Rafael, we're pondering over one or more of these
recurrent threads about corruption after resume, seemingly related to i915.
On Sat, 17 Mar 2012, Keith Packard wrote:
> On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins <hughd@google.com> wrote:
> > I keep worrying about the sequence when the machine is powered on again
> > after hibernation: can i915 get up to anything before it is resumed from
> > the hibernation image?
>
> Well, the frame buffer is presumably still using whatever mapping it had
> before suspend occurred; is there any way it could be writing through
> that before the graphics driver was resumed?
It's hibernation restore here, so I don't think it could be using the
mapping from before hibernation until after resuming from hibernation
snapshot: it would be using the rebooting kernel's mapping until then.
>
> What I don't understand is the relationship between the boot kernel and
> the resumed kernel; when does the boot kernel stop writing to the
> console, and how does it hand off control of the frame buffer at that
> time.
I believe the handoff point comes in the late initcall software_resume():
which loads the image and calls hibernation_restore -> resume_target_kernel
-> swsusp_arch_resume, which emerges into the restored hibernation image.
As a late initcall, I imagine some work has already been done via the
framebuffer, but I have no conception of what kind of mappings that
involves (would shmem objects come into it at all? and is that even
a relevant question, could enough damage be done without them?), nor
whether they're properly torn down before emerging into the hibernimage.
>
> It would be great if we could separate out the boot kernel access to the
> graphics system from the resumed system -- if the boot kernel was run
> without the i915 driver loaded at all, and just used VGA text mode, then
> any damage as a result of resume wouldn't be caused by the boot kernel
> GTT mappings getting used at the wrong time.
But you're giving my worry more credence than it deserves there:
we don't have any evidence that this is where the problem lies,
that's just a suspicion of mine at the moment.
Hugh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-18 5:43 ` Hugh Dickins
@ 2012-03-18 11:55 ` Rafael J. Wysocki
2012-03-19 17:58 ` Hugh Dickins
0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2012-03-18 11:55 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Linus Torvalds, DRI, Al Viro
On Sunday, March 18, 2012, Hugh Dickins wrote:
> Added Rafael to the Cc: Rafael, we're pondering over one or more of these
> recurrent threads about corruption after resume, seemingly related to i915.
Thanks for letting me know. :-)
I actually have a confirmation that the issue isn't present if the KMS is
not used.
> On Sat, 17 Mar 2012, Keith Packard wrote:
> > On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins <hughd@google.com> wrote:
> > > I keep worrying about the sequence when the machine is powered on again
> > > after hibernation: can i915 get up to anything before it is resumed from
> > > the hibernation image?
> >
> > Well, the frame buffer is presumably still using whatever mapping it had
> > before suspend occurred; is there any way it could be writing through
> > that before the graphics driver was resumed?
>
> It's hibernation restore here, so I don't think it could be using the
> mapping from before hibernation until after resuming from hibernation
> snapshot: it would be using the rebooting kernel's mapping until then.
That's correct.
> > What I don't understand is the relationship between the boot kernel and
> > the resumed kernel; when does the boot kernel stop writing to the
> > console, and how does it hand off control of the frame buffer at that
> > time.
>
> I believe the handoff point comes in the late initcall software_resume():
> which loads the image and calls hibernation_restore -> resume_target_kernel
> -> swsusp_arch_resume, which emerges into the restored hibernation image.
That's correct too. It may, however, be done through the
SNAPSHOT_ATOMIC_RESTORE ioctl in kernel/power/user.c, that calls
hibernation_restore() directly, too.
In either case, hibernation_restore() calls suspend_console() which is the
point the console is supposed to be left alone at.
Later, it calls dpm_suspend_start() that executes .freeze() callbacks of
all device drivers and it calls resume_target_kernel() that executes the
driver's .freeze_noirq() callbacks. If that goes well, swsusp_arch_resume()
is run that jumps into the image kernel and then it starts over in
create_image() at the restore_processor_state() call.
> As a late initcall, I imagine some work has already been done via the
> framebuffer, but I have no conception of what kind of mappings that
> involves (would shmem objects come into it at all? and is that even
> a relevant question, could enough damage be done without them?), nor
> whether they're properly torn down before emerging into the hibernimage.
During resume from hibernation we're very careful to restore the entire
contents of RAM and to switch the CPU (there's only one of them executing
code at that point) to the pre-hibernation page tables. As a result, all
of the CPU's memory mappings will be the same as before the hibernation
when that's been completed. However, I'm not sure what the contents of
the graphics' registers is at this point and whether or not it may possibly
access "wrong" memory regions through DMA.
> > It would be great if we could separate out the boot kernel access to the
> > graphics system from the resumed system -- if the boot kernel was run
> > without the i915 driver loaded at all, and just used VGA text mode, then
> > any damage as a result of resume wouldn't be caused by the boot kernel
> > GTT mappings getting used at the wrong time.
That shouldn't be very difficult to verify, if i915 is built as a module and
is not loaded until software_resume() is run.
> But you're giving my worry more credence than it deserves there:
> we don't have any evidence that this is where the problem lies,
> that's just a suspicion of mine at the moment.
Well, pretty much the only explanation of the observed symptoms I can imagine
is some kind of "leakage" of the boot kernel's memory mappings through the
graphics adapter into the post-restore system.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-18 11:55 ` Rafael J. Wysocki
@ 2012-03-19 17:58 ` Hugh Dickins
0 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2012-03-19 17:58 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linus Torvalds, DRI, Al Viro
On Sun, 18 Mar 2012, Rafael J. Wysocki wrote:
> On Sunday, March 18, 2012, Hugh Dickins wrote:
> > On Sat, 17 Mar 2012, Keith Packard wrote:
> > > On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins <hughd@google.com> wrote:
> > > > I keep worrying about the sequence when the machine is powered on again
> > > > after hibernation: can i915 get up to anything before it is resumed from
> > > > the hibernation image?
...
>
> Well, pretty much the only explanation of the observed symptoms I can imagine
> is some kind of "leakage" of the boot kernel's memory mappings through the
> graphics adapter into the post-restore system.
Thanks for your detailed confirmations, Rafael. I see from other threads
that pretty much everyone has now converged upon this hypothesis - which
perhaps I read first elsewhere, forgot, and then thought it my idea!
I'll leave it to the i915 experts: good luck in tracking it down.
Hugh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 22:52 ` Linus Torvalds
2012-03-18 0:43 ` Keith Packard
@ 2012-03-19 17:51 ` Hugh Dickins
1 sibling, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2012-03-19 17:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: DRI, Rafael J. Wysocki, Al Viro
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1639 bytes --]
On Sat, 17 Mar 2012, Linus Torvalds wrote:
> On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > I've got to go out for an hour: I'll digest more and think more about
> > this when I get back. If someone could explain the original problem
> > with _MOVABLE, that would help me:
>
> I do not believe we actually ever uncovered the original problem with
> _MOVABLE: the problem was bisected down to the stable-backported
> version of commit 4bdadb978569 ("drm/i915: Selectively enable
> self-reclaim"), and I looked at the changes and decided that one of
> the main ones was the removal of the mapping_set_gfp_mask() - which
> resulted in __GFP_MOVABLE being on for the mapping.
>
> So we just tried re-introducing that, and it fixed the problem.
>
> Exactly *why* movable pages are a problem was never all that clear.
> ...
Thanks a lot for your considered and detailed reply on this
(before we added Rafael).
I've come to the (not entirely firm) conclusion that the addition and
removal of __GFP_MOVABLE was just shifting the shmem object allocations
from one block of memory (shared with others not using __GFP_MOVABLE)
to another (shared with others using __GFP_MOVABLE) and back.
So when the __GFP_MOVABLE inadvertently went in, a new group of users
who hadn't noticed the corruption before, now reported it; and when you
removed the __GFP_MOVABLE (a good idea anyway, pinned pages just clogging
up an otherwise movable block of memory), that group of users found their
problem solved.
Not really a problem from i915 specifying __GFP_MOVABLE on shmem objects.
Hugh
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-17 21:14 ` Linus Torvalds
2012-03-17 22:09 ` Hugh Dickins
@ 2012-03-19 16:16 ` Jerome Glisse
1 sibling, 0 replies; 22+ messages in thread
From: Jerome Glisse @ 2012-03-19 16:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hugh, Dickins, Al Viro, DRI
On Sat, 2012-03-17 at 14:14 -0700, Linus Torvalds wrote:
> On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie <airlied@gmail.com> wrote:
> >
> > I did however get a flashback in google to this:
> >
> > http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
> >
> > Linus don't think we ever did work out why that worked, I wonder if we
> > lost something after that.
>
> Hmm. Maybe we should stop making the default gfp_mask be
> GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and
> instead add the _MOVABLE flag only for mappings that use
> "generic_file_mmap()".
>
> It does sound a bit scary to just make default mappings use MOVABLE
> pages, when we know that can be incorrect for some cases.
>
> But that would require that filesystems like ext4 etc that don't just
> use the generic_file_mmap() function would have add do that MOVABLE
> thing on their own. And the *normal* case is certainly that pages are
> movable and putting them in themovable zone should be ok. It's just
> *not* ok if you also map them into some hardware GTT thing or just
> otherwise keep track of the pages directly, like DRI does.
>
> The other possibility is to just make this a shm thing, since it's
> generally that layer that has the case of "pages allocated with the
> mapping gfp masks". Hugh Dickins clearly tried to make sure that the
> DRM initialization of the gfp mask was honored, but maybe there is
> some case where it is missed. Hugh added a mapping_set_gfp_mask() to
> i915_gem_alloc_object(), but maybe we have i915 shmem mappings that
> get set up through some other path.
>
> What about the ttm swap storage thing, for example? That also does
> shmem_file_setup(), without limiting the pages. But I don't think i915
> uses ttm, right?
It's not an issue for ttm, as we never us page allocated through shmem
for the hw. We use shmem to swapout buffer object (ttm_tt_swapout func
in ttm_tt.c).
Cheers,
Jerome
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 16:11 i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b) Linus Torvalds
2012-03-16 16:47 ` Dave Airlie
@ 2012-03-16 22:52 ` Keith Packard
2012-03-16 23:24 ` Linus Torvalds
1 sibling, 1 reply; 22+ messages in thread
From: Keith Packard @ 2012-03-16 22:52 UTC (permalink / raw)
To: Linus Torvalds, Dave Airlie, DRI
<#part sign=pgpmime>
On Fri, 16 Mar 2012 09:11:54 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> I don't know if these kinds of things have been forwarded to you, but
> there's apparently been several things like this going on - with the
> finger pointing to the i915 driver apparently clearing random memory.
> Often the end result seems to be list corruption or a NULL pointer
> dereference in the filesystem layer.
Yikes! I've been participating in the hibernation thread, but
I hadn't heard about this other random memory corruption.
We had a theory about hibernation -- unflushed FB writes that go astray
when the pages underneath the GTT get reassigned when switching from the
boot kernel to the resumed kernel.
Something similar could be happening at other times though? A graphics
page is freed with writes outstanding, the GTT entry is cleared and the
page allocated to something in the filesystem, but before the GTT entry
update is recognized by the GPU, and before pending writes are flushed,
the data gets written through the GTT to the middle of the file system
structure. That stretches the bounds of credibility though; you'd have
to have unflushed data *and* an unflushed GTT write, neither of which is
supposed to happen.
> So it might be the culprit. As the reason of the corruption is not yet
> understood, it might be that suspend/resume cycle is not necessary
> pre-requisite for this to trigger, it might just make it more likely.
That opens up a much broader scope of code that might contain problems...
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 22:52 ` Keith Packard
@ 2012-03-16 23:24 ` Linus Torvalds
2012-03-17 2:55 ` Keith Packard
0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2012-03-16 23:24 UTC (permalink / raw)
To: Keith Packard; +Cc: DRI
On Fri, Mar 16, 2012 at 3:52 PM, Keith Packard <keithp@keithp.com> wrote:
>
> We had a theory about hibernation -- unflushed FB writes that go astray
> when the pages underneath the GTT get reassigned when switching from the
> boot kernel to the resumed kernel.
Well, even without hibernation, one theory was about the VC switching
- apparently that was one thing that at least one reporter does end up
doing - switching from X into the virtual terminals and back.
That may have similar issues, and may be the deeper problem (do people
still do the VC switching *for* hibernation?)
Maybe.
Linus
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
2012-03-16 23:24 ` Linus Torvalds
@ 2012-03-17 2:55 ` Keith Packard
0 siblings, 0 replies; 22+ messages in thread
From: Keith Packard @ 2012-03-17 2:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: DRI
<#part sign=pgpmime>
On Fri, 16 Mar 2012 16:24:35 -0700, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Well, even without hibernation, one theory was about the VC switching
> - apparently that was one thing that at least one reporter does end up
> doing - switching from X into the virtual terminals and back.
Thanks for the extra info; it doesn't bring anything to mind as we just
don't do that much when VT-switching anymore. But, yet another code path
to explore.
> That may have similar issues, and may be the deeper problem (do people
> still do the VC switching *for* hibernation?)
I haven't a clue -- I haven't tried hibernating a machine in years...
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2012-03-19 17:59 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-16 16:11 i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b) Linus Torvalds
2012-03-16 16:47 ` Dave Airlie
2012-03-16 23:01 ` Keith Packard
2012-03-17 7:41 ` Dave Airlie
2012-03-17 18:59 ` Keith Packard
2012-03-17 19:20 ` Dave Airlie
2012-03-17 20:23 ` Dave Airlie
2012-03-17 20:27 ` Dave Airlie
2012-03-17 21:14 ` Linus Torvalds
2012-03-17 22:09 ` Hugh Dickins
2012-03-17 22:52 ` Linus Torvalds
2012-03-18 0:43 ` Keith Packard
2012-03-18 1:44 ` Hugh Dickins
2012-03-18 3:42 ` Keith Packard
2012-03-18 5:43 ` Hugh Dickins
2012-03-18 11:55 ` Rafael J. Wysocki
2012-03-19 17:58 ` Hugh Dickins
2012-03-19 17:51 ` Hugh Dickins
2012-03-19 16:16 ` Jerome Glisse
2012-03-16 22:52 ` Keith Packard
2012-03-16 23:24 ` Linus Torvalds
2012-03-17 2:55 ` Keith Packard
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.