public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] - deleting huge pages
@ 2004-05-02 12:30 Jack Steiner
  2004-05-02 18:33 ` Chen, Kenneth W
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Jack Steiner @ 2004-05-02 12:30 UTC (permalink / raw)
  To: linux-ia64


I found this problem in 2.4.21, but AFAICT, the same problem
exists in 2.6.5.

If you attempt to allocate a LOT more huge pages than are physically available,
the kernel may reference invalid PGDs or PMDs. 

Here is the 2.4 backtrace of a failure. If the mmap fails, do_mmap_pgoff attempts to
unmap the vma range it was mapping. Depending on where it failed during
the mmap, some of the higher level PGD/PMDs may not have been assigned.

The bug (at least in 2.4) exists on all platforms, but on our platform
attempts to dereference NULL pointers usually cause MCAs. (If a platform
has zeros in page 0, you may be lucky & the code would appear to work,
but it is still a bug.)

	Stack traceback for pid 6817
	0xe00025307ba50000     6817     6663  0  148   D  0xe00025307ba50420  toy
	0xe00000000445e180 unmap_hugepage_range+0x160  << mca surfaced here
	0xe00000000445e300 zap_hugepage_range+0x80
	0xe00000000452dbc0 do_mmap_pgoff+0xea0
	0xe000000004432910 sys_mmap+0x210
	0xe00000000440e2a0 ia64_ret_from_syscall

The MCA was caused by the NULL pmd dereference in huge_pte_offset. The
MCA doesn't surface until the bad data is consumed.

A patch against 2.6.5:



Index: linux/arch/ia64/mm/hugetlbpage.c
===================================================================
--- linux.orig/arch/ia64/mm/hugetlbpage.c	2004-05-01 20:51:52.000000000 -0500
+++ linux/arch/ia64/mm/hugetlbpage.c	2004-05-01 20:51:54.000000000 -0500
@@ -111,9 +111,16 @@
 	pte_t *pte = NULL;
 
 	pgd = pgd_offset(mm, taddr);
+	if (pgd_none(*pgd) || pgd_bad(*pgd))
+		goto out;
 	pmd = pmd_offset(pgd, taddr);
+	if (pmd_none(*pmd) || pmd_bad(*pmd))
+		goto out;
 	pte = pte_offset_map(pmd, taddr);
 	return pte;
+
+out:
+	return 0;
 }
 
 #define mk_pte_huge(entry) { pte_val(entry) |= _PAGE_P; }
@@ -331,7 +338,7 @@
 
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		pte = huge_pte_offset(mm, address);
-		if (pte_none(*pte))
+		if (!pte || pte_none(*pte))
 			continue;
 		page = pte_page(*pte);
 		huge_page_release(page);
-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] - deleting huge pages
  2004-05-02 12:30 [PATCH] - deleting huge pages Jack Steiner
@ 2004-05-02 18:33 ` Chen, Kenneth W
  2004-05-03 14:53 ` Jack Steiner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Chen, Kenneth W @ 2004-05-02 18:33 UTC (permalink / raw)
  To: linux-ia64

>>>> Jack Steiner wrote on Sunday, May 02, 2004 5:30 AM
> I found this problem in 2.4.21, but AFAICT, the same problem
> exists in 2.6.5.
>
> If you attempt to allocate a LOT more huge pages than are physically
> available, the kernel may reference invalid PGDs or PMDs.
>
> Here is the 2.4 backtrace of a failure. If the mmap fails, do_mmap_pgoff
> attempts to unmap the vma range it was mapping. Depending on where it failed
> during the mmap, some of the higher level PGD/PMDs may not have been assigned.
>
> The bug (at least in 2.4) exists on all platforms but on our platform
> attempts to dereference NULL pointers usually cause MCAs. (If a platform
> has zeros in page 0, you may be lucky & the code would appear to work,
> but it is still a bug).
>
> The MCA was caused by the NULL pmd dereference in huge_pte_offset. The
> MCA doesn't surface until the bad data is consumed.
>
> A patch against 2.6.5:

Recent work on cleaning up hugepage_vma has at least one hunk covered here.
http://linux.bkbits.net:8080/linux-2.5/cset@40842336E3nkJ7cWJ0-3zQ7yP4WbHg

- Ken




* Re: [PATCH] - deleting huge pages
  2004-05-02 12:30 [PATCH] - deleting huge pages Jack Steiner
  2004-05-02 18:33 ` Chen, Kenneth W
@ 2004-05-03 14:53 ` Jack Steiner
  2004-05-03 17:12 ` Chen, Kenneth W
  2004-05-03 19:47 ` Jack Steiner
  3 siblings, 0 replies; 5+ messages in thread
From: Jack Steiner @ 2004-05-03 14:53 UTC (permalink / raw)
  To: linux-ia64

On Sun, May 02, 2004 at 11:33:33AM -0700, Chen, Kenneth W wrote:
> >>>> Jack Steiner wrote on Sunday, May 02, 2004 5:30 AM
> > I found this problem in 2.4.21, but AFAICT, the same problem
> > exists in 2.6.5.
> >
> > If you attempt to allocate a LOT more huge pages than are physically
> > available, the kernel may reference invalid PGDs or PMDs.
> >
> > Here is the 2.4 backtrace of a failure. If the mmap fails, do_mmap_pgoff
> > attempts to unmap the vma range it was mapping. Depending on where it failed
> > during the mmap, some of the higher level PGD/PMDs may not have been assigned.
> >
> > The bug (at least in 2.4) exists on all platforms but on our platform
> > attempts to dereference NULL pointers usually cause MCAs. (If a platform
> > has zeros in page 0, you may be lucky & the code would appear to work,
> > but it is still a bug).
> >
> > The MCA was caused by the NULL pmd dereference in huge_pte_offset. The
> > MCA doesn't surface until the bad data is consumed.
> >
> > A patch against 2.6.5:
> 
> Recent work on cleaning up hugepage_vma has at least one hunk covered here.
> http://linux.bkbits.net:8080/linux-2.5/cset@40842336E3nkJ7cWJ0-3zQ7yP4WbHg
> 
> - Ken
> 

Yep... It looks like the same problem has been fixed by David Gibson.
Ignore my patch.



-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




* RE: [PATCH] - deleting huge pages
  2004-05-02 12:30 [PATCH] - deleting huge pages Jack Steiner
  2004-05-02 18:33 ` Chen, Kenneth W
  2004-05-03 14:53 ` Jack Steiner
@ 2004-05-03 17:12 ` Chen, Kenneth W
  2004-05-03 19:47 ` Jack Steiner
  3 siblings, 0 replies; 5+ messages in thread
From: Chen, Kenneth W @ 2004-05-03 17:12 UTC (permalink / raw)
  To: linux-ia64

>>>>> Jack Steiner wrote on Monday, May 03, 2004 7:53 AM
> > > I found this problem in 2.4.21, but AFAICT, the same problem
> > > exists in 2.6.5.
> > >
> > > If you attempt to allocate a LOT more huge pages than are physically
> > > available, the kernel may reference invalid PGDs or PMDs.
> > >
> > > Here is the 2.4 backtrace of a failure. If the mmap fails, do_mmap_pgoff
> > > attempts to unmap the vma range it was mapping. Depending on where it failed
> > > during the mmap, some of the higher level PGD/PMDs may not have been assigned.
> > >
> > > The bug (at least in 2.4) exists on all platforms but on our platform
> > > attempts to dereference NULL pointers usually cause MCAs. (If a platform
> > > has zeros in page 0, you may be lucky & the code would appear to work,
> > > but it is still a bug).
> > >
> > > The MCA was caused by the NULL pmd dereference in huge_pte_offset. The
> > > MCA doesn't surface until the bad data is consumed.
> > >
> > > A patch against 2.6.5:
> >
> > Recent work on cleaning up hugepage_vma has at least one hunk covered here.
> > http://linux.bkbits.net:8080/linux-2.5/cset@40842336E3nkJ7cWJ0-3zQ7yP4WbHg
> >
> >
>
> Yep... Looks like the same problem has been fixed by David Gibson.
> Ignore my patch.

Jack, I don't mean to stir up the mud.  The hunk in unmap_hugepage_range() in
your original post is still needed for the 2.6 kernel, plus Bjorn definitely
needs your fix for the 2.4 tree ;-)

And all this just makes the hugetlb demand-paging patch smaller .... :-)

- Ken




* Re: [PATCH] - deleting huge pages
  2004-05-02 12:30 [PATCH] - deleting huge pages Jack Steiner
                   ` (2 preceding siblings ...)
  2004-05-03 17:12 ` Chen, Kenneth W
@ 2004-05-03 19:47 ` Jack Steiner
  3 siblings, 0 replies; 5+ messages in thread
From: Jack Steiner @ 2004-05-03 19:47 UTC (permalink / raw)
  To: linux-ia64

(Update to previous mail - patch updated for latest tree)

If you attempt to allocate a LOT more huge pages than are physically
available, the kernel may reference a NULL pte.

If the mmap fails, do_mmap_pgoff attempts to unmap the vma range it
was mapping. Depending on where it failed during
the mmap, some of the higher level PGD/PMDs may not have been assigned.

The same problem exists in 2.4. I'll post a second patch in a few
days....




--- linux_base/arch/ia64/mm/hugetlbpage.c	2004-05-03 09:36:00.000000000 -0500
+++ linux/arch/ia64/mm/hugetlbpage.c	2004-05-03 14:38:21.000000000 -0500
@@ -245,7 +245,7 @@
 
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		pte = huge_pte_offset(mm, address);
-		if (pte_none(*pte))
+		if (!pte || pte_none(*pte))
 			continue;
 		page = pte_page(*pte);
 		put_page(page);



-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




end of thread, other threads:[~2004-05-03 19:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-02 12:30 [PATCH] - deleting huge pages Jack Steiner
2004-05-02 18:33 ` Chen, Kenneth W
2004-05-03 14:53 ` Jack Steiner
2004-05-03 17:12 ` Chen, Kenneth W
2004-05-03 19:47 ` Jack Steiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox