public inbox for linux-kernel@vger.kernel.org
* Re: Memory management bug
@ 2000-11-17 10:41 schwidefsky
  2000-11-17 15:44 ` Andrea Arcangeli
  0 siblings, 1 reply; 16+ messages in thread
From: schwidefsky @ 2000-11-17 10:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, mingo, linux-kernel



>>
>> If they absolutely needs 4 pages for pmd pagetables due hardware
>> constraints I'd recommend to use _four_ hardware pages for each
>> softpage, not two.
>
>Yes.
>
>However, it definitely is an issue of making trade-offs. Most 64-bit MMU
>models tend to have some flexibility in how you set up the page tables,
>and it may be possible to just move bits around too (ie making both the
>pmd and the pgd twice as large, and getting the expansion of 4 by doing
>two expand-by-two's, for example, if the hardware has support for doing
>things like that).

Unfortunately we don't have any flexibility. The segment index (pmd) has
11 bits and pointers are 8 bytes, which makes a 16K segment table. I
understand that this is a problem if the system is really low on memory.
But low on memory means low on real memory + swap space, doesn't it? The
system has enough swap space but it isn't using any of it when the BUG
hits. I think the "if (!order)" statements before the "goto try_again" in
__alloc_pages have something to do with it. To test this assumption I
removed the ifs, and I didn't see any "__alloc_pages: %lu-order allocation
failed." message before I hit yet another BUG in swap_state.c:60.
What's the reasoning behind these ifs?
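
The size arithmetic above (11 index bits, 8-byte pointers, 16K table) can
be checked with a small user-space sketch. It assumes 4K hardware pages,
which is what the "4 pages" figure in this thread implies; the names
toy_table_bytes, toy_order_for and TOY_PAGE_SIZE are made up for
illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4K hardware pages */

/* Bytes needed for a table with 2^index_bits entries of entry_size bytes. */
static unsigned long toy_table_bytes(unsigned int index_bits,
                                     unsigned long entry_size)
{
    assert(index_bits < 8 * sizeof(unsigned long));
    return (1UL << index_bits) * entry_size;
}

/* Smallest order (power-of-two number of pages) that covers `bytes`. */
static unsigned int toy_order_for(unsigned long bytes)
{
    unsigned int order = 0;
    unsigned long span = TOY_PAGE_SIZE;

    while (span < bytes) {
        span <<= 1;    /* each order doubles the contiguous span */
        order++;
    }
    return order;
}
```

With these assumptions, toy_table_bytes(11, 8) gives 16384 bytes and
toy_order_for(16384) gives 2, i.e. four physically contiguous pages per
segment table.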

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

* Re: Memory management bug
@ 2000-11-21 19:55 schwidefsky
  0 siblings, 0 replies; 16+ messages in thread
From: schwidefsky @ 2000-11-21 19:55 UTC (permalink / raw)
  To: andrea, riel, torvalds, linux-kernel



>Agreed, that's almost sure _not_ random memory corruption of the page
>structure. It looks like a VM bug (if you can reproduce trivially I'd give
>a try to test8 too since test8 is rock solid for me while test10 lockups
>in VM core at the second bonnie if using emulated highmem).
I was lucky. Somehow I managed to f**k up my disk in a way that the
filesystem check triggers the bug in a reproducible way and always with the
same page! I set up a "trace store into" on the page structure and logged
who is changing the "struct page". Here is the log starting after
page->mapping was set:

address changed   function
5c13a   mapping   add_to_page_cache_unique
                     count=2, flags=PG_locked, age=2
5b14a   next_hash __add_page_to_hash_queue
5b178   buffers   __add_page_to_hash_queue
68440   flags     lru_cache_add
                     flags=PG_active|PG_locked
6846a   lru       lru_cache_add
68470   lru       lru_cache_add
78fc6   virtual   create_empty_buffers
78fda   count     create_empty_buffers
                     count=3
6d9ce   count     __free_pages
                     count=2
5c122   list      __add_page_to_hash_queue
68464   lru       lru_cache_add
77b16   flags     end_buffer_io_async
                     flags=PG_active|PG_uptodate|PG_locked
77b52   flags     end_buffer_io_async
                     flags=PG_active|PG_uptodate|PG_locked
77bc4   flags     end_buffer_io_async
                     flags=PG_active|PG_uptodate
67792   age       age_page_up
                     age=5
5c88c   count     __find_get_page
                     count=3
559be   count     copy_page_range
                     count=4
559be   count     copy_page_range
                     count=5
6d9ce   count     __free_pages
                     count=4
6b55e   lru       refill_inactive_scan
6b4ac   flags     refill_inactive_scan
                     flags=PG_active|PG_uptodate
6770c   age       age_page_down_ageonly
                     age=2
6b570   lru       refill_inactive_scan
6b576   lru       refill_inactive_scan
6b56a   lru       refill_inactive_scan
6b55e   lru       refill_inactive_scan
6b4ac   flags     refill_inactive_scan
                     flags=PG_active|PG_uptodate
6770c   age       age_page_down_ageonly
                     age=1
6b570   lru       refill_inactive_scan
6b576   lru       refill_inactive_scan
6b56a   lru       refill_inactive_scan
6b55e   lru       refill_inactive_scan
6b4ac   flags     refill_inactive_scan
                     flags=PG_active|PG_uptodate
6770c   age       age_page_down_ageonly
                     age=0
6b570   lru       refill_inactive_scan
6b576   lru       refill_inactive_scan
6b56a   lru       refill_inactive_scan

program check at 6e1e0 because of BUG() in line 60 of swap_state.c.
Stack backtrace from there:
6e1e0 add_to_swap_cache
6900a try_to_swap_out
69408 swap_out_vma
69578 swap_out_mm
69838 swap_out
6b90a refill_inactive
6bab4 do_try_to_free_pages
6bbba kswapd

age_page_down_ageonly was always called from refill_inactive_scan. So
refill_inactive_scan lowers the age of the page but does not deactivate it
when it reaches age==0 (page->count too big). try_to_swap_out doesn't
check for page->mapping and tries to swap out the page because the age is
0. Bang!
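
The sequence described above can be replayed with a toy user-space model.
This is only a sketch under stated assumptions: the age is halved on each
pass (which matches the 5, 2, 1, 0 progression in the log), the BUG() at
swap_state.c:60 is taken to be a check that page->mapping is unset, and
TOY_MAX_COUNT is a made-up threshold standing in for "page->count too
big". All toy_* names are hypothetical, none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    void *mapping;   /* non-NULL: page belongs to a file's address space */
    int   count;     /* reference count */
    int   age;
    bool  active;    /* still on the active list */
};

#define TOY_MAX_COUNT 2   /* hypothetical limit for deactivation */

/* One aging pass: halve the age, deactivate only if the count allows it. */
static void toy_refill_inactive_scan(struct toy_page *page)
{
    assert(page != NULL);
    page->age /= 2;
    if (page->age == 0 && page->count <= TOY_MAX_COUNT)
        page->active = false;
}

/* Returns false where the real function is assumed to hit BUG(). */
static bool toy_add_to_swap_cache(const struct toy_page *page)
{
    return page->mapping == NULL;
}

/* Models the described behaviour: only the age is consulted. */
static bool toy_try_to_swap_out(const struct toy_page *page)
{
    if (page->age != 0)
        return true;                      /* page too young, nothing done */
    return toy_add_to_swap_cache(page);   /* no page->mapping check here */
}
```

Starting from a page with a mapping, count=4 and age=5, three aging passes
leave it active with age 0, and toy_try_to_swap_out then reports the
failure that corresponds to the BUG() in the backtrace.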

blue skies,
   Martin

P.S. by the way this test was done on linux-2.4.0-test11

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


* Re: Memory management bug
@ 2000-11-17 16:35 schwidefsky
  2000-11-17 16:42 ` Linus Torvalds
  2000-11-17 18:11 ` Andrea Arcangeli
  0 siblings, 2 replies; 16+ messages in thread
From: schwidefsky @ 2000-11-17 16:35 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, mingo, linux-kernel



>> before I hit yet another BUG in swap_state.c:60.
>
>The bug in swap_state:60 shows a kernel bug in the VM or random memory
>corruption. Make sure you can reproduce on x86 to be sure it's not a s390
>that is randomly corrupting memory. If you read the oops after the BUG
>message with asm at hand you will see in the registers the value of
>page->mapping and you can guess if it's random memory corruption or bug in
>VM this way (for example if `reg & 3 != 0' it's memory corruption for
>sure, you should also check if it's pointing to a suitable kernel-heap
>address).
I did a little closer investigation. The BUG was triggered by a page with
page->mapping pointing to an address space of a mapped ext2 file
(page->mapping->a_ops == &ext2_aops). The page had PG_locked, PG_uptodate,
PG_active and PG_swap_cache set. The stack backtrace showed that kswapd
called do_try_to_free_pages, refill_inactive, swap_out, swap_out_mm,
swap_out_vma, try_to_swap_out and add_to_swap_cache, where the BUG hit.
The registers look good, the struct page looks good. I don't think that
this was random memory corruption.
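
The low-bits heuristic quoted above can be written as a small check. This
is a sketch only; the toy_* names are made up. One detail worth spelling
out: taken literally as C, `reg & 3 != 0` parses as `reg & (3 != 0)`,
which tests bit 0 only, so the intended test needs parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Intended test: a pointer to a word-aligned structure has both low bits
 * clear, so any set low bit suggests a corrupted pointer value. */
static bool toy_low_bits_set(uintptr_t reg)
{
    return (reg & 3u) != 0;
}

/* What the unparenthesized expression computes in C: `!=` binds tighter
 * than `&`, so this is reg & 1 and bit 1 is never examined. */
static uintptr_t toy_as_written(uintptr_t reg)
{
    return reg & (uintptr_t)(3 != 0);
}
```

For a value such as 0x1002 the first function reports a misaligned
pointer, while the literal form yields 0 and would miss it.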

>> Whats the reasoning behind these ifs ?
>
>To catch memory corruption or things running out of control in the kernel.
I was referring to the "if (!order) goto try_again" ifs in __alloc_pages,
not the "if (something) BUG()" ifs.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


* Re: Memory management bug
@ 2000-11-16 16:12 schwidefsky
  2000-11-16 17:01 ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: schwidefsky @ 2000-11-16 16:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mingo, linux-kernel



>What happens if you just replace all places that would use a bad page
>table with a BUG()? (Ie do _not_ add the bug to the place where you
>added the test: by that time it's too late.  I'm talking about the
>places where the bad page tables are used, like in the error cases of
>"get_pte_kernel_slow()" etc.

Ok, the BUG() hit in get_pmd_slow:

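pmd_t *
get_pmd_slow(pgd_t *pgd, unsigned long offset)
{
        pmd_t *pmd;
        int i;

        /* order 2: four consecutive pages for the 16K segment table */
        pmd = (pmd_t *) __get_free_pages(GFP_KERNEL,2);
        if (pgd_none(*pgd)) {
                if (pmd) {
                        /* fresh table: clear every entry, then hook it in */
                        for (i = 0; i < PTRS_PER_PMD; i++)
                                pmd_clear(pmd+i);
                        pgd_set(pgd, pmd);
                        return pmd + offset;
                }
                /* allocation failed: fall back to the bad pmd table */
                BUG();  /* <--- this one hit */
                pmd = (pmd_t *) get_bad_pmd_table();
                pgd_set(pgd, pmd);
                return NULL;
        }
        /* pgd is already populated: release the pages we just allocated */
        free_pages((unsigned long)pmd,2);
        if (pgd_bad(*pgd))
                BUG();
        return (pmd_t *) pgd_page(*pgd) + offset;
}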

The allocation of 4 consecutive pages for the page middle directory failed.
This caused empty_bad_pmd_table to be used, and clear_page_tables inserted
it into the pmd quicklist. The important question is: why did
__get_free_pages fail?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


* Re: Memory management bug
@ 2000-11-15 13:24 schwidefsky
  0 siblings, 0 replies; 16+ messages in thread
From: schwidefsky @ 2000-11-15 13:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel



>> +extern pte_t empty_bad_pte_table[];
>>  extern __inline__ void free_pte_fast(pte_t *pte)
>>  {
>> +       if (pte == empty_bad_pte_table)
>> +               return;
>
>I guess that should be BUG() instead of return, so that the callers can be
>fixed.
Not really. pte_free and pmd_free are called from the common mm code but
the concept of empty_bad_{pte,pmd}_table is architecture dependent. The
trouble starts in arch/???/mm/init.c where these special arrays are
inserted into the paging tables. So the solution to the problem should be
in architecture dependent files too.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com


* Memory management bug
@ 2000-11-15 12:39 schwidefsky
  2000-11-15 13:19 ` Andi Kleen
  2000-11-15 16:45 ` Linus Torvalds
  0 siblings, 2 replies; 16+ messages in thread
From: schwidefsky @ 2000-11-15 12:39 UTC (permalink / raw)
  To: linux-kernel



I think I spotted a problem in the memory management of some (all?)
architectures in 2.4.0-test10. At the moment I am fighting with the 64-bit
backend for the new S/390 machines. I experienced infinite loops in
do_check_pgt_cache because pgtable_cache_size indicated that a lot of pages
were in the quicklists while the pgd/pmd/pte quicklists were empty (NULL
pointers). After some trickery with a special hardware feature (storage
keys) I found out that empty_bad_pmd_table and empty_bad_pte_table had
been put on the page table quicklists multiple(!) times. It is already a
bug that these two arrays are inserted into the quicklists at all, but the
second insertion destroys the quicklists. I solved this problem by
inserting checks for the special entries in the free_xxx_fast routines;
here is a sample for the i386 free_pte_fast:

diff -u -r1.5 pgalloc.h
--- include/asm-i386/pgalloc.h  2000/11/02 10:14:51     1.5
+++ include/asm-i386/pgalloc.h  2000/11/15 12:27:58
@@ -80,8 +80,11 @@
        return (pte_t *)ret;
 }

+extern pte_t empty_bad_pte_table[];
 extern __inline__ void free_pte_fast(pte_t *pte)
 {
+       if (pte == empty_bad_pte_table)
+               return;
        *(unsigned long *)pte = (unsigned long) pte_quicklist;
        pte_quicklist = (unsigned long *) pte;
        pgtable_cache_size++;

I still get the "__alloc_pages: 2-order allocation failed." error messages
but at least the machine doesn't go into infinite loops anymore. Could
someone with more experience with the other architectures verify that my
observation is true?
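
The list manipulation in the free_pte_fast hunk above can be modelled in
user space to show one way a repeated insertion damages such a list. This
is a hedged sketch, not the kernel code: struct toy_quicklist,
toy_free_fast and toy_reachable are invented names, and the `skip`
argument stands in for the empty_bad_pte_table comparison from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy intrusive free list: the first word of each free block holds the
 * address of the next free block. */
struct toy_quicklist {
    uintptr_t     *head;   /* stands in for pte_quicklist */
    unsigned long  size;   /* stands in for pgtable_cache_size */
};

/* Push `block` on the list unless it is the special `skip` block. */
static void toy_free_fast(struct toy_quicklist *ql, uintptr_t *block,
                          const uintptr_t *skip)
{
    assert(ql != NULL && block != NULL);
    if (skip != NULL && block == skip)
        return;                        /* guard as in the patch */
    *block = (uintptr_t)ql->head;      /* link in front of the old head */
    ql->head = block;
    ql->size++;
}

/* Count distinct blocks reachable from the head, stopping at a block that
 * points to itself and never visiting more than `limit` blocks. */
static unsigned long toy_reachable(const struct toy_quicklist *ql,
                                   unsigned long limit)
{
    unsigned long n = 0;
    const uintptr_t *p = ql->head;

    while (p != NULL && n < limit) {
        const uintptr_t *next = (const uintptr_t *)*p;
        n++;
        if (next == p)
            break;                     /* self-loop from a double insert */
        p = next;
    }
    return n;
}
```

Pushing an ordinary block and then the same special block twice without
the guard leaves size at 3 while only one block is reachable: the second
push makes the block point to itself and the earlier entry is lost. With
the guard, the counter and the list contents stay in agreement.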

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



end of thread, other threads:[~2000-11-21 20:23 UTC | newest]

Thread overview: 16+ messages
2000-11-17 10:41 Memory management bug schwidefsky
2000-11-17 15:44 ` Andrea Arcangeli
2000-11-17 19:12   ` Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2000-11-21 19:55 schwidefsky
2000-11-17 16:35 schwidefsky
2000-11-17 16:42 ` Linus Torvalds
2000-11-17 18:11 ` Andrea Arcangeli
2000-11-17 19:15   ` Rik van Riel
2000-11-16 16:12 schwidefsky
2000-11-16 17:01 ` Linus Torvalds
2000-11-16 17:45   ` Andrea Arcangeli
2000-11-16 18:07     ` Linus Torvalds
2000-11-15 13:24 schwidefsky
2000-11-15 12:39 schwidefsky
2000-11-15 13:19 ` Andi Kleen
2000-11-15 16:45 ` Linus Torvalds
