* RE: [PATCH 0/6] freepgt: free_pgtables shakeup
@ 2005-03-24 1:00 Luck, Tony
2005-03-24 1:07 ` Nick Piggin
0 siblings, 1 reply; 24+ messages in thread
From: Luck, Tony @ 2005-03-24 1:00 UTC (permalink / raw)
To: Nick Piggin, David S. Miller; +Cc: Hugh Dickins, akpm, benh, ak, linux-kernel
>OK, attached is my first cut at slimming down the boundary tests.
>I have only had a chance to try it on i386, so I hate to drop it
>on you like this - but I *have* put a bit of thought into it....
>Treat it as an RFC, and I'll try to test it on a wider range of
>things in the next couple of days.
>
>Not that there is anything really nasty with your system David,
>so I don't think it will be a big disaster if I can't get this to
>work.
>
>Goes on top of Hugh's 6 patches.
Runs on ia64. Looks much cleaner too.
-Tony
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-24 1:00 [PATCH 0/6] freepgt: free_pgtables shakeup Luck, Tony @ 2005-03-24 1:07 ` Nick Piggin 0 siblings, 0 replies; 24+ messages in thread From: Nick Piggin @ 2005-03-24 1:07 UTC (permalink / raw) To: Luck, Tony; +Cc: David S. Miller, Hugh Dickins, akpm, benh, ak, linux-kernel Luck, Tony wrote: >>OK, attached is my first cut at slimming down the boundary tests. >>I have only had a chance to try it on i386, so I hate to drop it >>on you like this - but I *have* put a bit of thought into it.... >>Treat it as an RFC, and I'll try to test it on a wider range of >>things in the next couple of days. >> >>Not that there is anything really nasty with your system David, >>so I don't think it will be a big disaster if I can't get this to >>work. >> >>Goes on top of Hugh's 6 patches. > > > Runs on ia64. Looks much cleaner too. > Oh good. Thanks for testing, Tony. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/6] freepgt: free_pgtables shakeup @ 2005-03-23 17:10 Hugh Dickins 2005-03-23 19:57 ` David S. Miller 2005-03-25 21:22 ` Russell King 0 siblings, 2 replies; 24+ messages in thread From: Hugh Dickins @ 2005-03-23 17:10 UTC (permalink / raw) To: Nick Piggin; +Cc: akpm, davem, tony.luck, benh, ak, linux-kernel Here's the recut of those patches, including David Miller's vital fixes. I'm addressing these to Nick rather than Andrew, because they're perhaps not fit for -mm until more testing done and the x86_64 32-bit vdso issue handled. I'm unlikely to be responsive until next week, sorry: over to you, Nick - thanks. Hugh arch/i386/mm/hugetlbpage.c | 11 -- arch/i386/mm/pgtable.c | 2 arch/ia64/mm/hugetlbpage.c | 60 ++++--------- arch/ppc64/mm/hugetlbpage.c | 47 ---------- include/asm-generic/pgtable.h | 8 - include/asm-ia64/page.h | 2 include/asm-ia64/pgtable.h | 30 ------ include/asm-ia64/processor.h | 8 - include/asm-ppc64/pgtable.h | 12 +- include/asm-ppc64/processor.h | 4 include/asm-s390/processor.h | 2 include/asm-sparc64/pgtable.h | 15 --- include/linux/hugetlb.h | 6 - include/linux/mm.h | 14 +-- mm/memory.c | 190 ++++++++++++++++++++++++++++++------------ mm/mmap.c | 139 ++++++++---------------------- 16 files changed, 226 insertions(+), 324 deletions(-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-23 17:10 Hugh Dickins @ 2005-03-23 19:57 ` David S. Miller 2005-03-24 0:26 ` Nick Piggin 2005-03-25 21:22 ` Russell King 1 sibling, 1 reply; 24+ messages in thread From: David S. Miller @ 2005-03-23 19:57 UTC (permalink / raw) To: Hugh Dickins; +Cc: nickpiggin, akpm, tony.luck, benh, ak, linux-kernel On Wed, 23 Mar 2005 17:10:15 +0000 (GMT) Hugh Dickins <hugh@veritas.com> wrote: > Here's the recut of those patches, including David Miller's vital fixes. > I'm addressing these to Nick rather than Andrew, because they're perhaps > not fit for -mm until more testing done and the x86_64 32-bit vdso issue > handled. I'm unlikely to be responsive until next week, sorry: over to > you, Nick - thanks. Works perfectly fine on sparc64. BTW, I note that we may still want something like that page table bitmask stuff I worked on some time ago. Ie. for things like what lat_mmap does in lmbench, I think that situation is more realistic than people might thing. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-23 19:57 ` David S. Miller @ 2005-03-24 0:26 ` Nick Piggin 2005-03-24 5:44 ` David S. Miller 2005-03-30 19:30 ` Hugh Dickins 0 siblings, 2 replies; 24+ messages in thread From: Nick Piggin @ 2005-03-24 0:26 UTC (permalink / raw) To: David S. Miller; +Cc: Hugh Dickins, akpm, tony.luck, benh, ak, linux-kernel [-- Attachment #1: Type: text/plain, Size: 945 bytes --] David S. Miller wrote: > On Wed, 23 Mar 2005 17:10:15 +0000 (GMT) > Hugh Dickins <hugh@veritas.com> wrote: > > >>Here's the recut of those patches, including David Miller's vital fixes. >>I'm addressing these to Nick rather than Andrew, because they're perhaps >>not fit for -mm until more testing done and the x86_64 32-bit vdso issue >>handled. I'm unlikely to be responsive until next week, sorry: over to >>you, Nick - thanks. > > > Works perfectly fine on sparc64. > OK, attached is my first cut at slimming down the boundary tests. I have only had a chance to try it on i386, so I hate to drop it on you like this - but I *have* put a bit of thought into it.... Treat it as an RFC, and I'll try to test it on a wider range of things in the next couple of days. Not that there is anything really nasty with your system David, so I don't think it will be a big disaster if I can't get this to work. Goes on top of Hugh's 6 patches. [-- Attachment #2: fix-ptclear --] [-- Type: text/plain, Size: 3732 bytes --] Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c 2005-03-24 10:43:31.000000000 +1100 +++ linux-2.6/mm/memory.c 2005-03-24 11:22:21.000000000 +1100 @@ -139,14 +139,9 @@ static inline void free_pmd_range(struct start &= PUD_MASK; if (start < floor) return; - if (ceiling) { - ceiling &= PUD_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PUD_SIZE - 1) & PUD_MASK; + if (end - ceiling - 1 < PUD_SIZE - 1) return; - pmd = pmd_offset(pud, start); pud_clear(pud); pmd_free_tlb(tlb, pmd); @@ -172,14 +167,9 @@ static inline void free_pud_range(struct start &= PGDIR_MASK; if (start < floor) return; - if (ceiling) { - ceiling &= PGDIR_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PGDIR_SIZE - 1) & PGDIR_MASK; + if (end - ceiling - 1 < PGDIR_SIZE - 1) return; - pud = pud_offset(pgd, start); pgd_clear(pgd); pud_free_tlb(tlb, pud); @@ -198,6 +188,10 @@ void free_pgd_range(struct mmu_gather ** unsigned long next; unsigned long start; + BUG_ON(addr >= end); + /* Don't want end to be 0 and ceiling to be greater than 0-PGDIR_SIZE */ + BUG_ON(end - 1 > ceiling - 1); + /* * The next few lines have given us lots of grief... * @@ -205,23 +199,22 @@ void free_pgd_range(struct mmu_gather ** * there will be no work to do at all, and we'd prefer not to * go all the way down to the bottom just to discover that. * - * Why all these "- 1"s? Because 0 represents both the bottom - * of the address space and the top of it (using -1 for the - * top wouldn't help much: the masks would do the wrong thing). - * The rule is that addr 0 and floor 0 refer to the bottom of - * the address space, but end 0 and ceiling 0 refer to the top - * Comparisons need to use "end - 1" and "ceiling - 1" (though - * that end 0 case should be mythical). - * - * Wherever addr is brought up or ceiling brought down, we must - * be careful to reject "the opposite 0" before it confuses the - * subsequent tests. But what about where end is brought down - * by PMD_SIZE below? no, end can't go down to 0 there. + * The tricky part of this logic (and similar in free_p?d_range above) + * is the 'end' handling. end and ceiling are *exclusive* boundaries, + * so their maximum is 0. This suggests the use of two's complement + * difference when comparing them, so the wrapping is handled for us. * - * Whereas we round start (addr) and ceiling down, by different - * masks at different levels, in order to test whether a table - * now has no other vmas using it, so can be freed, we don't - * bother to round floor or end up - the tests don't need that. + * The method is: + * - Round end up to the nearest PMD aligned boundary. + * - If end has exceeded ceiling, then end - ceiling will be less than + * PMD_SIZE. + * end can't have approached ceiling from above if ceiling is 0, + * because it is rounded up to the next PMD aligned boundary, so + * either it will be 0, or 0+PMD_SIZE. + * - In the above case that end is 0, or any other time end might be + * equal to ceiling, end - ceiling = 0 < PMD_SIZE. So the actual test + * we use is (unsigned) end - ceiling - 1 < PMD_SIZE - 1, + * to catch this case. */ addr &= PMD_MASK; @@ -230,14 +223,10 @@ void free_pgd_range(struct mmu_gather ** if (!addr) return; } - if (ceiling) { - ceiling &= PMD_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PMD_SIZE - 1) & PMD_MASK; + if (end - ceiling - 1 < PMD_SIZE - 1) end -= PMD_SIZE; - if (addr > end - 1) + if (addr >= end) return; start = addr; ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-24 0:26 ` Nick Piggin @ 2005-03-24 5:44 ` David S. Miller 2005-03-30 19:30 ` Hugh Dickins 1 sibling, 0 replies; 24+ messages in thread From: David S. Miller @ 2005-03-24 5:44 UTC (permalink / raw) To: Nick Piggin; +Cc: hugh, akpm, tony.luck, benh, ak, linux-kernel On Thu, 24 Mar 2005 11:26:16 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote: > OK, attached is my first cut at slimming down the boundary tests. Works perfectly fine on sparc64. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-24 0:26 ` Nick Piggin 2005-03-24 5:44 ` David S. Miller @ 2005-03-30 19:30 ` Hugh Dickins 2005-03-30 23:40 ` Nick Piggin 1 sibling, 1 reply; 24+ messages in thread From: Hugh Dickins @ 2005-03-30 19:30 UTC (permalink / raw) To: Nick Piggin; +Cc: David S. Miller, akpm, tony.luck, benh, ak, linux-kernel On Thu, 24 Mar 2005, Nick Piggin wrote: > > OK, attached is my first cut at slimming down the boundary tests. > I have only had a chance to try it on i386, so I hate to drop it > on you like this - but I *have* put a bit of thought into it.... > Treat it as an RFC, and I'll try to test it on a wider range of > things in the next couple of days. I've stared and stared at it. I think I mostly like it. It's nicer to be rounding end up than ceiling down. It's clearly superior to what David and I had, in branching less (other than in your BUG_ONs), and I do believe your "if (end - ceiling - 1 < P*_SIZE - 1)" is correct and efficient. But I still find it harder to understand than ours; and don't understand at all your comment "end can't have approached ceiling from above...." - but I think you're bravely trying to explain the case I sidestepped with a lordly unexplained "end can't go down to 0 there". Let others decide. One thing I believe is outright wrong, at least with out-of-tree patches: your change from "if (addr > end - 1)" to "if (addr >= end)", after you've just rounded up end (perhaps to 0). (And let me astonish you by asking for the blank lines back before pmd_offset and pud_offset!) Hugh ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-30 19:30 ` Hugh Dickins @ 2005-03-30 23:40 ` Nick Piggin 0 siblings, 0 replies; 24+ messages in thread From: Nick Piggin @ 2005-03-30 23:40 UTC (permalink / raw) To: Hugh Dickins; +Cc: David S. Miller, akpm, tony.luck, benh, ak, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1729 bytes --] Hugh Dickins wrote: > It's clearly superior to what David and I had, in branching > less (other than in your BUG_ONs), and I do believe your > "if (end - ceiling - 1 < P*_SIZE - 1)" is correct and efficient. > Well the BUG_ONs were more to just satisfy me that my assumptions were correct. Not to mention only contained in the top level, so they shouldn't hurt performance. But they could go. > But I still find it harder to understand than ours; and don't > understand at all your comment "end can't have approached ceiling > from above...." - but I think you're bravely trying to explain the case > I sidestepped with a lordly unexplained "end can't go down to 0 there". > Yes, say ceiling is 0 - something less than P*_SIZE, you might get the feeling that end may be able to come within our limit of it if it were a very small number. This can't happen because 0 is actually the top of address space, and end can't be *greater* than ceiling before any rounding. If it is not 0, then it must be at least 1, in which case it will always be rounded up to the next P*_SIZE boundary. So no problem. This may have been obvious to you from the start, in which case my extra rambling may have confused you... actually on re-reading it, it would have confused you no matter what. See if the next version is better. > Let others decide. > > One thing I believe is outright wrong, at least with out-of-tree > patches: your change from "if (addr > end - 1)" to "if (addr >= end)", > after you've just rounded up end (perhaps to 0). > Oh yes, good catch. I don't know why I did that :( > (And let me astonish you by asking for the blank lines back before > pmd_offset and pud_offset!) > Hugh? What have you done with Hugh? [-- Attachment #2: freepgt-boundary-tests.patch --] [-- Type: text/plain, Size: 4337 bytes --] Simplify (from the machine's point of view) the infamous boundary tests. The method, and an outline of the proof (which I haven't actually done) is recorded in the comments. It is not conceptually much more difficult than the current method when it is understood, although it doesn't present the corner cases so explicitly in code (hence the need for comments). Eliminates 2 branches per freeable page table level. Tested and works on i386, ia64, sparc64. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c 2005-03-29 17:09:16.000000000 +1000 +++ linux-2.6/mm/memory.c 2005-03-31 09:39:31.000000000 +1000 @@ -139,12 +139,8 @@ static inline void free_pmd_range(struct start &= PUD_MASK; if (start < floor) return; - if (ceiling) { - ceiling &= PUD_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PUD_SIZE - 1) & PUD_MASK; + if (end - ceiling - 1 < PUD_SIZE - 1) return; pmd = pmd_offset(pud, start); @@ -172,12 +168,8 @@ static inline void free_pud_range(struct start &= PGDIR_MASK; if (start < floor) return; - if (ceiling) { - ceiling &= PGDIR_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PGDIR_SIZE - 1) & PGDIR_MASK; + if (end - ceiling - 1 < PGDIR_SIZE - 1) return; pud = pud_offset(pgd, start); @@ -198,6 +190,10 @@ void free_pgd_range(struct mmu_gather ** unsigned long next; unsigned long start; + BUG_ON(addr >= end); + /* Don't want end to be 0 and ceiling to be greater than 0-PGDIR_SIZE */ + BUG_ON(end - 1 > ceiling - 1); + /* * The next few lines have given us lots of grief... * @@ -205,23 +201,25 @@ void free_pgd_range(struct mmu_gather ** * there will be no work to do at all, and we'd prefer not to * go all the way down to the bottom just to discover that. * - * Why all these "- 1"s? Because 0 represents both the bottom - * of the address space and the top of it (using -1 for the - * top wouldn't help much: the masks would do the wrong thing). - * The rule is that addr 0 and floor 0 refer to the bottom of - * the address space, but end 0 and ceiling 0 refer to the top - * Comparisons need to use "end - 1" and "ceiling - 1" (though - * that end 0 case should be mythical). - * - * Wherever addr is brought up or ceiling brought down, we must - * be careful to reject "the opposite 0" before it confuses the - * subsequent tests. But what about where end is brought down - * by PMD_SIZE below? no, end can't go down to 0 there. + * The tricky part of this logic (and similar in free_p?d_range above) + * is the 'end' handling. end and ceiling are *exclusive* boundaries, + * so their maximum is 0. This suggests the use of two's complement + * difference when comparing them, so the wrapping is handled for us. * - * Whereas we round start (addr) and ceiling down, by different - * masks at different levels, in order to test whether a table - * now has no other vmas using it, so can be freed, we don't - * bother to round floor or end up - the tests don't need that. + * The method is: + * - Round end up to the nearest PMD aligned boundary. + * - If end has exceeded ceiling, then end - ceiling will be less than + * PMD_SIZE. + * - If end is very small (close to 0) and ceiling is very large + * (close to wrapping to 0, or 0), then the end - ceiling condition + * needs to be false. This holds because end must be at least 1, and + * so rounding it up will always take it to the first PMD boundary, + * and hence out of reach of ceiling. + * - If end is 0 (top of address space), then ceiling must also be 0. + * - In the above case that end is 0, or any other time end might be + * equal to ceiling, end - ceiling = 0 < PMD_SIZE. So the actual test + * we use is (unsigned) end - ceiling - 1 < PMD_SIZE - 1, + * to catch this case. */ addr &= PMD_MASK; @@ -230,12 +228,8 @@ void free_pgd_range(struct mmu_gather ** if (!addr) return; } - if (ceiling) { - ceiling &= PMD_MASK; - if (!ceiling) - return; - } - if (end - 1 > ceiling - 1) + end = (end + PMD_SIZE - 1) & PMD_MASK; + if (end - ceiling - 1 < PMD_SIZE - 1) end -= PMD_SIZE; if (addr > end - 1) return; ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-23 17:10 Hugh Dickins 2005-03-23 19:57 ` David S. Miller @ 2005-03-25 21:22 ` Russell King 2005-03-26 2:06 ` Nick Piggin 1 sibling, 1 reply; 24+ messages in thread From: Russell King @ 2005-03-25 21:22 UTC (permalink / raw) To: Hugh Dickins; +Cc: Nick Piggin, akpm, davem, tony.luck, benh, ak, linux-kernel On Wed, Mar 23, 2005 at 05:10:15PM +0000, Hugh Dickins wrote: > Here's the recut of those patches, including David Miller's vital fixes. > I'm addressing these to Nick rather than Andrew, because they're perhaps > not fit for -mm until more testing done and the x86_64 32-bit vdso issue > handled. I'm unlikely to be responsive until next week, sorry: over to > you, Nick - thanks. I thought I'd try these out on ARM, but alas they don't apply to 2.6.12-rc1. ;( This means I won't be testing them, sorry. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-25 21:22 ` Russell King @ 2005-03-26 2:06 ` Nick Piggin 2005-03-26 11:35 ` Russell King 0 siblings, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-03-26 2:06 UTC (permalink / raw) To: Russell King; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel Russell King wrote: > On Wed, Mar 23, 2005 at 05:10:15PM +0000, Hugh Dickins wrote: > >>Here's the recut of those patches, including David Miller's vital fixes. >>I'm addressing these to Nick rather than Andrew, because they're perhaps >>not fit for -mm until more testing done and the x86_64 32-bit vdso issue >>handled. I'm unlikely to be responsive until next week, sorry: over to >>you, Nick - thanks. > > > I thought I'd try these out on ARM, but alas they don't apply to > 2.6.12-rc1. ;( This means I won't be testing them, sorry. > The reject should be confined to include/asm-ia64, so it will still work for you. But I've put a clean rollup of all Hugh's patches here in case you'd like to try it. http://www.kerneltrap.org/~npiggin/freepgt-2.6.12-rc1.patch ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 2:06 ` Nick Piggin @ 2005-03-26 11:35 ` Russell King 2005-03-26 13:37 ` Russell King 2005-03-26 13:42 ` Nick Piggin 0 siblings, 2 replies; 24+ messages in thread From: Russell King @ 2005-03-26 11:35 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel On Sat, Mar 26, 2005 at 01:06:47PM +1100, Nick Piggin wrote: > The reject should be confined to include/asm-ia64, so it will still > work for you. I guess I should've tried a little harder last night then. Sorry. > But I've put a clean rollup of all Hugh's patches here in case you'd > like to try it. > > http://www.kerneltrap.org/~npiggin/freepgt-2.6.12-rc1.patch This works fine on ARM with high vectors. With low vectors (located in the 1st page of virtual memory space) I get: kernel BUG at mm/mmap.c:1934! Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c1e88000 [00000000] *pgd=c1e86031, *pte=c04440cf, *ppte=c044400e Internal error: Oops: c1e8981f [#1] Modules linked in: CPU: 0 PC is at __bug+0x40/0x54 LR is at 0x1 pc : [<c0223870>] lr : [<00000001>] Not tainted sp : c1e7bd18 ip : 60000093 fp : c1e7bd28 r10: c1f4b040 r9 : 00000006 r8 : c1f02ca0 r7 : 00000000 r6 : 00000000 r5 : c015b8c0 r4 : 00000000 r3 : 00000000 r2 : 00000000 r1 : 00000d4e r0 : 00000001 Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user Control: C1E8917F Table: C1E8917F DAC: 00000015 Process init (pid: 235, stack limit = 0xc1e7a194) Stack: (0xc1e7bd18 to 0xc1e7c000) ... Backtrace: [<c0223830>] (__bug+0x0/0x54) from [<c02691d8>] (exit_mmap+0x154/0x168) r4 = C1E7BD3C [<c0269084>] (exit_mmap+0x0/0x168) from [<c02358c8>] (mmput+0x40/0xc0) r7 = C015B8C0 r6 = C015B8C0 r5 = 00000000 r4 = C015B8C0 [<c0235888>] (mmput+0x0/0xc0) from [<c027ecec>] (exec_mmap+0xec/0x134) r4 = C015B6A0 [<c027ec00>] (exec_mmap+0x0/0x134) from [<c027f234>] (flush_old_exec+0x4c8/0x6e4) r7 = C012B940 r6 = C1E7A000 r5 = C0498A00 r4 = 00000000 [<c027ed6c>] (flush_old_exec+0x0/0x6e4) from [<c029d53c>] (load_elf_binary+0x5c0/0xdc0) [<c029cf7c>] (load_elf_binary+0x0/0xdc0) from [<c027f6e0>] (search_binary_handler+0xa0/0x244) [<c027f640>] (search_binary_handler+0x0/0x244) from [<c029c4e8>] (load_script+0x224/0x22c) [<c029c2c4>] (load_script+0x0/0x22c) from [<c027f6e0>] (search_binary_handler+0xa0/0x244) r6 = C1E7A000 r5 = C015E400 r4 = C03EC2B4 [<c027f640>] (search_binary_handler+0x0/0x244) from [<c027f9b8>] (do_execve+0x134/0x1f8) [<c027f884>] (do_execve+0x0/0x1f8) from [<c02223f8>] (sys_execve+0x3c/0x5c) [<c02223bc>] (sys_execve+0x0/0x5c) from [<c021dca0>] (ret_fast_syscall+0x0/0x2c) r7 = 0000000B r6 = BED6AA74 r5 = BED6AB00 r4 = 0200F008 Code: 1b0051b8 e59f0014 eb0051b6 e3a03000 (e5833000) In this case, we have a page which must be kept mapped at virtual address 0, which means the first entry in the L1 page table must always exist. However, user threads start from 0x8000, which is also mapped via the first entry in the L1 page table. At a guess, I'd imagine that we're freeing the first L1 page table entry, thereby causing mm->nr_ptes to become -1. I'll do some debugging and try to work out if that (or exactly what's) going on. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 11:35 ` Russell King @ 2005-03-26 13:37 ` Russell King 2005-03-26 13:51 ` Nick Piggin 2005-03-26 13:42 ` Nick Piggin 1 sibling, 1 reply; 24+ messages in thread From: Russell King @ 2005-03-26 13:37 UTC (permalink / raw) To: Nick Piggin, Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel On Sat, Mar 26, 2005 at 11:35:30AM +0000, Russell King wrote: > In this case, we have a page which must be kept mapped at virtual address > 0, which means the first entry in the L1 page table must always exist. > However, user threads start from 0x8000, which is also mapped via the > first entry in the L1 page table. > > At a guess, I'd imagine that we're freeing the first L1 page table entry, > thereby causing mm->nr_ptes to become -1. > > I'll do some debugging and try to work out if that (or exactly what's) > going on. Ok. What's happening is that the ARM pgd_alloc uses pte_alloc_map() to allocate the first L1 page table. This sets mm->nr_ptes to be one. The ARM free_pgd knows about this, and will free this PTE at the appropriate time. However, exit_mmap() doesn't know about this itself, so in the ARM case, it should BUG_ON(mm->nr_ptes != 1) if we're using low vectors. I guess we could hack it such that the ARM pgd_alloc decrements mm->nr_ptes itself to keep things balanced, since free_pte_range() won't be called for this pte. However, I don't like that because its likely to break at some point in the future. Any ideas how to cleanly support this with the new infrastructure? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 13:37 ` Russell King @ 2005-03-26 13:51 ` Nick Piggin 2005-03-26 15:03 ` Russell King 0 siblings, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-03-26 13:51 UTC (permalink / raw) To: Russell King; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel Russell King wrote: > Ok. What's happening is that the ARM pgd_alloc uses pte_alloc_map() to > allocate the first L1 page table. This sets mm->nr_ptes to be one. > > The ARM free_pgd knows about this, and will free this PTE at the > appropriate time. However, exit_mmap() doesn't know about this itself, > so in the ARM case, it should BUG_ON(mm->nr_ptes != 1) if we're using > low vectors. > > I guess we could hack it such that the ARM pgd_alloc decrements mm->nr_ptes > itself to keep things balanced, since free_pte_range() won't be called > for this pte. However, I don't like that because its likely to break at > some point in the future. > > Any ideas how to cleanly support this with the new infrastructure? > Hmm, in that case it could be just a problem with that BUG_ON() - it wasn't there before... but it seems like a very useful test, especially with all this new work going on, so it would be a shame not to run it in releases. But I don't quite understand (should really look at the code more), how come you aren't leaking memory? Do _all_ processes share this same first L1 page table? If so, can it be allocated with a more specialised function? If not, can nr_ptes be decremented in the place where it is freed? /me goes to bed now - I'll have a bit of a look tomorrow. Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 13:51 ` Nick Piggin @ 2005-03-26 15:03 ` Russell King 2005-03-30 17:00 ` Hugh Dickins 0 siblings, 1 reply; 24+ messages in thread From: Russell King @ 2005-03-26 15:03 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel On Sun, Mar 27, 2005 at 12:51:14AM +1100, Nick Piggin wrote: > Hmm, in that case it could be just a problem with that BUG_ON() - > it wasn't there before... but it seems like a very useful test, > especially with all this new work going on, so it would be a shame > not to run it in releases. Indeed. > But I don't quite understand (should really look at the code more), > how come you aren't leaking memory? The ARM free_pgd_slow() knows about this special first L1 page table, and knows to free it if it exists when its called. > Do _all_ processes share this same first L1 page table? No. It has to be specific to each process. Each L1 page table entry covers 2MB, but executables start at virtual 0x8000. I guess we could open-code pte_alloc_map() in get_pgd_slow() which could solve this problem by omitting the mm->nr_ptes accounting; it may also reduce the complexity as well. I'm just slightly concerned that this may be rather fragile in terms of future development. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 15:03 ` Russell King @ 2005-03-30 17:00 ` Hugh Dickins 2005-03-30 17:39 ` Russell King 0 siblings, 1 reply; 24+ messages in thread From: Hugh Dickins @ 2005-03-30 17:00 UTC (permalink / raw) To: Russell King; +Cc: Nick Piggin, akpm, davem, tony.luck, benh, ak, linux-kernel On Sat, 26 Mar 2005, Russell King wrote: > On Sun, Mar 27, 2005 at 12:51:14AM +1100, Nick Piggin wrote: > > > But I don't quite understand (should really look at the code more), > > how come you aren't leaking memory? > > The ARM free_pgd_slow() knows about this special first L1 page table, and > knows to free it if it exists when its called. > > > Do _all_ processes share this same first L1 page table? > > No. It has to be specific to each process. Each L1 page table entry > covers 2MB, but executables start at virtual 0x8000. I'm not satisfied with Nick's patch because it defines FIRST_USER_ADDRESS as FIRST_USER_PGD_NR * PGDIR_SIZE i.e. 2MB. And if that is the first user address, then there is no need for his patch, because the new free_pgtables will never touch that pgd because there's no vma in it. That's why I thought arm needed no special code for this (overlooking the nr_ptes bug issue, sorry). But above you say FIRST_USER_ADDRESS should actually be 0x8000? In that case, I think we should define FIRST_USER_ADDRESS to 0, either in every asm-arch or in asm-generic, with arm overriding it to 0x8000 - or better, to whatever #define you already have for that 0x8000, but I didn't find it. If vmas can occur in between 0x8000 and 2MB, then with Nick's patch we'd be liable to pass free_pgtables a vma with vm_start lower than floor, which is not a case I've thought through, nor wish to. And, whether FIRST_USER_ADDRESS is 0x8000 or 2MB, shouldn't your arch_get_unmapped_area be enforcing it? Hugh ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-30 17:00 ` Hugh Dickins @ 2005-03-30 17:39 ` Russell King 0 siblings, 0 replies; 24+ messages in thread From: Russell King @ 2005-03-30 17:39 UTC (permalink / raw) To: Hugh Dickins; +Cc: Nick Piggin, akpm, davem, tony.luck, benh, ak, linux-kernel On Wed, Mar 30, 2005 at 06:00:54PM +0100, Hugh Dickins wrote: > And, whether FIRST_USER_ADDRESS is 0x8000 or 2MB, > shouldn't your arch_get_unmapped_area be enforcing it? Why should it? arch_get_unmapped_area allocates address space dynamically when NULL is passed, and always starts from TASK_UNMAPPED_BASE. This will be greater than 32K. The protection against mapping things below 32K comes from the syscall wrappers on mmap() and mremap(). -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 11:35 ` Russell King 2005-03-26 13:37 ` Russell King @ 2005-03-26 13:42 ` Nick Piggin 2005-03-26 15:52 ` Russell King 1 sibling, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-03-26 13:42 UTC (permalink / raw) To: Russell King; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3350 bytes --] Russell King wrote: > On Sat, Mar 26, 2005 at 01:06:47PM +1100, Nick Piggin wrote: > >>The reject should be confined to include/asm-ia64, so it will still >>work for you. > > > I guess I should've tried a little harder last night then. Sorry. > Well I only said that just in case you had time to try testing again. I wouldn't expect you to go hunting through rejects and various kernel trees to try to get a working patch ;) > >>But I've put a clean rollup of all Hugh's patches here in case you'd >>like to try it. >> >>http://www.kerneltrap.org/~npiggin/freepgt-2.6.12-rc1.patch > > > This works fine on ARM with high vectors. With low vectors (located in > the 1st page of virtual memory space) I get: > > kernel BUG at mm/mmap.c:1934! > Unable to handle kernel NULL pointer dereference at virtual address 00000000 > pgd = c1e88000 > [00000000] *pgd=c1e86031, *pte=c04440cf, *ppte=c044400e > Internal error: Oops: c1e8981f [#1] > Modules linked in: > CPU: 0 > PC is at __bug+0x40/0x54 > LR is at 0x1 > pc : [<c0223870>] lr : [<00000001>] Not tainted > sp : c1e7bd18 ip : 60000093 fp : c1e7bd28 > r10: c1f4b040 r9 : 00000006 r8 : c1f02ca0 > r7 : 00000000 r6 : 00000000 r5 : c015b8c0 r4 : 00000000 > r3 : 00000000 r2 : 00000000 r1 : 00000d4e r0 : 00000001 > Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user > Control: C1E8917F Table: C1E8917F DAC: 00000015 > Process init (pid: 235, stack limit = 0xc1e7a194) > Stack: (0xc1e7bd18 to 0xc1e7c000) > ... > Backtrace: > [<c0223830>] (__bug+0x0/0x54) from [<c02691d8>] (exit_mmap+0x154/0x168) > r4 = C1E7BD3C > [<c0269084>] (exit_mmap+0x0/0x168) from [<c02358c8>] (mmput+0x40/0xc0) > r7 = C015B8C0 r6 = C015B8C0 r5 = 00000000 r4 = C015B8C0 > [<c0235888>] (mmput+0x0/0xc0) from [<c027ecec>] (exec_mmap+0xec/0x134) > r4 = C015B6A0 > [<c027ec00>] (exec_mmap+0x0/0x134) from [<c027f234>] (flush_old_exec+0x4c8/0x6e4) > r7 = C012B940 r6 = C1E7A000 r5 = C0498A00 r4 = 00000000 > [<c027ed6c>] (flush_old_exec+0x0/0x6e4) from [<c029d53c>] (load_elf_binary+0x5c0/0xdc0) > [<c029cf7c>] (load_elf_binary+0x0/0xdc0) from [<c027f6e0>] (search_binary_handler+0xa0/0x244) > [<c027f640>] (search_binary_handler+0x0/0x244) from [<c029c4e8>] (load_script+0x224/0x22c) > [<c029c2c4>] (load_script+0x0/0x22c) from [<c027f6e0>] (search_binary_handler+0xa0/0x244) > r6 = C1E7A000 r5 = C015E400 r4 = C03EC2B4 > [<c027f640>] (search_binary_handler+0x0/0x244) from [<c027f9b8>] (do_execve+0x134/0x1f8) > [<c027f884>] (do_execve+0x0/0x1f8) from [<c02223f8>] (sys_execve+0x3c/0x5c) > [<c02223bc>] (sys_execve+0x0/0x5c) from [<c021dca0>] (ret_fast_syscall+0x0/0x2c) > r7 = 0000000B r6 = BED6AA74 r5 = BED6AB00 r4 = 0200F008 > Code: 1b0051b8 e59f0014 eb0051b6 e3a03000 (e5833000) > > In this case, we have a page which must be kept mapped at virtual address > 0, which means the first entry in the L1 page table must always exist. > However, user threads start from 0x8000, which is also mapped via the > first entry in the L1 page table. > > At a guess, I'd imagine that we're freeing the first L1 page table entry, > thereby causing mm->nr_ptes to become -1. > > I'll do some debugging and try to work out if that (or exactly what's) > going on. > OK, thanks that would be good. You could well be right in your analysis. May I suggest a possible avenue of investigation: [-- Attachment #2: freepgt-ARM-fix.patch --] [-- Type: text/plain, Size: 1224 bytes --] Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2005-03-26 23:47:51.000000000 +1100 +++ linux-2.6/mm/mmap.c 2005-03-27 00:41:00.000000000 +1100 @@ -1612,6 +1612,8 @@ static void unmap_vma_list(struct mm_str validate_mm(mm); } +#define FIRST_USER_ADDRESS (FIRST_USER_PGD_NR * PGDIR_SIZE) + /* * Get rid of page table information in the indicated region. * @@ -1630,7 +1632,7 @@ static void unmap_region(struct mm_struc tlb = tlb_gather_mmu(mm, 0); unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: 0, + free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); spin_unlock(&mm->page_table_lock); @@ -1910,7 +1912,7 @@ void exit_mmap(struct mm_struct *mm) /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, mm, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, 0, 0); + free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); mm->mmap = mm->mmap_cache = NULL; ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 13:42 ` Nick Piggin @ 2005-03-26 15:52 ` Russell King 2005-03-27 3:41 ` Nick Piggin 2005-03-30 16:30 ` Hugh Dickins 0 siblings, 2 replies; 24+ messages in thread From: Russell King @ 2005-03-26 15:52 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel On Sun, Mar 27, 2005 at 12:42:56AM +1100, Nick Piggin wrote: > OK, thanks that would be good. You could well be right in your analysis. > May I suggest a possible avenue of investigation: Yes, this patch seems to also be required, otherwise I see: free_pgtables : floor 40015000 ceiling 4001b000 free_pgd_range: floor 40015000 ceiling 4001b000 addr 40015000 end 40016000 free_pgtables : floor 40015000 ceiling 4001b000 free_pgd_range: floor 40015000 ceiling 4001b000 addr 40015000 end 40016000 free_pgtables : floor 40015000 ceiling 4001b000 free_pgd_range: floor 40015000 ceiling 4001b000 addr 40015000 end 40016000 free_pgtables : floor 00000000 ceiling 00000000 free_pgd_range: floor 00000000 ceiling 40000000 addr 00008000 end 0001d000 free_pud_range: floor 00000000 ceiling 40000000 addr 00000000 end 00200000 free_pmd_range: floor 00000000 ceiling 40000000 addr 00000000 end 00200000 free_pgd_range: floor 00000000 ceiling bef4e000 addr 40000000 end 4012d000 free_pud_range: floor 00000000 ceiling bef4e000 addr 40000000 end 40200000 free_pmd_range: floor 00000000 ceiling bef4e000 addr 40000000 end 40200000 free_pgd_range: floor 00000000 ceiling 00000000 addr bef4e000 end bef63000 free_pud_range: floor 00000000 ceiling 00000000 addr bee00000 end bf000000 free_pmd_range: floor 00000000 ceiling 00000000 addr bee00000 end bf000000 exit_mmap: nr_ptes -1 The above is with my fix to ARMs get_pgd_slow, which shows that we accidentally freed the first entry in the L1 page table. With my fix and your patch, low-vectored ARMs work again. I don't think it'll be invasive to push my get_pgd_slow() fix before these freepgt patches appear. For the record, this is the patch I'm using at present. With a bit more effort, I could probably eliminate pmd_alloc (and therefore the unnecessary spinlocking) here. diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x *.orig -x *.rej orig/arch/arm/mm/mm-armv.c linux/arch/arm/mm/mm-armv.c --- orig/arch/arm/mm/mm-armv.c Sat Mar 19 11:20:01 2005 +++ linux/arch/arm/mm/mm-armv.c Sat Mar 26 15:51:57 2005 @@ -160,6 +160,8 @@ pgd_t *get_pgd_slow(struct mm_struct *mm init_pgd = pgd_offset_k(0); if (!vectors_high()) { + struct page *new; + /* * This lock is here just to satisfy pmd_alloc and pte_lock */ @@ -170,12 +172,16 @@ pgd_t *get_pgd_slow(struct mm_struct *mm * contains the machine vectors. */ new_pmd = pmd_alloc(mm, new_pgd, 0); + spin_unlock(&mm->page_table_lock); if (!new_pmd) goto no_pmd; - new_pte = pte_alloc_map(mm, new_pmd, 0); - if (!new_pte) + new = pte_alloc_one(mm, 0); + if (!new) goto no_pte; + inc_page_state(nr_page_table_pages); + pmd_populate(mm, new_pmd, new); + new_pte = pte_offset_map(new_pmd, 0); init_pmd = pmd_offset(init_pgd, 0); init_pte = pte_offset_map_nested(init_pmd, 0); @@ -197,16 +203,9 @@ pgd_t *get_pgd_slow(struct mm_struct *mm return new_pgd; no_pte: - spin_unlock(&mm->page_table_lock); pmd_free(new_pmd); - free_pages((unsigned long)new_pgd, 2); - return NULL; - no_pmd: - spin_unlock(&mm->page_table_lock); free_pages((unsigned long)new_pgd, 2); - return NULL; - no_pgd: return NULL; } -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 15:52 ` Russell King @ 2005-03-27 3:41 ` Nick Piggin 2005-03-27 7:57 ` Russell King 2005-03-30 16:30 ` Hugh Dickins 1 sibling, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-03-27 3:41 UTC (permalink / raw) To: Russell King; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel Russell King wrote: > On Sun, Mar 27, 2005 at 12:42:56AM +1100, Nick Piggin wrote: > >>OK, thanks that would be good. You could well be right in your analysis. >>May I suggest a possible avenue of investigation: > > > Yes, this patch seems to also be required, otherwise I see: > [...] OK. > > The above is with my fix to ARMs get_pgd_slow, which shows that we > accidentally freed the first entry in the L1 page table. With my > fix and your patch, low-vectored ARMs work again. > > I don't think it'll be invasive to push my get_pgd_slow() fix before > these freepgt patches appear. For the record, this is the patch I'm > using at present. With a bit more effort, I could probably eliminate > pmd_alloc (and therefore the unnecessary spinlocking) here. > Seems OK if you're happy with it. Is this going to leak "nr_page_table_pages" too, though? Hmm... no, because free_pgd_slow decrements it? In that case, can you have free_pgd_slow also decrement nr_ptes, instead of your below patch? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-27 3:41 ` Nick Piggin @ 2005-03-27 7:57 ` Russell King 2005-03-27 18:17 ` David S. Miller 0 siblings, 1 reply; 24+ messages in thread From: Russell King @ 2005-03-27 7:57 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, akpm, davem, tony.luck, benh, ak, linux-kernel On Sun, Mar 27, 2005 at 01:41:46PM +1000, Nick Piggin wrote: > Hmm... no, because free_pgd_slow decrements it? In that case, can > you have free_pgd_slow also decrement nr_ptes, instead of your > below patch? Unfortunately not - free_pgd_slow doesn't have any knowledge about the mm_struct that the pgd was associated with. Also remember that nr_ptes wasn't incremented in get_pgd_slow() for this page table either, and this is the "bug" fix. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-27 7:57 ` Russell King @ 2005-03-27 18:17 ` David S. Miller 2005-03-28 7:51 ` Russell King 0 siblings, 1 reply; 24+ messages in thread From: David S. Miller @ 2005-03-27 18:17 UTC (permalink / raw) To: Russell King; +Cc: nickpiggin, hugh, akpm, tony.luck, benh, ak, linux-kernel On Sun, 27 Mar 2005 08:57:25 +0100 Russell King <rmk+lkml@arm.linux.org.uk> wrote: > Unfortunately not - free_pgd_slow doesn't have any knowledge about the > mm_struct that the pgd was associated with. You could store the mm pointer in the page struct of the pgd, we used to that before set_pte_at() existed on sparc64 and ppc64 for pte level tables. page->mapping and page->index are basically free game for tracking information assosciated with page table chunks. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-27 18:17 ` David S. Miller @ 2005-03-28 7:51 ` Russell King 2005-03-28 18:47 ` David S. Miller 0 siblings, 1 reply; 24+ messages in thread From: Russell King @ 2005-03-28 7:51 UTC (permalink / raw) To: David S. Miller; +Cc: nickpiggin, hugh, akpm, tony.luck, benh, ak, linux-kernel On Sun, Mar 27, 2005 at 10:17:39AM -0800, David S. Miller wrote: > On Sun, 27 Mar 2005 08:57:25 +0100 > Russell King <rmk+lkml@arm.linux.org.uk> wrote: > > > Unfortunately not - free_pgd_slow doesn't have any knowledge about the > > mm_struct that the pgd was associated with. > > You could store the mm pointer in the page struct of the > pgd, we used to that before set_pte_at() existed on > sparc64 and ppc64 for pte level tables. > > page->mapping and page->index are basically free game for > tracking information assosciated with page table chunks. Why would I want to do this, given that decrementing mm->nr_ptes in free_pgd_slow() would make it negative ? Am I missing something obvious? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-28 7:51 ` Russell King @ 2005-03-28 18:47 ` David S. Miller 0 siblings, 0 replies; 24+ messages in thread From: David S. Miller @ 2005-03-28 18:47 UTC (permalink / raw) To: Russell King; +Cc: nickpiggin, hugh, akpm, tony.luck, benh, ak, linux-kernel On Mon, 28 Mar 2005 08:51:36 +0100 Russell King <rmk+lkml@arm.linux.org.uk> wrote: > Why would I want to do this, given that decrementing mm->nr_ptes in > free_pgd_slow() would make it negative ? Am I missing something > obvious? You were saying there was no way to figure out which mm is assosociate a particular pgd, and I'm merely showing you how you can in fact do it. :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/6] freepgt: free_pgtables shakeup 2005-03-26 15:52 ` Russell King 2005-03-27 3:41 ` Nick Piggin @ 2005-03-30 16:30 ` Hugh Dickins 1 sibling, 0 replies; 24+ messages in thread From: Hugh Dickins @ 2005-03-30 16:30 UTC (permalink / raw) To: Russell King; +Cc: Nick Piggin, akpm, davem, tony.luck, benh, ak, linux-kernel On Sat, 26 Mar 2005, Russell King wrote: > > I don't think it'll be invasive to push my get_pgd_slow() fix before > these freepgt patches appear. For the record, this is the patch I'm > using at present. With a bit more effort, I could probably eliminate > pmd_alloc (and therefore the unnecessary spinlocking) here. Your get_pgd_slow patch looks good to me. Yes, it slightly increases the assumptions here about what is done in common, to the extent of a pmd_populate, but even the nr_page_table_pages adjustment just nicely balances what you were already having to do in free_pgd_slow. Sorry for dumping you suddenly into this with my BUG_ON(mm->nr_ptes), but I think we all agree it's a worthwhile check now. And sorry for being so slow to respond, but I needed to think through what's right for your case. I'll write separately about Nick's FIRST_USER_ADDRESS patch, I'm still puzzled, and not quite happy with that one. Hugh ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2005-03-30 23:43 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-03-24 1:00 [PATCH 0/6] freepgt: free_pgtables shakeup Luck, Tony 2005-03-24 1:07 ` Nick Piggin -- strict thread matches above, loose matches on Subject: below -- 2005-03-23 17:10 Hugh Dickins 2005-03-23 19:57 ` David S. Miller 2005-03-24 0:26 ` Nick Piggin 2005-03-24 5:44 ` David S. Miller 2005-03-30 19:30 ` Hugh Dickins 2005-03-30 23:40 ` Nick Piggin 2005-03-25 21:22 ` Russell King 2005-03-26 2:06 ` Nick Piggin 2005-03-26 11:35 ` Russell King 2005-03-26 13:37 ` Russell King 2005-03-26 13:51 ` Nick Piggin 2005-03-26 15:03 ` Russell King 2005-03-30 17:00 ` Hugh Dickins 2005-03-30 17:39 ` Russell King 2005-03-26 13:42 ` Nick Piggin 2005-03-26 15:52 ` Russell King 2005-03-27 3:41 ` Nick Piggin 2005-03-27 7:57 ` Russell King 2005-03-27 18:17 ` David S. Miller 2005-03-28 7:51 ` Russell King 2005-03-28 18:47 ` David S. Miller 2005-03-30 16:30 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox