* Re: I just got got another Oops
       [not found] ` <200903121431.49437.gene.heskett@gmail.com>
@ 2009-03-16  2:55   ` KAMEZAWA Hiroyuki
  2009-03-16  3:22     ` KAMEZAWA Hiroyuki
  2009-03-16  8:03     ` BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was " KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-16  2:55 UTC
  To: Gene Heskett
  Cc: David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Thu, 12 Mar 2009 14:31:49 -0400
Gene Heskett <gene.heskett@gmail.com> wrote:

> Mar 12 14:15:02 coyote kernel: [ 2656.832669]
> Mar 12 14:15:02 coyote kernel: [ 2656.832672] Pid: 18877, comm: kmail Not tainted (2.6.29-rc7 #5) System Product Name
> Mar 12 14:15:02 coyote kernel: [ 2656.832675] EIP: 0060:[<c046520b>] EFLAGS: 00210202 CPU: 0
> Mar 12 14:15:02 coyote kernel: [ 2656.832678] EIP is at get_page_from_freelist+0x24b/0x4c0
> Mar 12 14:15:02 coyote kernel: [ 2656.832680] EAX: ffffffff EBX: 80004000 ECX: 00000001 EDX: 00000002
> Mar 12 14:15:02 coyote kernel: [ 2656.832682] ESI: c28fc260 EDI: 00000000 EBP: f2168d5c ESP: f2168cfc
> Mar 12 14:15:02 coyote kernel: [ 2656.832684] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Mar 12 14:15:02 coyote kernel: [ 2656.832686] Process kmail (pid: 18877, ti=f2168000 task=f22018b0 task.ti=f2168000)
> Mar 12 14:15:02 coyote kernel: [ 2656.832688] Stack:
> Mar 12 14:15:02 coyote kernel: [ 2656.832689] 00000002 00000044 c28fc060 00000000 f1463ca4 c0744b80 c06d6480 00000002
> Mar 12 14:15:02 coyote kernel: [ 2656.832693] 00000000 00000000 001201d2 00000002 00200246 00000001 c06d6900 00000100
> Mar 12 14:15:02 coyote kernel: [ 2656.832698] 00000000 80000000 c06d7484 c06d6480 c06d6480 c06d6480 f22018b0 00000129

Added linux-mm to CC:

    22a9:  8b 1e      mov    (%esi),%ebx      #ebx=80004000 = page->flags
    22ab:  89 f2      mov    %esi,%edx        #remember "page"
    22ad:  8b 46 08   mov    0x8(%esi),%eax   #esi+8=-1 page->mapcount
    22b0:  8b 7e 10   mov    0x10(%esi),%edi  #esi+16=0 page->mapping
    22b3:  f6 c7 40   test   $0x40,%bh
    22b6:  74 03      je     22bb <get_page_from_freelist+0x24b>
    22b8:  8b 56 0c   mov    0xc(%esi),%edx   #page = page->first_page
    22bb:  8b 4a 04   mov    0x4(%edx),%ecx   #page->_count

Thank you for the disassembly listing. From the above:

In prep_new_page()

static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
{
	if (unlikely(page_mapcount(page) |
		(page->mapping != NULL)  |
		(page_count(page) != 0)  |
		(page->flags & PAGE_FLAGS_CHECK_AT_PREP)))
		bad_page(page);

page->mapping = NULL, (VALID)
page->mapcount = -1 (VALID)
page->count ==> NULL access because PageTail() is set, see below.
(Note: from .config, CONFIG_PAGEFLAGS_EXTENDED is set.)

==
static inline int page_count(struct page *page)
{
	return atomic_read(&compound_head(page)->_count);
}

static inline struct page *compound_head(struct page *page)
{
	if (unlikely(PageTail(page)))
		return page->first_page;
	return page;
}
==

PageTail() is true (this is invalid) and page->first_page contains obsolete data.
But, here, PG_tail should not be there...

Hmm ?

Regards,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
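The failure mode described in the message above can be explored with a small user-space model. This is only a sketch under stated assumptions: struct model_page and the model_* helpers are invented for illustration, and the tail mask 0x4000 is inferred from the `test $0x40,%bh` instruction in the listing (bit 6 of %bh is bit 14 of %ebx), not copied from the kernel headers. In the oops, EDX was 00000002 when the load at 22bb faulted, so the stale first_page value was presumably not a usable pointer; the model points first_page at a valid object instead so the effect can be observed without crashing.

```c
#include <stddef.h>

/* Bit 14, as implied by "test $0x40,%bh" (0x40 << 8 == 0x4000). Assumption. */
#define MODEL_PG_TAIL_MASK 0x4000UL

/* Simplified stand-in for struct page; only the fields discussed above. */
struct model_page {
	unsigned long flags;
	int count;                      /* stands in for page->_count */
	struct model_page *first_page;  /* only meaningful on a real tail page */
};

static inline int model_page_tail(const struct model_page *page)
{
	return (page->flags & MODEL_PG_TAIL_MASK) != 0;
}

/* Same shape as compound_head(): trusts the tail bit blindly. */
static inline struct model_page *model_compound_head(struct model_page *page)
{
	if (model_page_tail(page))
		return page->first_page;
	return page;
}

/* Same shape as page_count(): reads the count through compound_head(). */
static inline int model_page_count(struct model_page *page)
{
	return model_compound_head(page)->count;
}

/* Reads the page's own count and ignores the tail bit. */
static inline int model_page_count_direct(const struct model_page *page)
{
	return page->count;
}

/* Returns 1 if the flags value from the oops (EBX) has the tested bit set. */
static inline int oops_flags_have_tail_bit(void)
{
	return (0x80004000UL & MODEL_PG_TAIL_MASK) != 0;
}
```

With a stale tail bit, model_page_count() reports the count of whatever first_page happens to point at, while model_page_count_direct() still reports the page's own count.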
* Re: I just got got another Oops 2009-03-16 2:55 ` I just got got another Oops KAMEZAWA Hiroyuki @ 2009-03-16 3:22 ` KAMEZAWA Hiroyuki 2009-03-16 8:03 ` BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was " KAMEZAWA Hiroyuki 1 sibling, 0 replies; 8+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-03-16 3:22 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org On Mon, 16 Mar 2009 11:55:09 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Thu, 12 Mar 2009 14:31:49 -0400 > Gene Heskett <gene.heskett@gmail.com> wrote: > > > Mar 12 14:15:02 coyote kernel: [ 2656.832669] > > Mar 12 14:15:02 coyote kernel: [ 2656.832672] Pid: 18877, comm: kmail Not tainted (2.6.29-rc7 #5) System Product Name > > Mar 12 14:15:02 coyote kernel: [ 2656.832675] EIP: 0060:[<c046520b>] EFLAGS: 00210202 CPU: 0 > > Mar 12 14:15:02 coyote kernel: [ 2656.832678] EIP is at get_page_from_freelist+0x24b/0x4c0 > > Mar 12 14:15:02 coyote kernel: [ 2656.832680] EAX: ffffffff EBX: 80004000 ECX: 00000001 EDX: 00000002 > > Mar 12 14:15:02 coyote kernel: [ 2656.832682] ESI: c28fc260 EDI: 00000000 EBP: f2168d5c ESP: f2168cfc > > Mar 12 14:15:02 coyote kernel: [ 2656.832684] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > Mar 12 14:15:02 coyote kernel: [ 2656.832686] Process kmail (pid: 18877, ti=f2168000 task=f22018b0 task.ti=f2168000) > > Mar 12 14:15:02 coyote kernel: [ 2656.832688] Stack: > > Mar 12 14:15:02 coyote kernel: [ 2656.832689] 00000002 00000044 c28fc060 00000000 f1463ca4 c0744b80 c06d6480 00000002 > > Mar 12 14:15:02 coyote kernel: [ 2656.832693] 00000000 00000000 001201d2 00000002 00200246 00000001 c06d6900 00000100 > > Mar 12 14:15:02 coyote kernel: [ 2656.832698] 00000000 80000000 c06d7484 c06d6480 c06d6480 c06d6480 f22018b0 00000129 > > Added linux-mm to CC: > > 22a9: 8b 1e mov (%esi),%ebx #ebx=80004000 = page->flags > 22ab: 89 f2 mov %esi,%edx #remember "page" > 22ad: 8b 46 08 mov 
0x8(%esi),%eax #esi+8=-1 page->mapcount > 22b0: 8b 7e 10 mov 0x10(%esi),%edi #esi+16=0 page->mapping > 22b3: f6 c7 40 test $0x40,%bh > 22b6: 74 03 je 22bb <get_page_from_freelist+0x24b> > 22b8: 8b 56 0c mov 0xc(%esi),%edx #page = page->first_page > 22bb: 8b 4a 04 mov 0x4(%edx),%ecx #page->_count > > Thank you for disassemble list, from above.... > > In prep_new_page() > 610 static int prep_new_page(struct page *page, int order, gfp_t gfp_flags) > 611 { > 612 if (unlikely(page_mapcount(page) | > 613 (page->mapping != NULL) | > 614 (page_count(page) != 0) | > 615 (page->flags & PAGE_FLAGS_CHECK_AT_PREP))) > 616 bad_page(page); > > page->mapping = NULL, (VALID) > page->mapcount = -1 (VALID) > page->count ==> NULL access because PageTail() is set, see below. > (Note: from .config, CONFIG_PAGEFLAGS_EXTENDED is set.) > > == > 288 static inline int page_count(struct page *page) > 289 { > 290 return atomic_read(&compound_head(page)->_count); > 291 } > > 281 static inline struct page *compound_head(struct page *page) > 282 { > 283 if (unlikely(PageTail(page))) > 284 return page->first_page; > 285 return page; > 286 } > == > > PageTail() is true (this is invalid) and page->first_page contains obsolete data. > But, here, PG_tail should not be there... > Gene-san, could you set CONFIG_DEBUG_VM (and other debug option ?) I think it can give us another view. -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops
  2009-03-16  2:55   ` I just got got another Oops KAMEZAWA Hiroyuki
  2009-03-16  3:22     ` KAMEZAWA Hiroyuki
@ 2009-03-16  8:03     ` KAMEZAWA Hiroyuki
  2009-03-16 21:44       ` Hugh Dickins
  1 sibling, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-16  8:03 UTC
  To: hugh@veritas.com
  Cc: Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org

Hi,
I'm sorry if I miss something..

From this patch
==
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=79f4b7bf393e67bbffec807cc68caaefc72b82ee
==
#define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
...
@@ -468,16 +467,16 @@ static inline int free_pages_check(struct page *page)
 		(page_count(page) != 0)  |
 		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
....
+	if (PageReserved(page))
+		return 1;
+	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
+		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	return 0;
 }
==

PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check().

This means PG_head/PG_tail(PG_compound) flags are cleared here
and Compound page will never be freed in sane way.

Regards,
-Kame
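The effect described in this message can be sketched in user space. The following is a minimal model, not kernel code: the enum values and the model_* names are hypothetical, and the real bit positions depend on the kernel configuration. It only shows that a mask of the form ((1 << NR_PAGEFLAGS) - 1) covers the head and tail bits, so clearing it at free time leaves nothing for a later compound-page check to see.

```c
/* Hypothetical bit positions, for this model only. */
enum model_pageflags {
	MODEL_PG_LOCKED,
	MODEL_PG_HEAD,
	MODEL_PG_TAIL,
	MODEL_NR_PAGEFLAGS,
};

/* Same form as the quoted definition: every page flag bit. */
#define MODEL_FLAGS_CHECK_AT_PREP ((1UL << MODEL_NR_PAGEFLAGS) - 1)

/* Mirrors the tail end of free_pages_check() in the quoted hunk. */
static inline unsigned long model_clear_at_free(unsigned long flags)
{
	if (flags & MODEL_FLAGS_CHECK_AT_PREP)
		flags &= ~MODEL_FLAGS_CHECK_AT_PREP;
	return flags;
}

/* Stand-in for a PageCompound()-style test on the head/tail bits. */
static inline int model_page_compound(unsigned long flags)
{
	return (flags & ((1UL << MODEL_PG_HEAD) | (1UL << MODEL_PG_TAIL))) != 0;
}
```

In this model any check that runs after model_clear_at_free() and depends on model_page_compound() can never fire.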
* Re: BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops
  2009-03-16  8:03     ` BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was " KAMEZAWA Hiroyuki
@ 2009-03-16 21:44       ` Hugh Dickins
  2009-03-16 23:44         ` KAMEZAWA Hiroyuki
  2009-03-20 15:23         ` Mel Gorman
  0 siblings, 2 replies; 8+ messages in thread
From: Hugh Dickins @ 2009-03-16 21:44 UTC
  To: KAMEZAWA Hiroyuki
  Cc: Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org

On Mon, 16 Mar 2009, KAMEZAWA Hiroyuki wrote:
> Hi,
> I'm sorry if I miss something..

I think it's me who missed something, and needs to say sorry.

>
> From this patch
> ==
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=79f4b7bf393e67bbffec807cc68caaefc72b82ee
> ==
> #define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
> ...
> @@ -468,16 +467,16 @@ static inline int free_pages_check(struct page *page)
>  		(page_count(page) != 0)  |
>  		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
> ....
> +	if (PageReserved(page))
> +		return 1;
> +	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> +		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +	return 0;
>  }
> ==
>
> PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check().
>
> This means PG_head/PG_tail(PG_compound) flags are cleared here

Yes, well spotted.  How embarrassing.  I must have got confused
about when the checking occurred when freeing a compound page.

> and Compound page will never be freed in sane way.

But is that so?  I'll admit I've not tried this out yet, but my
understanding is that the Compound page actually gets freed fine:
free_compound_page() should have passed the right order down, and this
PAGE_FLAGS_CHECK_AT_PREP clearing should remove the Head/Tail/Compound
flags - doesn't it all work out sanely, without any leaking?

What goes missing is all the destroy_compound_page() checks:
that's at present just dead code.

There's several things we could do about this.

1.  We could regard destroy_compound_page() as legacy debugging code
from when compound pages were first introduced, and sanctify my error
by removing it.  Obviously that's appealing to me, makes me look like
a prophet rather than idiot!  That's not necessarily the right thing to
do, but might appeal also to those cutting overhead from page_alloc.c.

2.  We could do the destroy_compound_page() stuff in free_compound_page()
before calling __free_pages_ok(), and add the Head/Tail/Compound flags
into PAGE_FLAGS_CHECK_AT_FREE.  That seems a more natural ordering to
me, and would remove the PageCompound check from a hotter path; but
I've a suspicion there's a good reason why it was not done that way,
that I'm overlooking at this moment.

3.  We can define a PAGE_FLAGS_CLEAR_AT_FREE which omits the Head/Tail/
Compound flags, and lets destroy_compound_page() be called as before
where it's currently intended.

What do you think?  I suspect I'm going to have to spend tomorrow
worrying about something else entirely, and won't return here until
Wednesday.

But as regards the original "I just got got another Oops": my bug
that you point out here doesn't account for that, does it?  It's
still a mystery, isn't it, how the PageTail bit came to be set at
that point?

But that Oops does demonstrate that it's a very bad idea to be using
the deceptive page_count() in those bad_page() checks: we need to be
checking page->_count directly.

And in looking at this, I notice something else to worry about:
that CONFIG_HUGETLBFS prep_compound_gigantic_page(), which seems
to exist for a more general case than "p = page + i" - what happens
when such a gigantic page is freed, and arrives at the various
"p = page + i" assumptions on the freeing path?

Hugh
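Option 3 from the list above can be pictured with a small user-space model. This is a sketch only: PAGE_FLAGS_CLEAR_AT_FREE is the name proposed in the message, but the MODEL_* constants, their bit positions and the helper functions are hypothetical and exist just for this example. The idea shown is a clear-at-free mask that leaves the head and tail bits alone so that a later compound-page teardown can still see them.

```c
/* Hypothetical bit positions, for this model only. */
enum model_pageflags {
	MODEL_PG_LOCKED,
	MODEL_PG_DIRTY,
	MODEL_PG_HEAD,
	MODEL_PG_TAIL,
	MODEL_NR_PAGEFLAGS,
};

#define MODEL_ALL_FLAGS       ((1UL << MODEL_NR_PAGEFLAGS) - 1)
#define MODEL_COMPOUND_FLAGS  ((1UL << MODEL_PG_HEAD) | (1UL << MODEL_PG_TAIL))

/* Option 3: clear every flag at free time except the compound flags. */
#define MODEL_FLAGS_CLEAR_AT_FREE (MODEL_ALL_FLAGS & ~MODEL_COMPOUND_FLAGS)

static inline unsigned long model_clear_at_free(unsigned long flags)
{
	return flags & ~MODEL_FLAGS_CLEAR_AT_FREE;
}

/* Nonzero if a later teardown step could still recognise the page. */
static inline int model_still_compound(unsigned long flags)
{
	return (flags & MODEL_COMPOUND_FLAGS) != 0;
}
```

The trade-off raised later in the thread still applies: keeping the bits visible only helps if the extra teardown checks are considered worth their cost.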
* Re: BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops
  2009-03-16 21:44       ` Hugh Dickins
@ 2009-03-16 23:44         ` KAMEZAWA Hiroyuki
  2009-03-20 15:23         ` Mel Gorman
  1 sibling, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-16 23:44 UTC
  To: Hugh Dickins
  Cc: Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org

On Mon, 16 Mar 2009 21:44:11 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:

> On Mon, 16 Mar 2009, KAMEZAWA Hiroyuki wrote:
> > Hi,
> > I'm sorry if I miss something..
>
> I think it's me who missed something, and needs to say sorry.
>
> >
> > From this patch
> > ==
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=79f4b7bf393e67bbffec807cc68caaefc72b82ee
> > ==
> > #define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
> > ...
> > @@ -468,16 +467,16 @@ static inline int free_pages_check(struct page *page)
> >  		(page_count(page) != 0)  |
> >  		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
> > ....
> > +	if (PageReserved(page))
> > +		return 1;
> > +	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> > +		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> > +	return 0;
> >  }
> > ==
> >
> > PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check().
> >
> > This means PG_head/PG_tail(PG_compound) flags are cleared here
>
> Yes, well spotted.  How embarrassing.  I must have got confused
> about when the checking occurred when freeing a compound page.
>
> > and Compound page will never be freed in sane way.
>
> But is that so?  I'll admit I've not tried this out yet, but my
> understanding is that the Compound page actually gets freed fine:
> free_compound_page() should have passed the right order down, and this
> PAGE_FLAGS_CHECK_AT_PREP clearing should remove the Head/Tail/Compound
> flags - doesn't it all work out sanely, without any leaking?
>
I think it works sanely and pages are freed in a valid way.
But the bad_page() checking for compound pages (at destroy_compound_page())
is not done.

> What goes missing is all the destroy_compound_page() checks:
> that's at present just dead code.
>
> There's several things we could do about this.
>
> 1.  We could regard destroy_compound_page() as legacy debugging code
> from when compound pages were first introduced, and sanctify my error
> by removing it.  Obviously that's appealing to me, makes me look like
> a prophet rather than idiot!  That's not necessarily the right thing to
> do, but might appeal also to those cutting overhead from page_alloc.c.
>
> 2.  We could do the destroy_compound_page() stuff in free_compound_page()
> before calling __free_pages_ok(), and add the Head/Tail/Compound flags
> into PAGE_FLAGS_CHECK_AT_FREE.  That seems a more natural ordering to
> me, and would remove the PageCompound check from a hotter path; but
> I've a suspicion there's a good reason why it was not done that way,
> that I'm overlooking at this moment.
>
> 3.  We can define a PAGE_FLAGS_CLEAR_AT_FREE which omits the Head/Tail/
> Compound flags, and lets destroy_compound_page() be called as before
> where it's currently intended.
>
> What do you think?  I suspect I'm going to have to spend tomorrow
> worrying about something else entirely, and won't return here until
> Wednesday.
>
I like "2".

> But as regards the original "I just got got another Oops": my bug
> that you point out here doesn't account for that, does it?  It's
> still a mystery, isn't it, how the PageTail bit came to be set at
> that point?
>
I have not found who sets it or where it gets set.
But page_alloc.c is the only file which modifies the PageTail bit, and I'm
the last modifier of it. So, I'm interested in this Oops.

> But that Oops does demonstrate that it's a very bad idea to be using
> the deceptive page_count() in those bad_page() checks: we need to be
> checking page->_count directly.
>
I think so.

> And in looking at this, I notice something else to worry about:
> that CONFIG_HUGETLBFS prep_compound_gigantic_page(), which seems
> to exist for a more general case than "p = page + i" - what happens
> when such a gigantic page is freed, and arrives at the various
> "p = page + i" assumptions on the freeing path?
>
Ah, I missed that path. I'll look into that today.

Thanks,
-Kame
* Re: BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops
  2009-03-16 21:44       ` Hugh Dickins
  2009-03-16 23:44         ` KAMEZAWA Hiroyuki
@ 2009-03-20 15:23         ` Mel Gorman
  2009-03-22 14:55           ` Hugh Dickins
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2009-03-20 15:23 UTC
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org

On Mon, Mar 16, 2009 at 09:44:11PM +0000, Hugh Dickins wrote:
> On Mon, 16 Mar 2009, KAMEZAWA Hiroyuki wrote:
> > Hi,
> > I'm sorry if I miss something..
>
> I think it's me who missed something, and needs to say sorry.
>

Joining the party late as always.

> >
> > From this patch
> > ==
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=79f4b7bf393e67bbffec807cc68caaefc72b82ee
> > ==
> > #define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
> > ...
> > @@ -468,16 +467,16 @@ static inline int free_pages_check(struct page *page)
> >  		(page_count(page) != 0)  |
> >  		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
> > ....
> > +	if (PageReserved(page))
> > +		return 1;
> > +	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> > +		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> > +	return 0;
> >  }
> > ==
> >
> > PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check().
> >
> > This means PG_head/PG_tail(PG_compound) flags are cleared here
>
> Yes, well spotted.  How embarrassing.  I must have got confused
> about when the checking occurred when freeing a compound page.
>

I noticed this actually during the page allocator work and concluded
it didn't matter because free_pages_check() cleared out the bits in
the same way destroy_compound_page() did. The big difference was that
destroy_compound_page() did a lot more sanity checks and was slower.

I accidentally fixed this (because I implemented what I thought things
should be doing instead of what they were really doing) at one point and
the overhead was so high of the debugging check that I just made a note to
"deal with this later, it's weird looking but ok".

> > and Compound page will never be freed in sane way.
>
> But is that so?  I'll admit I've not tried this out yet, but my
> understanding is that the Compound page actually gets freed fine:
> free_compound_page() should have passed the right order down, and this
> PAGE_FLAGS_CHECK_AT_PREP clearing should remove the Head/Tail/Compound
> flags - doesn't it all work out sanely, without any leaking?
>

That's more or less what I thought. It can't leak but it's not what you
expect from compound page destructors either.

> What goes missing is all the destroy_compound_page() checks:
> that's at present just dead code.
>
> There's several things we could do about this.
>
> 1.  We could regard destroy_compound_page() as legacy debugging code
> from when compound pages were first introduced, and sanctify my error
> by removing it.  Obviously that's appealing to me, makes me look like
> a prophet rather than idiot!  That's not necessarily the right thing to
> do, but might appeal also to those cutting overhead from page_alloc.c.
>

The function is pretty heavy it has to be said. This would be my preferred
option rather than making the allocator go slower.

> 2.  We could do the destroy_compound_page() stuff in free_compound_page()
> before calling __free_pages_ok(), and add the Head/Tail/Compound flags
> into PAGE_FLAGS_CHECK_AT_FREE.  That seems a more natural ordering to
> me, and would remove the PageCompound check from a hotter path; but
> I've a suspicion there's a good reason why it was not done that way,
> that I'm overlooking at this moment.
>

I made this change and dropped it on the grounds it slowed things up so
badly. It was part of allowing compound pages to be on the PCP lists
and ended up looking something like

static void free_compound_page(struct page *page)
{
	unsigned int order = compound_order(page);

	VM_BUG_ON(!PageCompound(page));
	if (unlikely(destroy_compound_page(page, order)))
		return;

	__free_pages_ok(page, order);
}

> 3.  We can define a PAGE_FLAGS_CLEAR_AT_FREE which omits the Head/Tail/
> Compound flags, and lets destroy_compound_page() be called as before
> where it's currently intended.
>

Also did that, slowed things up. Tried fixing destroy_compound_page()
but it was doing the same work as free_pages_check() so it also sucked.

> What do you think?  I suspect I'm going to have to spend tomorrow
> worrying about something else entirely, and won't return here until
> Wednesday.
>
> But as regards the original "I just got got another Oops": my bug
> that you point out here doesn't account for that, does it?  It's
> still a mystery, isn't it, how the PageTail bit came to be set at
> that point?
>
> But that Oops does demonstrate that it's a very bad idea to be using
> the deceptive page_count() in those bad_page() checks: we need to be
> checking page->_count directly.
>
> And in looking at this, I notice something else to worry about:
> that CONFIG_HUGETLBFS prep_compound_gigantic_page(), which seems
> to exist for a more general case than "p = page + i" - what happens
> when such a gigantic page is freed, and arrives at the various
> "p = page + i" assumptions on the freeing path?
>

That function is a bit confusing I'll give you that. Glancing through,
what happens is that the destructor gets replaced with a free_huge_page()
which throws the page onto those free lists instead. It never hits the
buddy lists on the grounds they can't handle orders >= MAX_ORDER.

Out of curiosity, here is a patch that was intended for a totally different
purpose but ended up forcing destroy_compound_page() to be used. It sucked
so I ended up unfixing it again. It can't be merged as-is obviously but
you'll see I redefined your flags a bit to exclude the compound flags
and all that jazz. It could be rebased of course but it'd make more sense
to have destroy_compound_page() that only does real work for DEBUG_VM as
free_pages_check() already does enough work.

====
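The "check first, then free" ordering of the free_compound_page() variant quoted in this message can be modelled in user space. This is only a sketch: struct model_page, the flag values and both model_* functions are invented for illustration, and the checks (head flag and order on the first page, tail flag and back-pointer on the rest) are an assumption about the kind of consistency test destroy_compound_page() performs, not a copy of it. The loop over every constituent page also hints at why the thread describes the check as heavy.

```c
#include <stddef.h>

/* Hypothetical flag values, for this model only. */
#define MODEL_PG_HEAD 0x1UL
#define MODEL_PG_TAIL 0x2UL

struct model_page {
	unsigned long flags;
	unsigned int order;             /* only meaningful on the head page */
	struct model_page *first_page;  /* only meaningful on tail pages */
};

/*
 * Returns 1 if the compound page looks inconsistent, 0 otherwise.
 * Clears the head and tail flags as it walks the pages.
 */
static inline int model_destroy_compound_page(struct model_page *page,
					      unsigned int order)
{
	unsigned long i, nr = 1UL << order;
	int bad = 0;

	if (!(page->flags & MODEL_PG_HEAD) || page->order != order)
		bad = 1;
	page->flags &= ~MODEL_PG_HEAD;

	for (i = 1; i < nr; i++) {
		struct model_page *p = page + i;

		if (!(p->flags & MODEL_PG_TAIL) || p->first_page != page)
			bad = 1;
		p->flags &= ~MODEL_PG_TAIL;
	}
	return bad;
}

/*
 * Same shape as the quoted free_compound_page(): run the checks first and
 * skip the free on failure. Returns 1 if the pages would be freed.
 */
static inline int model_free_compound_page(struct model_page *page)
{
	unsigned int order = page->order;

	if (model_destroy_compound_page(page, order))
		return 0;
	return 1;	/* a real allocator would hand the pages back here */
}
```

A caller would build an array of 1 << order pages, mark the first as head and the rest as tails pointing back at it, and pass the first element to model_free_compound_page().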
* Re: BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops 2009-03-20 15:23 ` Mel Gorman @ 2009-03-22 14:55 ` Hugh Dickins 2009-03-23 11:27 ` Mel Gorman 0 siblings, 1 reply; 8+ messages in thread From: Hugh Dickins @ 2009-03-22 14:55 UTC (permalink / raw) To: Mel Gorman Cc: KAMEZAWA Hiroyuki, Gene Heskett, David Newall, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org On Fri, 20 Mar 2009, Mel Gorman wrote: > On Mon, Mar 16, 2009 at 09:44:11PM +0000, Hugh Dickins wrote: > > On Mon, 16 Mar 2009, KAMEZAWA Hiroyuki wrote: > > > > > > PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check(). > > > This means PG_head/PG_tail(PG_compound) flags are cleared here > > > > Yes, well spotted. How embarrassing. I must have got confused > > about when the checking occurred when freeing a compound page. > > I noticed this actually during the page allocator work and concluded > it didn't matter because free_pages_check() cleared out the bits in > the same way destroy_compound_page() did. The big difference was that > destroy_compound_page() did a lot more sanity checks and was slower. > > I accidentally fixed this (because I implemented what I though things > should be doing instead of what they were really doing) at one point and > the overhead was so high of the debugging check that I just made a note to > "deal with this later, it's weird looking but ok". I'm surprised the overhead was so high: I'd have imagined that it was just treading on the same cachelines as free_pages_check() already did, doing rather less work. > > > > and Compound page will never be freed in sane way. > > > > But is that so? 
I'll admit I've not tried this out yet, but my > > understanding is that the Compound page actually gets freed fine: > > free_compound_page() should have passed the right order down, and this > > PAGE_FLAGS_CHECK_AT_PREP clearing should remove the Head/Tail/Compound > > flags - doesn't it all work out sanely, without any leaking? > > > > That's more or less what I thought. It can't leak but it's not what you > expect from compound page destructors either. > > > What goes missing is all the destroy_compound_page() checks: > > that's at present just dead code. > > > > There's several things we could do about this. > > > > 1. We could regard destroy_compound_page() as legacy debugging code > > from when compound pages were first introduced, and sanctify my error > > by removing it. Obviously that's appealing to me, makes me look like > > a prophet rather than idiot! That's not necessarily the right thing to > > do, but might appeal also to those cutting overhead from page_alloc.c. > > > > The function is pretty heavy it has to be said. This would be my preferred > option rather than making the allocator go slower. KAMEZAWA-san has voted for 2, so that was what I was intending to do. But if destroy_compound_page() really is costly, I'm happy to throw it out if others agree. I don't think it actually buys us a great deal: the main thing it checks (looking forward to the reuse of the pages, rather than just checking that what was set up is still there) is that the order being freed is not greater than the order that was allocated; but I think a PG_buddy or a page->_count in the excess should catch that in free_pages_check(). And we don't have any such check for the much(?) more common case of freeing a non-compound high-order page. > > > 2. We could do the destroy_compound_page() stuff in free_compound_page() > > before calling __free_pages_ok(), and add the Head/Tail/Compound flags > > into PAGE_FLAGS_CHECK_AT_FREE. 
hat seems a more natural ordering to > > me, and would remove the PageCompound check from a hotter path; but > > I've a suspicion there's a good reason why it was not done that way, > > that I'm overlooking at this moment. > > > > I made this change and dropped it on the grounds it slowed things up so > badly. It was part of allowing compound pages to be on the PCP lists. > and ended up looking something like > > static void free_compound_page(struct page *page) > { > unsigned int order = compound_order(page); > > VM_BUG_ON(!PageCompound(page)); > if (unlikely(destroy_compound_page(page, order))) > return; > > __free_pages_ok(page, order); > } Yes, that's how I was imagining it. But I think we'd also want to change hugetlb.c's set_compound_page_dtor(page, NULL) to set_compound_page_dtor(page, free_compound_page), wouldn't we? So far as I can see, that's the case that led the destroy call to be sited in __free_one_page(), but I still don't get why it was done that way. > > > 3. We can define a PAGE_FLAGS_CLEAR_AT_FREE which omits the Head/Tail/ > > Compound flags, and lets destroy_compound_page() be called as before > > where it's currently intended. > > > > Also did that, slowed things up. Tried fixing destroy_compound_page() > but it was doing the same work as free_pages_check() so it also sucked. > > > What do you think? I suspect I'm going to have to spend tomorrow > > worrying about something else entirely, and won't return here until > > Wednesday. > > > > But as regards the original "I just got got another Oops": my bug > > that you point out here doesn't account for that, does it? It's > > still a mystery, isn't it, how the PageTail bit came to be set at > > that point? > > > > But that Oops does demonstrate that it's a very bad idea to be using > > the deceptive page_count() in those bad_page() checks: we need to be > > checking page->_count directly. 
I notice your/Nick's 20/25 addresses this issue, good - I'd even be
happy to see that change go into 2.6.29, though probably too late now
(and it has been that way forever).  But note, it does need one of us
to replace the page_count in bad_page() in the same way, that's missing.

I've given up on trying to understand how that PageTail is set in
Gene's oops.  I was thinking that it got left behind somewhere
because of my destroy_compound_page sequence error, but I just
can't see how: I wonder if it's just a corrupt bit in the struct.

I don't now feel that we need to rush a fix for my error into 2.6.29:
it does appear to be working nicely enough with that inadvertent
change, and we're not yet agreed on which way to go from here.

> >
> > And in looking at this, I notice something else to worry about:
> > that CONFIG_HUGETLBFS prep_compound_gigantic_page(), which seems
> > to exist for a more general case than "p = page + i" - what happens
> > when such a gigantic page is freed, and arrives at the various
> > "p = page + i" assumptions on the freeing path?
> >
>
> That function is a bit confusing I'll give you that. Glancing through,
> what happens is that the destructor gets replaced with a free_huge_page()
> which throws the page onto those free lists instead. It never hits the
> buddy lists on the grounds they can't handle orders >= MAX_ORDER.

Ah yes, thanks a lot, I'd forgotten all that.  Yes, there appear to
be adequate MAX_ORDER checks in hugetlb.c to prevent that danger.

>
> Out of curiosity,

My curiosity is very limited at the moment, I'm afraid I've not glanced.

> here is a patch that was intended for a totally different
> purpose but ended up forcing destroy_compound_page() to be used. It sucked
> so I ended up unfixing it again. It can't be merged as-is obviously but
> you'll see I redefined your flags a bit to exclude the compound flags
> and all that jazz. It could be rebased of course but it'd make more sense
> to have destroy_compound_page() that only does real work for DEBUG_VM as
> free_pages_check() already does enough work.

Yes, putting it under DEBUG_VM could be a compromise; though by now I've
persuaded myself that it's of little value, and the times it might catch
something would be out there without DEBUG_VM=y.

Hugh

>
> ====
>
> >From 93f9b5ebae0000ae3e7985c98680226f4bdd90a8 Mon Sep 17 00:00:00 2001
> From: Mel Gorman <mel@csn.ul.ie>
> Date: Mon, 9 Mar 2009 11:56:56 +0000
> Subject: [PATCH 32/34] Allow compound pages to be stored on the PCP lists
>
> The SLUB allocator frees and allocates compound pages. The setup costs
> for compound pages are noticeable in profiles and incur cache misses as
> every struct page has to be checked and written. This patch allows
> compound pages to be stored on the PCP list to save on teardown and
> setup time.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/page-flags.h |    4 ++-
>  mm/page_alloc.c            |   56 ++++++++++++++++++++++++++++++-------------
>  2 files changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 219a523..4177ec1 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -388,7 +388,9 @@ static inline void __ClearPageTail(struct page *page)
>   * Pages being prepped should not have any flags set.  It they are set,
>   * there has been a kernel bug or struct page corruption.
>   */
> -#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)
> +#define PAGE_FLAGS_CHECK_AT_PREP_BUDDY	((1 << NR_PAGEFLAGS) - 1)
> +#define PAGE_FLAGS_CHECK_AT_PREP	(((1 << NR_PAGEFLAGS) - 1) & \
> +					~(1 << PG_head | 1 << PG_tail))
>
>  #endif /* !__GENERATING_BOUNDS_H */
>  #endif /* PAGE_FLAGS_H */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 253fd98..2941638 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -280,11 +280,7 @@ out:
>   * put_page() function.  Its ->lru.prev holds the order of allocation.
>   * This usage means that zero-order pages may not be compound.
>   */
> -
> -static void free_compound_page(struct page *page)
> -{
> -	__free_pages_ok(page, compound_order(page));
> -}
> +static void free_compound_page(struct page *page);
>
>  void prep_compound_page(struct page *page, unsigned long order)
>  {
> @@ -553,7 +549,9 @@ static inline void __free_one_page(struct page *page,
>  	zone->free_area[page_order(page)].nr_free++;
>  }
>
> -static inline int free_pages_check(struct page *page)
> +/* Sanity check a free pages flags */
> +static inline int check_freepage_flags(struct page *page,
> +						unsigned long prepflags)
>  {
>  	if (unlikely(page_mapcount(page) |
>  		(page->mapping != NULL)  |
> @@ -562,8 +560,8 @@ static inline int free_pages_check(struct page *page)
>  		bad_page(page);
>  		return 1;
>  	}
> -	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> -		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +	if (page->flags & prepflags)
> +		page->flags &= ~prepflags;
>  	return 0;
>  }
>
> @@ -602,6 +600,12 @@ static int free_pcppages_bulk(struct zone *zone, int count,
>  		page = list_entry(list->prev, struct page, lru);
>  		freed += 1 << page->index;
>  		list_del(&page->lru);
> +
> +		/* SLUB can have compound pages to the free lists */
> +		if (unlikely(PageCompound(page)))
> +			if (unlikely(destroy_compound_page(page, page->index)))
> +				continue;
> +
>  		__free_one_page(page, zone, page->index, migratetype);
>  	}
>  	spin_unlock(&zone->lock);
> @@ -633,8 +637,10 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  	int bad = 0;
>  	int clearMlocked = PageMlocked(page);
>
> +	VM_BUG_ON(PageCompound(page));
>  	for (i = 0 ; i < (1 << order) ; ++i)
> -		bad += free_pages_check(page + i);
> +		bad += check_freepage_flags(page + i,
> +					PAGE_FLAGS_CHECK_AT_PREP_BUDDY);
>  	if (bad)
>  		return;
>
> @@ -738,8 +744,20 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
>  	if (gfp_flags & __GFP_ZERO)
>  		prep_zero_page(page, order, gfp_flags);
>
> -	if (order && (gfp_flags & __GFP_COMP))
> -		prep_compound_page(page, order);
> +	/*
> +	 * If a compound page is requested, we have to check the page being
> +	 * prepped. If it's already compound, we leave it alone. If a
> +	 * compound page is not requested but the page being prepped is
> +	 * compound, then it must be destroyed
> +	 */
> +	if (order) {
> +		if ((gfp_flags & __GFP_COMP) && !PageCompound(page))
> +			prep_compound_page(page, order);
> +
> +		if (!(gfp_flags & __GFP_COMP) && PageCompound(page))
> +			if (unlikely(destroy_compound_page(page, order)))
> +				return 1;
> +	}
>
>  	return 0;
>  }
> @@ -1105,14 +1123,9 @@ static void free_hot_cold_page(struct page *page, int order, int cold)
>  	int migratetype;
>  	int clearMlocked = PageMlocked(page);
>
> -	/* SLUB can return lowish-order compound pages that need handling */
> -	if (order > 0 && unlikely(PageCompound(page)))
> -		if (unlikely(destroy_compound_page(page, order)))
> -			return;
> -
>  	if (PageAnon(page))
>  		page->mapping = NULL;
> -	if (free_pages_check(page))
> +	if (check_freepage_flags(page, PAGE_FLAGS_CHECK_AT_PREP))
>  		return;
>
>  	if (!PageHighMem(page)) {
> @@ -1160,6 +1173,15 @@ out:
>  	put_cpu();
>  }
>
> +static void free_compound_page(struct page *page)
> +{
> +	unsigned int order = compound_order(page);
> +	if (order <= PAGE_ALLOC_COSTLY_ORDER)
> +		free_hot_cold_page(page, order, 0);
> +	else
> +		__free_pages_ok(page, order);
> +}
> +
>  void free_hot_page(struct page *page)
>  {
>  	free_hot_cold_page(page, 0, 0);
> --
> 1.5.6.5

* Re: BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was Re: I just got got another Oops
  2009-03-22 14:55 ` Hugh Dickins
@ 2009-03-23 11:27 ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2009-03-23 11:27 UTC
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Gene Heskett, David Newall,
      linux-kernel@vger.kernel.org, linux-mm@kvack.org,
      akpm@linux-foundation.org

On Sun, Mar 22, 2009 at 02:55:08PM +0000, Hugh Dickins wrote:
> On Fri, 20 Mar 2009, Mel Gorman wrote:
> > On Mon, Mar 16, 2009 at 09:44:11PM +0000, Hugh Dickins wrote:
> > > On Mon, 16 Mar 2009, KAMEZAWA Hiroyuki wrote:
> > > >
> > > > PAGE_FLAGS_CHECK_AT_PREP is cleared by free_pages_check().
> > > > This means PG_head/PG_tail(PG_compound) flags are cleared here
> > >
> > > Yes, well spotted.  How embarrassing.  I must have got confused
> > > about when the checking occurred when freeing a compound page.
> >
> > I noticed this actually during the page allocator work and concluded
> > it didn't matter because free_pages_check() cleared out the bits in
> > the same way destroy_compound_page() did. The big difference was that
> > destroy_compound_page() did a lot more sanity checks and was slower.
> >
> > I accidentally fixed this (because I implemented what I thought things
> > should be doing instead of what they were really doing) at one point and
> > the overhead was so high of the debugging check that I just made a note to
> > "deal with this later, it's weird looking but ok".
>
> I'm surprised the overhead was so high: I'd have imagined that it
> was just treading on the same cachelines as free_pages_check()
> already did, doing rather less work.
>

My recollection is that it looked heavy because I was running netperf which
was allocating on one CPU and freeing on the other, incurring a cache miss
for every page it wrote to. This showed up heavily in profiles as you might
imagine. However, this penalty would also be hit in free_pages_check() if
destroy_compound_page() had not run so that skewed my perception. Still, we
are running over the same array of pages twice, when we could have done it
once.

> >
> > > > and Compound page will never be freed in sane way.
> > >
> > > But is that so?  I'll admit I've not tried this out yet, but my
> > > understanding is that the Compound page actually gets freed fine:
> > > free_compound_page() should have passed the right order down, and this
> > > PAGE_FLAGS_CHECK_AT_PREP clearing should remove the Head/Tail/Compound
> > > flags - doesn't it all work out sanely, without any leaking?
> > >
> >
> > That's more or less what I thought. It can't leak but it's not what you
> > expect from compound page destructors either.
> >
> > > What goes missing is all the destroy_compound_page() checks:
> > > that's at present just dead code.
> > >
> > > There's several things we could do about this.
> > >
> > > 1. We could regard destroy_compound_page() as legacy debugging code
> > > from when compound pages were first introduced, and sanctify my error
> > > by removing it.  Obviously that's appealing to me, makes me look like
> > > a prophet rather than idiot!  That's not necessarily the right thing to
> > > do, but might appeal also to those cutting overhead from page_alloc.c.
> > >
> >
> > The function is pretty heavy it has to be said. This would be my preferred
> > option rather than making the allocator go slower.
>
> KAMEZAWA-san has voted for 2, so that was what I was intending to do.
> But if destroy_compound_page() really is costly, I'm happy to throw
> it out if others agree.
>

I withdraw the objection on the grounds that 2 is the more correct option
of the two. Even though it is heavy, it is also possible to hold compound
pages on the PCP lists for a time, so the cost can be avoided in more ways
than one.

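The flag clearing discussed above can be sketched in userspace. This is only a toy model, not kernel code: `struct toy_page`, the bit positions, and every `toy_*` helper are invented for illustration, and only the idea of a prep mask that does or does not include the head/tail bits comes from the thread. It suggests why, once the full mask has been cleared from each page, a later PageCompound()-style test has nothing left to find.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag layout: these bit positions are invented, not the kernel's. */
enum { TOY_PG_locked, TOY_PG_head, TOY_PG_tail, TOY_NR_PAGEFLAGS };

/* Full mask, in the spirit of PAGE_FLAGS_CHECK_AT_PREP before the patch. */
#define TOY_FLAGS_CHECK_AT_PREP	((1UL << TOY_NR_PAGEFLAGS) - 1)
#define TOY_COMPOUND_MASK	((1UL << TOY_PG_head) | (1UL << TOY_PG_tail))

struct toy_page {
	unsigned long flags;
};

/* Non-zero if the page still carries a head or tail bit. */
int toy_page_compound(const struct toy_page *page)
{
	return (page->flags & TOY_COMPOUND_MASK) != 0;
}

/* Mark page[0] as the head and page[1..nr-1] as tails. */
void toy_prep_compound(struct toy_page *page, size_t nr)
{
	page[0].flags |= 1UL << TOY_PG_head;
	for (size_t i = 1; i < nr; i++)
		page[i].flags |= 1UL << TOY_PG_tail;
}

/* Mimics only the flag clearing at the end of free_pages_check(). */
void toy_free_pages_check(struct toy_page *page, unsigned long mask)
{
	page->flags &= ~mask;
}

/* Run the check over nr pages, then count pages that still look compound. */
size_t toy_free_and_count_compound(struct toy_page *page, size_t nr,
				   unsigned long mask)
{
	size_t still_compound = 0;

	for (size_t i = 0; i < nr; i++)
		toy_free_pages_check(&page[i], mask);
	for (size_t i = 0; i < nr; i++)
		still_compound += toy_page_compound(&page[i]);
	return still_compound;
}
```

Passing `TOY_FLAGS_CHECK_AT_PREP` leaves no compound pages behind, while passing `TOY_FLAGS_CHECK_AT_PREP & ~TOY_COMPOUND_MASK` keeps the head and tail bits for a later destroy step to inspect.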
> I don't think it actually buys us a great deal: the main thing it checks
> (looking forward to the reuse of the pages, rather than just checking
> that what was set up is still there) is that the order being freed is
> not greater than the order that was allocated; but I think a PG_buddy
> or a page->_count in the excess should catch that in free_pages_check().
>
> And we don't have any such check for the much(?) more common case of
> freeing a non-compound high-order page.
>

We have a similar check, sort of. It looks like this

	for (i = 0 ; i < (1 << order) ; ++i)
		bad += free_pages_check(page + i);

This is where we are walking over the array twice. One way of fixing this
would be to move the free_pages_check() higher in the call chain for
high-order pages and have destroy_compound_page() first check that the tail
pages know where their head is and then call free_pages_check(). That should
re-enable just the debugging check without too much cost.

> > > 2. We could do the destroy_compound_page() stuff in free_compound_page()
> > > before calling __free_pages_ok(), and add the Head/Tail/Compound flags
> > > into PAGE_FLAGS_CHECK_AT_FREE.  That seems a more natural ordering to
> > > me, and would remove the PageCompound check from a hotter path; but
> > > I've a suspicion there's a good reason why it was not done that way,
> > > that I'm overlooking at this moment.
> > >
> >
> > I made this change and dropped it on the grounds it slowed things up so
> > badly. It was part of allowing compound pages to be on the PCP lists.
> > and ended up looking something like
> >
> > static void free_compound_page(struct page *page)
> > {
> > 	unsigned int order = compound_order(page);
> >
> > 	VM_BUG_ON(!PageCompound(page));
> > 	if (unlikely(destroy_compound_page(page, order)))
> > 		return;
> >
> > 	__free_pages_ok(page, order);
> > }
>
> Yes, that's how I was imagining it.  But I think we'd also want
> to change hugetlb.c's set_compound_page_dtor(page, NULL) to
> set_compound_page_dtor(page, free_compound_page), wouldn't we?

For full correctness, yes. As it is, it happens to work because the compound
flags get cleared and destroy_compound_page() is little more than a debug
check.

> So far as I can see, that's the case that led the destroy call
> to be sited in __free_one_page(), but I still don't get why it
> was done that way.
>

I don't recall any reasoning but probably because it just worked. The first
time huge pages had a destructor set to NULL was commit
41d78ba55037468e6c86c53e3076d1a74841de39 and it appears to have been carried
forward ever since.

> >
> > > 3. We can define a PAGE_FLAGS_CLEAR_AT_FREE which omits the Head/Tail/
> > > Compound flags, and lets destroy_compound_page() be called as before
> > > where it's currently intended.
> > >
> >
> > Also did that, slowed things up. Tried fixing destroy_compound_page()
> > but it was doing the same work as free_pages_check() so it also sucked.
> >
> > > What do you think?  I suspect I'm going to have to spend tomorrow
> > > worrying about something else entirely, and won't return here until
> > > Wednesday.
> > >
> > > But as regards the original "I just got got another Oops": my bug
> > > that you point out here doesn't account for that, does it?  It's
> > > still a mystery, isn't it, how the PageTail bit came to be set at
> > > that point?
> > >
> > > But that Oops does demonstrate that it's a very bad idea to be using
> > > the deceptive page_count() in those bad_page() checks: we need to be
> > > checking page->_count directly.
>
> I notice your/Nick's 20/25 addresses this issue, good - I'd even be
> happy to see that change go into 2.6.29, though probably too late now
> (and it has been that way forever).

Agreed, although that change is an accident essentially. It's not super
clear to me it would help but I haven't looked closely enough at the oops
to have a useful opinion.

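Why page_count() is called deceptive in a sanity check can be sketched in userspace. The following is a toy model under stated assumptions: `struct toy_page`, `TOY_PG_TAIL`, and the `toy_*` functions are invented, and they only mirror the shape of the compound_head()/page_count() helpers quoted earlier in the thread. In the oops, the stale first_page pointer held obsolete data and the dereference faulted; here it points at a valid page instead, which shows the quieter failure of a check reading some other page's counter.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PG_TAIL	(1UL << 0)	/* invented bit position */

struct toy_page {
	unsigned long flags;
	int count;			/* stands in for page->_count */
	struct toy_page *first_page;	/* only meaningful for tail pages */
};

/* Same shape as compound_head(): a tail page redirects to its head. */
struct toy_page *toy_compound_head(struct toy_page *page)
{
	if (page->flags & TOY_PG_TAIL)
		return page->first_page;
	return page;
}

/* Like page_count(): silently follows first_page when the tail bit is set. */
int toy_page_count(struct toy_page *page)
{
	return toy_compound_head(page)->count;
}

/* What a bad_page()-style check wants: this page's own counter. */
int toy_raw_count(const struct toy_page *page)
{
	return page->count;
}
```

With a stray tail bit on a page whose own counter is non-zero, `toy_page_count()` reports whatever the pointed-to page holds, whereas `toy_raw_count()` still sees the page's real value.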
> But note, it does need one of us
> to replace the page_count in bad_page() in the same way, that's missing.
>
> I've given up on trying to understand how that PageTail is set in
> Gene's oops.  I was thinking that it got left behind somewhere
> because of my destroy_compound_page sequence error, but I just
> can't see how: I wonder if it's just a corrupt bit in the struct.
>

I can't see how it can be left behind either as it should have been getting
clobbered. If it was something like inappropriate buddy merging, a lot more
would have broken.

> I don't now feel that we need to rush a fix for my error into 2.6.29:
> it does appear to be working nicely enough with that inadvertent
> change, and we're not yet agreed on which way to go from here.
>
> > >
> > > And in looking at this, I notice something else to worry about:
> > > that CONFIG_HUGETLBFS prep_compound_gigantic_page(), which seems
> > > to exist for a more general case than "p = page + i" - what happens
> > > when such a gigantic page is freed, and arrives at the various
> > > "p = page + i" assumptions on the freeing path?
> > >
> >
> > That function is a bit confusing I'll give you that. Glancing through,
> > what happens is that the destructor gets replaced with a free_huge_page()
> > which throws the page onto those free lists instead. It never hits the
> > buddy lists on the grounds they can't handle orders >= MAX_ORDER.
>
> Ah yes, thanks a lot, I'd forgotten all that.  Yes, there appear to
> be adequate MAX_ORDER checks in hugetlb.c to prevent that danger.
>
> >
> > Out of curiosity,
>
> My curiosity is very limited at the moment, I'm afraid I've not glanced.
>

No harm.

> > here is a patch that was intended for a totally different
> > purpose but ended up forcing destroy_compound_page() to be used. It sucked
> > so I ended up unfixing it again. It can't be merged as-is obviously but
> > you'll see I redefined your flags a bit to exclude the compound flags
> > and all that jazz. It could be rebased of course but it'd make more sense
> > to have destroy_compound_page() that only does real work for DEBUG_VM as
> > free_pages_check() already does enough work.
>
> Yes, putting it under DEBUG_VM could be a compromise; though by now I've
> persuaded myself that it's of little value, and the times it might catch
> something would be out there without DEBUG_VM=y.
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

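The single-walk idea raised in the message above, combined with the DEBUG_VM compromise, might look roughly like the userspace sketch below. It is an assumption-laden illustration rather than a proposed patch: `TOY_DEBUG_VM`, `struct toy_page`, the flag values, and both `toy_*` functions are hypothetical stand-ins, and the per-page check is reduced to clearing flags. The point is only that the tail-to-head verification and the per-page check can share one pass over the pages, with the verification compiled out when the debug macro is absent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_DEBUG_VM; remove to compile checks away. */
#define TOY_DEBUG_VM 1

#define TOY_PG_HEAD	(1UL << 0)	/* invented bit positions */
#define TOY_PG_TAIL	(1UL << 1)
#define TOY_PREP_MASK	(TOY_PG_HEAD | TOY_PG_TAIL)

struct toy_page {
	unsigned long flags;
	struct toy_page *first_page;	/* tail pages point at their head */
};

/* Per-page check reduced to its flag clearing; returns 0 for "page ok". */
int toy_free_pages_check(struct toy_page *page)
{
	page->flags &= ~TOY_PREP_MASK;
	return 0;
}

/*
 * Walk the 1 << order pages exactly once. Under TOY_DEBUG_VM each page is
 * first verified (head bit on page 0, tail bit plus back-pointer on the
 * rest) before the per-page check clears its flags. Returns the number of
 * bad pages found.
 */
int toy_destroy_compound_page(struct toy_page *head, unsigned int order)
{
	int bad = 0;

	for (size_t i = 0; i < ((size_t)1 << order); i++) {
		struct toy_page *p = head + i;
#ifdef TOY_DEBUG_VM
		if (i == 0)
			bad += !(p->flags & TOY_PG_HEAD);
		else
			bad += (!(p->flags & TOY_PG_TAIL) ||
				p->first_page != head);
#endif
		bad += toy_free_pages_check(p);
	}
	return bad;
}
```

A well-formed order-2 compound page passes with zero bad pages and ends with all flags cleared; a tail whose `first_page` points elsewhere is counted as bad only while `TOY_DEBUG_VM` is defined.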
end of thread, other threads:[~2009-03-23 10:24 UTC | newest]
Thread overview: 8+ messages
[not found] <200903120133.11583.gene.heskett@gmail.com>
[not found] ` <49B8C98D.3020309@davidnewall.com>
[not found] ` <200903121431.49437.gene.heskett@gmail.com>
2009-03-16 2:55 ` I just got got another Oops KAMEZAWA Hiroyuki
2009-03-16 3:22 ` KAMEZAWA Hiroyuki
2009-03-16 8:03 ` BUG?: PAGE_FLAGS_CHECK_AT_PREP seems to be cleared too early (Was " KAMEZAWA Hiroyuki
2009-03-16 21:44 ` Hugh Dickins
2009-03-16 23:44 ` KAMEZAWA Hiroyuki
2009-03-20 15:23 ` Mel Gorman
2009-03-22 14:55 ` Hugh Dickins
2009-03-23 11:27 ` Mel Gorman