* Hugepage regression @ 2006-10-10 8:47 David Gibson 2006-10-10 9:04 ` Andrew Morton 0 siblings, 1 reply; 13+ messages in thread From: David Gibson @ 2006-10-10 8:47 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: linux-kernel It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a hugepage regression. A git bisect points the finger at that commit for causing an oops in the 'alloc-instantiate-race' test from the libhugetlbfs testsuite. Still looking to determine the reason it breaks things. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hugepage regression 2006-10-10 8:47 Hugepage regression David Gibson @ 2006-10-10 9:04 ` Andrew Morton 2006-10-10 9:15 ` David Gibson 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2006-10-10 9:04 UTC (permalink / raw) To: David Gibson; +Cc: Chen, Kenneth W, linux-kernel On Tue, 10 Oct 2006 18:47:48 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a > hugepage regression. A git bisect points the finger at that commit > for causing an oops in the 'alloc-instantiate-race' test from the > libhugetlbfs testsuite. > > Still looking to determine the reason it breaks things. > It's assuming that unmap_hugepage_range() is always freeing these pages. If the page is shared by another mapping, bad things will happen: the threads fight over page->lru. Doing + if (page_count(page) == 1) list_add(&page->lru, &page_list); might help. But then we miss the tlb flush in rare racy conditions. ^ permalink raw reply [flat|nested] 13+ messages in thread
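The corruption Andrew describes can be mocked up in plain userspace C (this is illustrative code, not the kernel's): an intrusive list node like page->lru can only be on one list at a time, so when two unmappers both gather the same shared page, the second list_add() silently rewires the node toward its own list and leaves the first gatherer's list pointing into the wrong place. Only list_add() below mirrors the kernel; the demo function and its names are made up.

```c
#include <stddef.h>

/* Minimal intrusive list, modeled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Two unmappers gather the same shared page: the second list_add()
 * rewires the node's links toward list_b, so list_a still points at a
 * node that no longer points back into list_a.  Returns 1 if list_a
 * ended up corrupted.
 */
static int demo_double_add(void)
{
    struct list_head list_a, list_b, page_lru;

    list_init(&list_a);
    list_init(&list_b);
    list_add(&page_lru, &list_a);   /* first unmapper's gather list */
    list_add(&page_lru, &list_b);   /* second unmapper clobbers the links */

    /* Walking list_a now strays into list_b's nodes. */
    return list_a.next == &page_lru && page_lru.next == &list_b;
}
```

The same double-add is what happens when two threads run the list-gathering loop of __unmap_hugepage_range() against a page shared by both mappings.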
* Re: Hugepage regression 2006-10-10 9:04 ` Andrew Morton @ 2006-10-10 9:15 ` David Gibson 2006-10-10 17:35 ` Chen, Kenneth W 0 siblings, 1 reply; 13+ messages in thread From: David Gibson @ 2006-10-10 9:15 UTC (permalink / raw) To: Andrew Morton; +Cc: Kenneth W. Chen, linux-kernel On Tue, Oct 10, 2006 at 02:04:26AM -0700, Andrew Morton wrote: > On Tue, 10 Oct 2006 18:47:48 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a > > hugepage regression. A git bisect points the finger at that commit > > for causing an oops in the 'alloc-instantiate-race' test from the > > libhugetlbfs testsuite. > > > > Still looking to determine the reason it breaks things. > > > > It's assuming that unmap_hugepage_range() is always freeing these pages. > If the page is shared by another mapping, bad things will happen: the > threads fight over page->lru. > > Doing > > + if (page_count(page) == 1) > list_add(&page->lru, &page_list); > > might help. But then we miss the tlb flush in rare racy conditions. Well, there'd need to be an else doing a put_page(), too. Looks like the fundamental problem is that a list is not a suitable data structure for gathering here, since it's not truly local. We should probably change it to a small array, like in the normal tlb gather structure. If we run out of space we can force the tlb flush and keep going. Or we could just give up on lazy tlb flush for hugepages. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 13+ messages in thread
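A rough userspace sketch of the array-based gather David suggests, loosely modeled on the normal mmu_gather: pages accumulate in a small array on the unmapper's own stack, and running out of slots simply forces the TLB flush early. The structure name and batch size here are invented for illustration.

```c
#include <stddef.h>

#define GATHER_BATCH 8          /* hypothetical batch size */

struct hugepage_gather {
    void *pages[GATHER_BATCH];  /* truly local, unlike page->lru chaining */
    int nr;
    int flushes;                /* counts forced flushes, for illustration */
};

/* Stand-in for "flush the TLB range, then free the gathered pages". */
static void gather_flush(struct hugepage_gather *g)
{
    g->nr = 0;
    g->flushes++;
}

/* Queue one page, flushing early when the local array fills up. */
static void gather_page(struct hugepage_gather *g, void *page)
{
    if (g->nr == GATHER_BATCH)
        gather_flush(g);
    g->pages[g->nr++] = page;
}
```

Because the array lives on the unmapper's stack, two threads unmapping the same shared page never touch each other's gather state; the price is an extra TLB flush every GATHER_BATCH pages.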
* RE: Hugepage regression 2006-10-10 9:15 ` David Gibson @ 2006-10-10 17:35 ` Chen, Kenneth W 2006-10-10 19:14 ` Andrew Morton 2006-10-10 19:18 ` Hugh Dickins 0 siblings, 2 replies; 13+ messages in thread From: Chen, Kenneth W @ 2006-10-10 17:35 UTC (permalink / raw) To: 'David Gibson', Andrew Morton; +Cc: linux-kernel David Gibson wrote on Tuesday, October 10, 2006 2:16 AM > > > It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a > > > hugepage regression. A git bisect points the finger at that commit > > > for causing an oops in the 'alloc-instantiate-race' test from the > > > libhugetlbfs testsuite. > > > > > > Still looking to determine the reason it breaks things. > > > > > > > It's assuming that unmap_hugepage_range() is always freeing these pages. > > If the page is shared by another mapping, bad things will happen: the > > threads fight over page->lru. > > > > Doing > > > > + if (page_count(page) == 1) > > list_add(&page->lru, &page_list); > > > > might help. But then we miss the tlb flush in rare racy conditions. > > Well, there'd need to be an else doing a put_page(), too. > > Looks like the fundamental problem is that a list is not a suitable > data structure for gathering here, since it's not truly local. We > should probably change it to a small array, like in the normal tlb > gather structure. If we run out of space we can force the tlb flush > and keep going. With the pending shared page table for hugetlb currently sitting in -mm, we serialize all hugetlb unmaps with a per-file i_mmap_lock. This race could well be solved by that pending patch? http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v4.patch - Ken ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hugepage regression 2006-10-10 17:35 ` Chen, Kenneth W @ 2006-10-10 19:14 ` Andrew Morton 2006-10-10 19:18 ` Hugh Dickins 1 sibling, 0 replies; 13+ messages in thread From: Andrew Morton @ 2006-10-10 19:14 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: 'David Gibson', linux-kernel On Tue, 10 Oct 2006 10:35:50 -0700 "Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote: > David Gibson wrote on Tuesday, October 10, 2006 2:16 AM > > > > It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a > > > > hugepage regression. A git bisect points the finger at that commit > > > > for causing an oops in the 'alloc-instantiate-race' test from the > > > > libhugetlbfs testsuite. > > > > > > > > Still looking to determine the reason it breaks things. > > > > > > > > > > It's assuming that unmap_hugepage_range() is always freeing these pages. > > > If the page is shared by another mapping, bad things will happen: the > > > threads fight over page->lru. > > > > > > Doing > > > > > > + if (page_count(page) == 1) > > > list_add(&page->lru, &page_list); > > > > > > might help. But then we miss the tlb flush in rare racy conditions. > > > > Well, there'd need to be an else doing a put_page(), too. > > > > Looks like the fundamental problem is that a list is not a suitable > > data structure for gathering here, since it's not truly local. We > > should probably change it to a small array, like in the normal tlb > > gather structure. If we run out of space we can force the tlb flush > > and keep going. > > > With the pending shared page table for hugetlb currently sitting in -mm, > we serialize the all hugetlb unmap with a per file i_mmap_lock. This > race could well be solved by that pending patch? > > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v > 4.patch > We need something for 2.6.19 though. 
As David indicates, not using page->lru should fix it (pagevec_add, pagevec_release would suit). Or just a separate TLB invalidation per page. Is that likely to be particularly expensive? It's the first one which hurts? ^ permalink raw reply [flat|nested] 13+ messages in thread
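For reference, here is roughly what Andrew's pagevec suggestion amounts to, as a userspace mock rather than the real kernel API; if memory serves, the real pagevec_add() of this era likewise returns the free space left after adding, so a zero return means "full, flush and release now", and PAGEVEC_SIZE was 14.

```c
#include <stddef.h>

#define PAGEVEC_SIZE 14   /* the kernel's pagevec capacity at the time */

struct pagevec {
    unsigned nr;
    void *pages[PAGEVEC_SIZE];
};

static void pagevec_init(struct pagevec *pv)
{
    pv->nr = 0;
}

/* Like the kernel's pagevec_add(): returns slots left after adding. */
static unsigned pagevec_add(struct pagevec *pv, void *page)
{
    pv->pages[pv->nr++] = page;
    return PAGEVEC_SIZE - pv->nr;
}

/* Like pagevec_release(): drop the gathered references (mocked here). */
static void pagevec_release(struct pagevec *pv)
{
    pv->nr = 0;
}
```

In __unmap_hugepage_range() the gather loop would then be roughly "if pagevec_add() returns 0, flush the TLB and pagevec_release()", with one final flush-and-release after the loop. The pagevec is stack-local, so nothing is shared through page->lru.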
* RE: Hugepage regression 2006-10-10 17:35 ` Chen, Kenneth W 2006-10-10 19:14 ` Andrew Morton @ 2006-10-10 19:18 ` Hugh Dickins 2006-10-10 19:30 ` Chen, Kenneth W 2006-10-10 23:34 ` Chen, Kenneth W 1 sibling, 2 replies; 13+ messages in thread From: Hugh Dickins @ 2006-10-10 19:18 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: 'David Gibson', Andrew Morton, linux-kernel On Tue, 10 Oct 2006, Chen, Kenneth W wrote: > > With the pending shared page table for hugetlb currently sitting in -mm, > we serialize the all hugetlb unmap with a per file i_mmap_lock. This > race could well be solved by that pending patch? > > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v4.patch Hey, nice try, Ken! But I don't think we can let you sneak shared pagetables into 2.6.19 that way ;) Sorry for not noticing this bug in your original TLB flush fix, which had looked good to me. Yes, I'd expect your i_mmap_lock to solve the problem: and since you're headed in that direction anyway, it makes most sense to use that solution rather than get into defining arrays, or sacrificing the lazy flush, or risking page_count races. So please extract the __unmap_hugepage_range mods from your shared pagetable patch, and use that to fix the bug. But again, I protest the "if (vma->vm_file)" in your unmap_hugepage_range - how would a hugepage area ever have NULL vma->vm_file? Hugh ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-10 19:18 ` Hugh Dickins @ 2006-10-10 19:30 ` Chen, Kenneth W 2006-10-10 20:10 ` Hugh Dickins 2006-10-10 23:34 ` Chen, Kenneth W 1 sibling, 1 reply; 13+ messages in thread From: Chen, Kenneth W @ 2006-10-10 19:30 UTC (permalink / raw) To: 'Hugh Dickins' Cc: 'David Gibson', Andrew Morton, linux-kernel Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM > On Tue, 10 Oct 2006, Chen, Kenneth W wrote: > > > > With the pending shared page table for hugetlb currently sitting in -mm, > > we serialize the all hugetlb unmap with a per file i_mmap_lock. This > > race could well be solved by that pending patch? > > > > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v4.patch > > Hey, nice try, Ken! But I don't think we can let you sneak shared > pagetables into 2.6.19 that way ;) It wasn't my intention to sneak in shared page table, though it does sort of look like so. > Yes, I'd expect your i_mmap_lock to solve the problem: and since > you're headed in that direction anyway, it makes most sense to use > that solution rather than get into defining arrays, or sacrificing > the lazy flush, or risking page_count races. > > So please extract the __unmap_hugepage_range mods from your shared > pagetable patch, and use that to fix the bug. OK, I was about to do so too. > But again, I protest > the "if (vma->vm_file)" in your unmap_hugepage_range - how would a > hugepage area ever have NULL vma->vm_file? It's coming from do_mmap_pgoff(): file->f_op->mmap can fail with an error code (e.g. not enough hugetlb pages), and in the error recovery path, it nulls out vma->vm_file first before calling down to unmap_region(). I asked that question before: can we reverse that order (call unmap_region and then null out vma->vm_file and fput)? 
unmap_and_free_vma: if (correct_wcount) atomic_inc(&inode->i_writecount); vma->vm_file = NULL; fput(file); /* Undo any partial mapping done by a device driver. */ unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-10 19:30 ` Chen, Kenneth W @ 2006-10-10 20:10 ` Hugh Dickins 2006-10-10 23:03 ` Chen, Kenneth W 0 siblings, 1 reply; 13+ messages in thread From: Hugh Dickins @ 2006-10-10 20:10 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: 'David Gibson', Andrew Morton, linux-kernel On Tue, 10 Oct 2006, Chen, Kenneth W wrote: > Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM > > > But again, I protest > > the "if (vma->vm_file)" in your unmap_hugepage_range - how would a > > hugepage area ever have NULL vma->vm_file? > > It's coming from do_mmap_pgoff(), file->f_op->mmap can fail with error > code (e.g. not enough hugetlb page) and in the error recovery path, it > nulls out vma->vm_file first before calls down to unmap_region(). I stand corrected: thanks. > I asked that question before: So you did, on Oct 2nd: sorry, that got lost amidst the overload in my mailbox, I've just forwarded it to myself again, to check later on what else I may have missed. (I am aware that I've still not scrutinized the patches you sent out a day or two later, now in -mm.) > can we reverse that order (call unmap_region > and then nulls out vma->vmfile and fput)? I'm pretty sure we cannot: the ordering is quite intentional, that if a driver ->mmap failed, then it'd be wrong to call down to driver in the unmap_region (if a driver is nicely behaved, that unmap_region shouldn't be unnecessary; but some do rely on us clearing ptes there). Okay, last refuge of all who've made a fool of themselves: may I ask you to add a comment in your unmap_hugepage_range, pointing to how the do_mmap_pgoff error case NULLifies vm_file? (Or else change hugetlbfs_file_mmap to set VM_HUGETLB only once it's succeeded: but that smacks of me refusing to accept I was wrong.) Hugh > > unmap_and_free_vma: > if (correct_wcount) > atomic_inc(&inode->i_writecount); > vma->vm_file = NULL; > fput(file); > > /* Undo any partial mapping done by a device driver. 
*/ > unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-10 20:10 ` Hugh Dickins @ 2006-10-10 23:03 ` Chen, Kenneth W 2006-10-13 17:03 ` Hugh Dickins 0 siblings, 1 reply; 13+ messages in thread From: Chen, Kenneth W @ 2006-10-10 23:03 UTC (permalink / raw) To: 'Hugh Dickins' Cc: 'David Gibson', Andrew Morton, linux-kernel Hugh Dickins wrote on Tuesday, October 10, 2006 1:10 PM > > can we reverse that order (call unmap_region > > and then nulls out vma->vmfile and fput)? > > I'm pretty sure we cannot: the ordering is quite intentional, that if > a driver ->mmap failed, then it'd be wrong to call down to driver in > the unmap_region (if a driver is nicely behaved, that unmap_region > shouldn't be unnecessary; but some do rely on us clearing ptes there). Even not something like the following? I believe you that nullifying vma->vm_file can not be done after unmap_region(), I just want to make sure we are talking the same thing. It looks OK to me to defer the fput in the do_mmap_pgoff clean up path. --- ./mm/mmap.c.orig 2006-10-10 15:58:17.000000000 -0700 +++ ./mm/mmap.c 2006-10-10 15:59:02.000000000 -0700 @@ -1159,11 +1159,12 @@ out: unmap_and_free_vma: if (correct_wcount) atomic_inc(&inode->i_writecount); - vma->vm_file = NULL; - fput(file); /* Undo any partial mapping done by a device driver. */ unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); + + vma->vm_file = NULL; + fput(file); charged = 0; free_vma: kmem_cache_free(vm_area_cachep, vma); ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-10 23:03 ` Chen, Kenneth W @ 2006-10-13 17:03 ` Hugh Dickins 0 siblings, 0 replies; 13+ messages in thread From: Hugh Dickins @ 2006-10-13 17:03 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: 'David Gibson', Andrew Morton, linux-kernel On Tue, 10 Oct 2006, Chen, Kenneth W wrote: > Hugh Dickins wrote on Tuesday, October 10, 2006 1:10 PM > > > can we reverse that order (call unmap_region > > > and then nulls out vma->vmfile and fput)? > > > > I'm pretty sure we cannot: the ordering is quite intentional, that if > > a driver ->mmap failed, then it'd be wrong to call down to driver in > > the unmap_region (if a driver is nicely behaved, that unmap_region > > shouldn't be unnecessary; but some do rely on us clearing ptes there). Looking at it again, my explanation seems wrong: I can't see any danger of calling down to the _driver_ there (there's no remove_vma): rather, it's __remove_shared_vm_struct we're avoiding by setting vm_file NULL. > > Even not something like the following? I believe you that nullifying > vma->vm_file can not be done after unmap_region(), Yet in your patch below, you do nullify vm_file _after_ unmap_region. > I just want to make sure we are talking the same thing. So I'm not sure if we are! > It looks OK to me to defer the fput in the do_mmap_pgoff clean up path. Yes, it would be quite okay to defer the fput until after the unmap_region; but there's no point in making that change, is there? Hugh (sorry, I was off sick for a couple of days) > > > --- ./mm/mmap.c.orig 2006-10-10 15:58:17.000000000 -0700 > +++ ./mm/mmap.c 2006-10-10 15:59:02.000000000 -0700 > @@ -1159,11 +1159,12 @@ out: > unmap_and_free_vma: > if (correct_wcount) > atomic_inc(&inode->i_writecount); > - vma->vm_file = NULL; > - fput(file); > > /* Undo any partial mapping done by a device driver. 
*/ > unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); > + > + vma->vm_file = NULL; > + fput(file); > charged = 0; > free_vma: > kmem_cache_free(vm_area_cachep, vma); > ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-10 19:18 ` Hugh Dickins 2006-10-10 19:30 ` Chen, Kenneth W @ 2006-10-10 23:34 ` Chen, Kenneth W 2006-10-11 1:18 ` 'David Gibson' 1 sibling, 1 reply; 13+ messages in thread From: Chen, Kenneth W @ 2006-10-10 23:34 UTC (permalink / raw) To: 'Hugh Dickins' Cc: 'David Gibson', Andrew Morton, linux-kernel Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM > Yes, I'd expect your i_mmap_lock to solve the problem: and since > you're headed in that direction anyway, it makes most sense to use > that solution rather than get into defining arrays, or sacrificing > the lazy flush, or risking page_count races. > > So please extract the __unmap_hugepage_range mods from your shared > pagetable patch, and use that to fix the bug. OK, here is a bug fix patch fixing the earlier "bug fix" patch :-( [patch] hugetlb: fix linked list corruption in unmap_hugepage_range commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes the kernel to oops with the libhugetlbfs test suite. The problem is that hugetlb pages can be shared by multiple mappings. Multiple threads can fight over page->lru in the unmap path and bad things happen. We now serialize __unmap_hugepage_range to avoid concurrent linked list manipulation. Such serialization is also needed for the shared page table page on hugetlb areas. This patch fixes the bug and also serves as a prepatch for shared page tables. 
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> --- ./fs/hugetlbfs/inode.c.orig 2006-10-05 10:25:20.000000000 -0700 +++ ./fs/hugetlbfs/inode.c 2006-10-10 14:27:48.000000000 -0700 @@ -293,7 +293,7 @@ hugetlb_vmtruncate_list(struct prio_tree if (h_vm_pgoff >= h_pgoff) v_offset = 0; - unmap_hugepage_range(vma, + __unmap_hugepage_range(vma, vma->vm_start + v_offset, vma->vm_end); } } --- ./mm/hugetlb.c.orig 2006-10-05 10:25:21.000000000 -0700 +++ ./mm/hugetlb.c 2006-10-10 14:27:48.000000000 -0700 @@ -356,8 +356,8 @@ nomem: return -ENOMEM; } -void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) +void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, + unsigned long end) { struct mm_struct *mm = vma->vm_mm; unsigned long address; @@ -398,6 +398,24 @@ void unmap_hugepage_range(struct vm_area } } +void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + /* + * It is undesirable to test vma->vm_file as it should be non-null + * for valid hugetlb area. However, vm_file will be NULL in the error + * cleanup path of do_mmap_pgoff. When hugetlbfs ->mmap method fails, + * do_mmap_pgoff() nullifies vma->vm_file before calling this function + * to clean up. Since no pte has actually been setup, it is safe to + * do nothing in this case. 
+ */ + if (vma->vm_file) { + spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + __unmap_hugepage_range(vma, start, end); + spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + } +} + static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t pte) { --- ./include/linux/hugetlb.h.orig 2006-10-05 10:25:21.000000000 -0700 +++ ./include/linux/hugetlb.h 2006-10-10 13:08:48.000000000 -0700 @@ -17,6 +17,7 @@ int hugetlb_sysctl_handler(struct ctl_ta int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); +void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); int hugetlb_report_meminfo(char *); int hugetlb_report_node_meminfo(int, char *); ^ permalink raw reply [flat|nested] 13+ messages in thread
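The shape of the fix can also be mocked up in userspace (an illustrative pthread sketch, not kernel code): as long as every gather/flush/free pass over the shared node runs entirely under one lock — the patch uses the per-file i_mmap_lock — page->lru is only ever on one list at a time, and the double-add race cannot occur.

```c
#include <stddef.h>
#include <pthread.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Stands in for the per-file i_mmap_lock the patch takes. */
static pthread_mutex_t i_mmap_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stands in for the shared hugetlb page's page->lru. */
static struct list_head shared_lru;

static void *unmapper(void *arg)
{
    struct list_head page_list;
    list_init(&page_list);

    /* The whole gather / flush / free pass runs under the lock, so the
     * shared node is back off our local list before anyone else runs. */
    pthread_mutex_lock(&i_mmap_lock);
    list_add(&shared_lru, &page_list);
    /* ... flush_tlb_range() and put_page() would go here ... */
    list_del(&shared_lru);
    pthread_mutex_unlock(&i_mmap_lock);
    return NULL;
}

/* Run two concurrent unmappers; returns 1 if the node ended up sane. */
static int demo_serialized_unmap(void)
{
    pthread_t a, b;

    list_init(&shared_lru);
    pthread_create(&a, NULL, unmapper, NULL);
    pthread_create(&b, NULL, unmapper, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_lru.next == &shared_lru && shared_lru.prev == &shared_lru;
}
```

Without the mutex, the two unmappers could interleave their list_add() calls and reproduce exactly the corruption from earlier in the thread.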
* Re: Hugepage regression 2006-10-10 23:34 ` Chen, Kenneth W @ 2006-10-11 1:18 ` 'David Gibson' 2006-10-11 2:47 ` Chen, Kenneth W 0 siblings, 1 reply; 13+ messages in thread From: 'David Gibson' @ 2006-10-11 1:18 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: 'Hugh Dickins', Andrew Morton, linux-kernel On Tue, Oct 10, 2006 at 04:34:40PM -0700, Chen, Kenneth W wrote: > Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM > > Yes, I'd expect your i_mmap_lock to solve the problem: and since > > you're headed in that direction anyway, it makes most sense to use > > that solution rather than get into defining arrays, or sacrificing > > the lazy flush, or risking page_count races. > > > > So please extract the __unmap_hugepage_range mods from your shared > > pagetable patch, and use that to fix the bug. > > > OK, here is a bug fix patch fixing earlier "bug fix" patch :-( > > > [patch] hugetlb: fix linked list corruption in unmap_hugepage_range > > commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes kernel to oops with > libhugetlbfs test suite. The problem is that hugetlb pages can be shared > by multiple mappings. Multiple threads can fight over page->lru in the unmap > path and bad things happen. We now serialize __unmap_hugepage_range to void > concurrent linked list manipulation. Such serialization is also needed for > shared page table page on hugetlb area. This patch will fixed the bug and > also serve as a prepatch for shared page table. Can I suggest that you put a big comment on the linked list declaration itself saying that you're relying on serialization here. Otherwise I'm worried someone will try to de-serialize it again, and break it without realizing. Given the number of people who failed to spot the problem with the patch the first time around.. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Hugepage regression 2006-10-11 1:18 ` 'David Gibson' @ 2006-10-11 2:47 ` Chen, Kenneth W 0 siblings, 0 replies; 13+ messages in thread From: Chen, Kenneth W @ 2006-10-11 2:47 UTC (permalink / raw) To: 'David Gibson' Cc: 'Hugh Dickins', Andrew Morton, linux-kernel David Gibson wrote on Tuesday, October 10, 2006 6:18 PM > Can I suggest that you put a big comment on the linked list > declaration itself saying that you're relying on serialization here. > Otherwise I'm worried someone will try to de-serialize it again, and > break it without realizing. Given the number of people who failed to > spot the problem with the patch the first time around.. I'm not very good at writing comments, how about the following? Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> --- linus-2.6/mm/hugetlb.c.orig 2006-10-10 19:32:36.000000000 -0700 +++ linus-2.6/mm/hugetlb.c 2006-10-10 19:41:18.000000000 -0700 @@ -365,6 +365,11 @@ void __unmap_hugepage_range(struct vm_ar pte_t pte; struct page *page; struct page *tmp; + /* + * A page gathering list, protected by per file i_mmap_lock. The + * lock is used to avoid list corruption from multiple unmapping + * of the same page since we are using page->lru. + */ LIST_HEAD(page_list); WARN_ON(!is_vm_hugetlb_page(vma)); ^ permalink raw reply [flat|nested] 13+ messages in thread