* truncate_list_pages() BUG and confusion
@ 2002-03-08 0:45 Dave Hansen
2002-03-08 1:08 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2002-03-08 0:45 UTC (permalink / raw)
To: linux-kernel
in truncate_list_pages()
failed = TryLockPage(page);
So, the page is always locked here.
truncate_complete_page(page);
remove_inode_page(page);
if (!PageLocked(page))
PAGE_BUG(page);
page_cache_release(page);
calls __free_pages_ok(page, 0);
if (PageLocked(page))
BUG();
It is a BUG if the page is not locked in remove_inode_page() and also a
bug if it IS locked in __free_pages_ok(). What am I missing?
ksymoopsed output follows:
kernel BUG at page_alloc.c:109!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c012f27c>] Not tainted
EFLAGS: 00010202
eax: 01000001 ebx: c13ba15c ecx: c13ba15c edx: c13ba15c
esi: 00000000 edi: db5aff20 ebp: 00000000 esp: db5afe90
ds: 0018 es: 0018 ss: 0018
Stack: c13ba15c 00000000 db5aff20 00000000 c13ba15c c13ba15c 00000000
c13ba15c
00000000 db5aff20 00000000 c012717a c13ba15c 00000000 c012fb05
c13ba15c
c01271c2 c13ba15c c13ba15c c0127326 c13ba15c 00000000 db5aff20
00000018
Call Trace: [<c012717a>] [<c012fb05>] [<c01271c2>] [<c0127326>]
[<c01273db>]
[<c0125192>] [<c012a49d>] [<c01361fb>] [<c0108a23>]
Code: 0f 0b 6d 00 60 89 24 c0 8b 4c 24 10 8b 41 18 a8 40 74 08 0f
>>EIP; c012f27c <__free_pages_ok+6c/29c> <=====
Trace; c012717a <do_flushpage+26/2c>
Trace; c012fb05 <page_cache_release+2d/30>
Trace; c01271c2 <truncate_complete_page+42/48>
Trace; c0127326 <truncate_list_pages+15e/1c4>
Trace; c01273db <truncate_inode_pages+4f/80>
Trace; c0125192 <vmtruncate+be/154>
Trace; c012a49d <generic_file_write+62d/6f8>
Trace; c01361fb <sys_write+8f/10c>
Trace; c0108a23 <syscall_call+7/b>
Code; c012f27c <__free_pages_ok+6c/29c>
00000000 <_EIP>:
Code; c012f27c <__free_pages_ok+6c/29c> <=====
0: 0f 0b ud2a <=====
Code; c012f27e <__free_pages_ok+6e/29c>
2: 6d insl (%dx),%es:(%edi)
Code; c012f27f <__free_pages_ok+6f/29c>
3: 00 60 89 add %ah,0xffffff89(%eax)
Code; c012f282 <__free_pages_ok+72/29c>
6: 24 c0 and $0xc0,%al
Code; c012f284 <__free_pages_ok+74/29c>
8: 8b 4c 24 10 mov 0x10(%esp,1),%ecx
Code; c012f288 <__free_pages_ok+78/29c>
c: 8b 41 18 mov 0x18(%ecx),%eax
Code; c012f28b <__free_pages_ok+7b/29c>
f: a8 40 test $0x40,%al
Code; c012f28d <__free_pages_ok+7d/29c>
11: 74 08 je 1b <_EIP+0x1b> c012f297
<__free_pages_ok+87/29c>
Code; c012f28f <__free_pages_ok+7f/29c>
13: 0f 00 00 sldt (%eax)
--
Dave Hansen
haveblue@us.ibm.com
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: truncate_list_pages() BUG and confusion 2002-03-08 0:45 truncate_list_pages() BUG and confusion Dave Hansen @ 2002-03-08 1:08 ` Andrew Morton 2002-03-08 2:54 ` Dave Hansen ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Andrew Morton @ 2002-03-08 1:08 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-kernel Dave Hansen wrote: > > in truncate_list_pages() > > failed = TryLockPage(page); > > So, the page is always locked here. > > truncate_complete_page(page); > remove_inode_page(page); > if (!PageLocked(page)) > PAGE_BUG(page); > > page_cache_release(page); > calls __free_pages_ok(page, 0); > if (PageLocked(page)) > BUG(); > > It is a BUG if the page is not locked in remove_inode_page() and also a > bug if it IS locked in __free_pages_ok(). What am I missing? the page_cache_release() in truncate_complete_page() is just dropping the reference count against the page which is due to that page's presence in the pagecache. We just took it out, so we drop the reference. Note that the caller of truncate_complete_page() also took a reference to the page, and undoes that reference *after* unlocking the page. This additional reference will prevent __free_pages_ok() from being called by truncate_complete_page(). > ksymoopsed output follows: > > kernel BUG at page_alloc.c:109! Now how did you manage that? Looks like someone re-locked the page after truncate_list_pages unlocked it. - ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 1:08 ` Andrew Morton @ 2002-03-08 2:54 ` Dave Hansen 2002-03-08 2:55 ` Martin J. Bligh 2002-03-08 21:14 ` Martin J. Bligh 2 siblings, 0 replies; 12+ messages in thread From: Dave Hansen @ 2002-03-08 2:54 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton wrote: >>ksymoopsed output follows: >> >>kernel BUG at page_alloc.c:109! >> > > Now how did you manage that? Looks like someone re-locked > the page after truncate_list_pages unlocked it. I stopped getting oopses from the dbench 64 on a small partition, and started getting these BUG()s instead. The oopses I _was_ getting were because create_buffers() was returning a buffer chain with one of the bh->b_this_page entries set to 0x01010101. The funny part was that it was always the same number, not a memory address, or NULL, it was always 0x01010101! The next time through the loop, "bh->b_end_io = NULL;" was blowing up. (no surprise there :) I put some code into create_buffers() to look for that magic number, but stopped getting the oopses after I added a couple of if( XX == 0x0101010 ) panic("foo"). I now can't recreate the original oopses. The disassembly of create_empty_buffers() looked screwy the first time I looked, so I'm guessing that I just encountered a transient gcc bug or something. But, I'm still getting those damn "__block_prepare_write: zeroing uptodate buffer!" messages, in addition to the new BUG(), which never happened before. The hardware is extremely stable, and 2.4 doesn't show any of these problems. Have you had anyone else try the dbench torture test on other SMP machines? -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 1:08 ` Andrew Morton 2002-03-08 2:54 ` Dave Hansen @ 2002-03-08 2:55 ` Martin J. Bligh 2002-03-08 3:02 ` Andrew Morton 2002-03-08 3:04 ` Dave Hansen 2002-03-08 21:14 ` Martin J. Bligh 2 siblings, 2 replies; 12+ messages in thread From: Martin J. Bligh @ 2002-03-08 2:55 UTC (permalink / raw) To: Andrew Morton, Dave Hansen; +Cc: linux-kernel --On Thursday, March 07, 2002 5:08 PM -0800 Andrew Morton <akpm@zip.com.au> wrote: > Dave Hansen wrote: >> >> in truncate_list_pages() >> >> failed = TryLockPage(page); >> >> So, the page is always locked here. >> >> truncate_complete_page(page); >> remove_inode_page(page); >> if (!PageLocked(page)) >> PAGE_BUG(page); >> >> page_cache_release(page); >> calls __free_pages_ok(page, 0); >> if (PageLocked(page)) >> BUG(); >> >> It is a BUG if the page is not locked in remove_inode_page() and also a >> bug if it IS locked in __free_pages_ok(). What am I missing? > > the page_cache_release() in truncate_complete_page() is just dropping > the reference count against the page which is due to that page's > presence in the pagecache. We just took it out, so we drop the reference. > > Note that the caller of truncate_complete_page() also took a reference to > the page, and undoes that reference *after* unlocking the page. This > additional reference will prevent __free_pages_ok() from being called > by truncate_complete_page(). > >> ksymoopsed output follows: >> >> kernel BUG at page_alloc.c:109! > > Now how did you manage that? Looks like someone re-locked > the page after truncate_list_pages unlocked it. Where did truncate_list_pages unlock it? It doesn't do that until after it's called truncate_complete_page, which (indirectly) frees the page ... ? Dave & I must be missing something, as this could never work the way we're reading it, but I can't see what ;-) M. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 2:55 ` Martin J. Bligh @ 2002-03-08 3:02 ` Andrew Morton 2002-03-08 3:04 ` Dave Hansen 1 sibling, 0 replies; 12+ messages in thread From: Andrew Morton @ 2002-03-08 3:02 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel "Martin J. Bligh" wrote: > > Where did truncate_list_pages unlock it? if (*partial && (offset + 1) == start) { truncate_partial_page(page, *partial); *partial = 0; } else truncate_complete_page(page); UnlockPage(page); } else wait_on_page(page); It's either unlocked directly, or we wait for whoever locked it (presumably I/O) to unlock it. - ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 2:55 ` Martin J. Bligh 2002-03-08 3:02 ` Andrew Morton @ 2002-03-08 3:04 ` Dave Hansen 1 sibling, 0 replies; 12+ messages in thread From: Dave Hansen @ 2002-03-08 3:04 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel Martin J. Bligh wrote: > Where did truncate_list_pages unlock it? It doesn't do that > until after it's called truncate_complete_page, which (indirectly) > frees the page ... ? Ahh, put_page_testzero #define put_page_testzero(p) atomic_dec_and_test(&(p)->count) It will only try to free if the page count is zero. The caller makes sure that this doesn't happen. if (!PageReserved(page) && put_page_testzero(page)) { if (PageLRU(page)) lru_cache_del(page); __free_pages_ok(page, 0); } -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 1:08 ` Andrew Morton 2002-03-08 2:54 ` Dave Hansen 2002-03-08 2:55 ` Martin J. Bligh @ 2002-03-08 21:14 ` Martin J. Bligh 2002-03-08 21:53 ` Andrew Morton 2 siblings, 1 reply; 12+ messages in thread From: Martin J. Bligh @ 2002-03-08 21:14 UTC (permalink / raw) To: Andrew Morton, Dave Hansen; +Cc: linux-kernel OK, Dave and I went through this again, and we're both still confused. Hopefully you still have the patience to keep explaining this to us ;-) There must be something wrong here, otherwise we wouldn't get the panic. It looks like the real problem here is to do with the page's count rather than the locked stuff .... which fits in with the other panics Dave has been seeing. > the page_cache_release() in truncate_complete_page() is just dropping > the reference count against the page which is due to that page's > presence in the pagecache. We just took it out, so we drop the reference. > > Note that the caller of truncate_complete_page() also took a reference to > the page, and undoes that reference *after* unlocking the page. This > additional reference will prevent __free_pages_ok() from being called > by truncate_complete_page(). OK, I'd missed a whole bit here, which I now understand (or think I do ;-)). So truncate_list_pages does a page_cache_get(page), which increments the count. Nowhere does it decrement the count, or do an UnlockPage before it gets into page_cache_release, which looks like this: void page_cache_release(struct page *page) { if (!PageReserved(page) && put_page_testzero(page)) { if (PageLRU(page)) lru_cache_del(page); __free_pages_ok(page, 0); } } We enter page_cache_release with the supposedly locked, and its count non-zero (we incremented it). put_page_testzero does atomic_dec_and_test on count which says it returns true if the result is 0, or false for all other cases. So if nobody else was holding a reference to the page, we've decremented it's count to 0, and put_page_testzero returns 1. We then try to free the page. It's still locked. BUG. John Stultz also pointed out that on return to the flow of truncate_list_pages we call page_cache release again, which looks really odd. Something is wrong here ..... >> ksymoopsed output follows: >> >> kernel BUG at page_alloc.c:109! > > Now how did you manage that? Looks like someone re-locked > the page after truncate_list_pages unlocked it.> Where did truncate_list_pages unlock it? > > if (*partial && (offset + 1) == start) { > truncate_partial_page(page, *partial); > *partial = 0; > } else > truncate_complete_page(page); > > UnlockPage(page); > } else > wait_on_page(page); > > It's either unlocked directly, or we wait for whoever locked > it (presumably I/O) to unlock it. But that's all *after* we call truncate_complete_page, so it's irrelevant, no ?? M. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 21:14 ` Martin J. Bligh @ 2002-03-08 21:53 ` Andrew Morton 2002-03-08 22:13 ` Dave Hansen 2002-03-09 0:04 ` Martin J. Bligh 0 siblings, 2 replies; 12+ messages in thread From: Andrew Morton @ 2002-03-08 21:53 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel "Martin J. Bligh" wrote: > > ... > So truncate_list_pages does a page_cache_get(page), which increments > the count. Nowhere does it decrement the count, or do an UnlockPage > before it gets into page_cache_release, which looks like this: well... truncate_list_pages _does_ unlock the page. Let's snip some bits: truncate_list_pages() { ... /* Bump the refcount. From now until the matching * page_cache_release(), the page *cannot* be freed, * because we hold a reference */ page_cache_get(page); lock_page(page); truncate_complete_page(page); --> { remove_inode_page(page); /* Remove from pagecache */ /* * We now drop the refcount which is due to * presence in the pagecache. This can't free * the page, because truncate_list_pages() holds a reference */ page_cache_release(page); } <--- UnlockPage(); /* Remember, wait_on_page() is an indirect unlock, too */ /* * Now truncate_list_pages() drops its temporary reference. * The page is unlocked. If this was the last reference * (and it usually will be) then the page will now be freed. */ page_cache_release(page); } > void page_cache_release(struct page *page) > { > if (!PageReserved(page) && put_page_testzero(page)) { > if (PageLRU(page)) > lru_cache_del(page); > __free_pages_ok(page, 0); > } > } > > We enter page_cache_release with the supposedly locked, and its count > non-zero (we incremented it). put_page_testzero does atomic_dec_and_test > on count which says it returns true if the result is 0, or false for all other cases. > > So if nobody else was holding a reference to the page, we've decremented > it's count to 0, and put_page_testzero returns 1. We then try to free the page. > It's still locked. BUG. If the page_cache_release() in truncate_complete_page() is calling __free_pages_ok() then something really horrid has happened. Yes, it could be that the page has had its refcount incorrectly decremented somewhere. Or the page wasn't in the pagecache at all. Is this happening with the dbench/ENOSPC/1k blocksize testcase? In that case, something is clearly going wrong with the association of buffers against the page. And the presence of buffers at page->buffers _also_ contributes to page->count. So there's a commonality here. Possibly we forgot to increment the page count when we added buffers, or we bogusly thought the page had buffers when it didn't, and then dropped the page refcount when we thought we removed the buffers which weren't really there. - ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 21:53 ` Andrew Morton @ 2002-03-08 22:13 ` Dave Hansen 2002-03-08 22:35 ` Andrew Morton 2002-03-09 0:04 ` Martin J. Bligh 1 sibling, 1 reply; 12+ messages in thread From: Dave Hansen @ 2002-03-08 22:13 UTC (permalink / raw) To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel Andrew Morton wrote: > If the page_cache_release() in truncate_complete_page() is calling > __free_pages_ok() then something really horrid has happened. Something horrid _IS_ happening: kernel BUG at page_alloc.c:109! CPU: 1 EIP: 0010:[<c012d9ac>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010202 eax: 01000005 ebx: c1471560 ecx: c1471560 edx: c1471560 esi: 00000000 edi: 00000000 ebp: 00000000 esp: da569ed0 ds: 0018 es: 0018 ss: 0018 Stack: c1471560 00000000 00000000 00000000 c1471560 c1471560 00000000 c1471560 00000000 00000000 00000000 c012574a c1471560 00000000 c012e235 c1471560 c0125792 c1471560 c1471560 c01258f6 c1471560 00000000 da569f60 00000000 Call Trace: [<c012574a>] [<c012e235>] [<c0125792>] [<c01258f6>] [<c01259ab>] [<c0148df5>] [<c0146697>] [<c0140635>] [<c01070a3>] Code: 0f 0b 6d 00 80 6d 23 c0 8b 4c 24 10 8b 41 18 a8 40 74 08 0f >>EIP; c012d9ac <__free_pages_ok+6c/29c> <===== Trace; c012574a <do_flushpage+26/2c> Trace; c012e235 <page_cache_release+2d/30> Trace; c0125792 <truncate_complete_page+42/48> Trace; c01258f6 <truncate_list_pages+15e/1c4> Trace; c01259ab <truncate_inode_pages+4f/80> Trace; c0148df5 <iput+b5/1d8> Trace; c0146697 <dput+e7/148> Trace; c0140635 <sys_unlink+b1/124> Trace; c01070a3 <syscall_call+7/b> Code; c012d9ac <__free_pages_ok+6c/29c> 00000000 <_EIP>: Code; c012d9ac <__free_pages_ok+6c/29c> <===== 0: 0f 0b ud2a <===== Code; c012d9ae <__free_pages_ok+6e/29c> 2: 6d insl (%dx),%es:(%edi) Code; c012d9af <__free_pages_ok+6f/29c> 3: 00 80 6d 23 c0 8b add %al,0x8bc0236d(%eax) Code; c012d9b5 <__free_pages_ok+75/29c> 9: 4c dec %esp Code; c012d9b6 <__free_pages_ok+76/29c> a: 24 10 and $0x10,%al Code; c012d9b8 <__free_pages_ok+78/29c> c: 8b 41 18 mov 0x18(%ecx),%eax Code; c012d9bb <__free_pages_ok+7b/29c> f: a8 40 test $0x40,%al Code; c012d9bd <__free_pages_ok+7d/29c> 11: 74 08 je 1b <_EIP+0x1b> c012d9c7 <__free_pages_ok+87/29c> Code; c012d9bf <__free_pages_ok+7f/29c> 13: 0f 00 00 sldt (%eax) > Is this happening with the dbench/ENOSPC/1k blocksize testcase? yes For this particular one, dbench ran without failure, but while rm'ing the leftover directories, it BUGged. -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 22:13 ` Dave Hansen @ 2002-03-08 22:35 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2002-03-08 22:35 UTC (permalink / raw) To: Dave Hansen; +Cc: Martin J. Bligh, linux-kernel Dave Hansen wrote: > > Andrew Morton wrote: > > If the page_cache_release() in truncate_complete_page() is calling > > __free_pages_ok() then something really horrid has happened. > > Something horrid _IS_ happening: > > kernel BUG at page_alloc.c:109! > ... > > Is this happening with the dbench/ENOSPC/1k blocksize testcase? > yes > > For this particular one, dbench ran without failure, but while rm'ing > the leftover directories, it BUGged. Are you also seeing the 'VFS: free of freed buffer' message? There were some changes made in both 2.4 and 2.5 in this area. These were to handle the situation where block allocation fails when we're partway through mapping a page to disk. Here's the 2.5 diff. A `patch -R' of this will revert those changes. There are three independent parts to this - generic_file_write(), block_write_full_page() and __block_prepare_write(). If reversion of this entire patch makes the problem go away, then would you be able to narrow it down to one of those three functions? --- linux-2.5.5-pre1/fs/buffer.c Wed Feb 13 15:22:00 2002 +++ 25/fs/buffer.c Thu Feb 14 22:43:33 2002 @@ -1431,6 +1431,7 @@ static int __block_write_full_page(struc int err, i; unsigned long block; struct buffer_head *bh, *head; + int need_unlock; if (!PageLocked(page)) BUG(); @@ -1486,8 +1487,34 @@ static int __block_write_full_page(struc return 0; out: + /* + * ENOSPC, or some other error. We may already have added some + * blocks to the file, so we need to write these out to avoid + * exposing stale data. + */ ClearPageUptodate(page); - UnlockPage(page); + bh = head; + need_unlock = 1; + /* Recovery: lock and submit the mapped buffers */ + do { + if (buffer_mapped(bh)) { + lock_buffer(bh); + set_buffer_async_io(bh); + need_unlock = 0; + } + bh = bh->b_this_page; + } while (bh != head); + do { + struct buffer_head *next = bh->b_this_page; + if (buffer_mapped(bh)) { + set_bit(BH_Uptodate, &bh->b_state); + clear_bit(BH_Dirty, &bh->b_state); + submit_bh(WRITE, bh); + } + bh = next; + } while (bh != head); + if (need_unlock) + UnlockPage(page); return err; } @@ -1518,6 +1545,7 @@ static int __block_prepare_write(struct continue; if (block_start >= to) break; + clear_bit(BH_New, &bh->b_state); if (!buffer_mapped(bh)) { err = get_block(inode, block, bh, 1); if (err) @@ -1552,12 +1580,35 @@ static int __block_prepare_write(struct */ while(wait_bh > wait) { wait_on_buffer(*--wait_bh); - err = -EIO; if (!buffer_uptodate(*wait_bh)) - goto out; + return -EIO; } return 0; out: + /* + * Zero out any newly allocated blocks to avoid exposing stale + * data. If BH_New is set, we know that the block was newly + * allocated in the above loop. + */ + bh = head; + block_start = 0; + do { + block_end = block_start+blocksize; + if (block_end <= from) + goto next_bh; + if (block_start >= to) + break; + if (buffer_new(bh)) { + if (buffer_uptodate(bh)) + printk(KERN_ERR "%s: zeroing uptodate buffer!\n", __FUNCTION__); + memset(kaddr+block_start, 0, bh->b_size); + set_bit(BH_Uptodate, &bh->b_state); + mark_buffer_dirty(bh); + } +next_bh: + block_start = block_end; + bh = bh->b_this_page; + } while (bh != head); return err; } --- linux-2.5.5-pre1/mm/filemap.c Sun Feb 10 22:00:36 2002 +++ 25/mm/filemap.c Thu Feb 14 22:42:15 2002 @@ -2999,7 +2999,7 @@ generic_file_write(struct file *file,con kaddr = kmap(page); status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes); if (status) - goto unlock; + goto sync_failure; page_fault = __copy_from_user(kaddr+offset, buf, bytes); flush_dcache_page(page); status = mapping->a_ops->commit_write(file, page, offset, offset+bytes); @@ -3024,6 +3024,7 @@ unlock: if (status < 0) break; } while (count); +done: *ppos = pos; if (cached_page) @@ -3045,6 +3046,18 @@ out: fail_write: status = -EFAULT; goto unlock; + +sync_failure: + /* + * If blocksize < pagesize, prepare_write() may have instantiated a + * few blocks outside i_size. Trim these off again. + */ + kunmap(page); + UnlockPage(page); + page_cache_release(page); + if (pos + bytes > inode->i_size) + vmtruncate(inode, inode->i_size); + goto done; o_direct: written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos); - ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-08 21:53 ` Andrew Morton 2002-03-08 22:13 ` Dave Hansen @ 2002-03-09 0:04 ` Martin J. Bligh 2002-03-09 0:17 ` Andrew Morton 1 sibling, 1 reply; 12+ messages in thread From: Martin J. Bligh @ 2002-03-09 0:04 UTC (permalink / raw) To: Andrew Morton; +Cc: Dave Hansen, linux-kernel >> void page_cache_release(struct page *page) >> { >> if (!PageReserved(page) && put_page_testzero(page)) { >> if (PageLRU(page)) >> lru_cache_del(page); >> __free_pages_ok(page, 0); >> } >> } >> >> We enter page_cache_release with the supposedly locked, and its count >> non-zero (we incremented it). put_page_testzero does atomic_dec_and_test >> on count which says it returns true if the result is 0, or false for all other cases. >> >> So if nobody else was holding a reference to the page, we've decremented >> it's count to 0, and put_page_testzero returns 1. We then try to free the page. >> It's still locked. BUG. > > If the page_cache_release() in truncate_complete_page() is calling > __free_pages_ok() then something really horrid has happened. That's exactly what's happening. > Yes, it could be that the page has had its refcount incorrectly > decremented somewhere. I don't see you need that to make this bug happen. Say count is 0 when we enter truncate_list_pages. We increment it. It's now 1 when we call page_cache_release. put_page_testzero dec's it back to 0, and returns true. We do a __free_pages_ok. Page is still locked. BUG. No other process, nothing funky happening, no races, no other refcount decrements. Or that's the way I read it. > Or the page wasn't in the pagecache at all. The only thing I can think of was the pagecount shouldn't have been 0 to start with (or the code path we're reading is wrong ;-) ) M. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: truncate_list_pages() BUG and confusion 2002-03-09 0:04 ` Martin J. Bligh @ 2002-03-09 0:17 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2002-03-09 0:17 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel "Martin J. Bligh" wrote: > > .. > > If the page_cache_release() in truncate_complete_page() is calling > > __free_pages_ok() then something really horrid has happened. > > That's exactly what's happening. > > > Yes, it could be that the page has had its refcount incorrectly > > decremented somewhere. > > I don't see you need that to make this bug happen. > Say count is 0 when we enter truncate_list_pages. It mustn't be zero! The page is *known* to be in the pagecache, so its count must be at least 1. > We increment it. > It's now 1 when we call page_cache_release. > put_page_testzero dec's it back to 0, and returns true. > We do a __free_pages_ok. Page is still locked. BUG. > > No other process, nothing funky happening, no races, no other > refcount decrements. Or that's the way I read it. > > > Or the page wasn't in the pagecache at all. > > The only thing I can think of was the pagecount shouldn't have been 0 > to start with (or the code path we're reading is wrong ;-) ) yup. You can stick an if (page_count(page) == 0) BUG(); at the top of truncate_list_pages... Actually it might help to add some extra checks to page_cache_release(): 1: If the page is in the pagecache, that's one. 2: If the page has buffers, that's another. so void page_cache_release(struct page *page) { + int expected = 0; + + if (page->buffers) + expected++; + if (page->mapping) { + if (!list_empty(page->list)) + BUG(); + expected++; + } + + expected++; /* The caller has a ref too */ + + if (page_count(page) < expected) + BUG(); + if (!PageReserved(page) && put_page_testzero(page)) { if (PageLRU(page)) lru_cache_del(page); __free_pages_ok(page, 0); } } The list_empty() check will require that remove_page_from_inode_queue() use list_del_init() instead of list_del(). The above code (untested, and racy as hell on SMP and preempt) may catch whoever is droppng the refcount at the wrong time. - ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2002-03-09 0:19 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-03-08 0:45 truncate_list_pages() BUG and confusion Dave Hansen 2002-03-08 1:08 ` Andrew Morton 2002-03-08 2:54 ` Dave Hansen 2002-03-08 2:55 ` Martin J. Bligh 2002-03-08 3:02 ` Andrew Morton 2002-03-08 3:04 ` Dave Hansen 2002-03-08 21:14 ` Martin J. Bligh 2002-03-08 21:53 ` Andrew Morton 2002-03-08 22:13 ` Dave Hansen 2002-03-08 22:35 ` Andrew Morton 2002-03-09 0:04 ` Martin J. Bligh 2002-03-09 0:17 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox