truncate_list_pages() BUG and confusion

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* truncate_list_pages()  BUG and confusion
@ 2002-03-08  0:45 Dave Hansen
  2002-03-08  1:08 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2002-03-08  0:45 UTC (permalink / raw)
  To: linux-kernel

in truncate_list_pages()

   failed = TryLockPage(page);

So, the page is always locked here.

   truncate_complete_page(page);
         remove_inode_page(page);
              if (!PageLocked(page))
                 PAGE_BUG(page);

         page_cache_release(page);
             calls __free_pages_ok(page, 0);
                 if (PageLocked(page))
                    BUG();

It is a BUG if the page is not locked in remove_inode_page() and also a 
bug if it IS locked in __free_pages_ok().  What am I missing?

ksymoopsed output follows:

kernel BUG at page_alloc.c:109!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c012f27c>]    Not tainted
EFLAGS: 00010202
eax: 01000001   ebx: c13ba15c   ecx: c13ba15c   edx: c13ba15c
esi: 00000000   edi: db5aff20   ebp: 00000000   esp: db5afe90
ds: 0018   es: 0018   ss: 0018
Stack: c13ba15c 00000000 db5aff20 00000000 c13ba15c c13ba15c 00000000 
c13ba15c
        00000000 db5aff20 00000000 c012717a c13ba15c 00000000 c012fb05 
c13ba15c
        c01271c2 c13ba15c c13ba15c c0127326 c13ba15c 00000000 db5aff20 
00000018
Call Trace: [<c012717a>] [<c012fb05>] [<c01271c2>] [<c0127326>] 
[<c01273db>]
    [<c0125192>] [<c012a49d>] [<c01361fb>] [<c0108a23>]
Code: 0f 0b 6d 00 60 89 24 c0 8b 4c 24 10 8b 41 18 a8 40 74 08 0f

 >>EIP; c012f27c <__free_pages_ok+6c/29c>   <=====
Trace; c012717a <do_flushpage+26/2c>
Trace; c012fb05 <page_cache_release+2d/30>
Trace; c01271c2 <truncate_complete_page+42/48>
Trace; c0127326 <truncate_list_pages+15e/1c4>
Trace; c01273db <truncate_inode_pages+4f/80>
Trace; c0125192 <vmtruncate+be/154>
Trace; c012a49d <generic_file_write+62d/6f8>
Trace; c01361fb <sys_write+8f/10c>
Trace; c0108a23 <syscall_call+7/b>
Code;  c012f27c <__free_pages_ok+6c/29c>
00000000 <_EIP>:
Code;  c012f27c <__free_pages_ok+6c/29c>   <=====
    0:   0f 0b                     ud2a      <=====
Code;  c012f27e <__free_pages_ok+6e/29c>
    2:   6d                        insl   (%dx),%es:(%edi)
Code;  c012f27f <__free_pages_ok+6f/29c>
    3:   00 60 89                  add    %ah,0xffffff89(%eax)
Code;  c012f282 <__free_pages_ok+72/29c>
    6:   24 c0                     and    $0xc0,%al
Code;  c012f284 <__free_pages_ok+74/29c>
    8:   8b 4c 24 10               mov    0x10(%esp,1),%ecx
Code;  c012f288 <__free_pages_ok+78/29c>
    c:   8b 41 18                  mov    0x18(%ecx),%eax
Code;  c012f28b <__free_pages_ok+7b/29c>
    f:   a8 40                     test   $0x40,%al
Code;  c012f28d <__free_pages_ok+7d/29c>
   11:   74 08                     je     1b <_EIP+0x1b> c012f297 
<__free_pages_ok+87/29c>
Code;  c012f28f <__free_pages_ok+7f/29c>
   13:   0f 00 00                  sldt   (%eax)


-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  0:45 truncate_list_pages() BUG and confusion Dave Hansen
@ 2002-03-08  1:08 ` Andrew Morton
  2002-03-08  2:54   ` Dave Hansen
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrew Morton @ 2002-03-08  1:08 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel

Dave Hansen wrote:
> 
> in truncate_list_pages()
> 
>    failed = TryLockPage(page);
> 
> So, the page is always locked here.
> 
>    truncate_complete_page(page);
>          remove_inode_page(page);
>               if (!PageLocked(page))
>                  PAGE_BUG(page);
> 
>          page_cache_release(page);
>              calls __free_pages_ok(page, 0);
>                  if (PageLocked(page))
>                     BUG();
> 
> It is a BUG if the page is not locked in remove_inode_page() and also a
> bug if it IS locked in __free_pages_ok().  What am I missing?

the page_cache_release() in truncate_complete_page() is just dropping
the reference count against the page which is due to that page's
presence in the pagecache.  We just took it out, so we drop the reference.

Note that the caller of truncate_complete_page() also took a reference to
the page, and undoes that reference *after* unlocking the page.  This
additional reference will prevent __free_pages_ok() from being called
by truncate_complete_page().

> ksymoopsed output follows:
>
> kernel BUG at page_alloc.c:109!

Now how did you manage that?  Looks like someone re-locked
the page after truncate_list_pages unlocked it.

-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  1:08 ` Andrew Morton
@ 2002-03-08  2:54   ` Dave Hansen
  2002-03-08  2:55   ` Martin J. Bligh
  2002-03-08 21:14   ` Martin J. Bligh
  2 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2002-03-08  2:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
>>ksymoopsed output follows:
>>
>>kernel BUG at page_alloc.c:109!
>>
> 
> Now how did you manage that?  Looks like someone re-locked
> the page after truncate_list_pages unlocked it.

I stopped getting oopses from the dbench 64 on a small partition, and 
started getting these BUG()s instead.

The oopses I _was_ getting were because create_buffers() was returning a 
buffer chain with one of the bh->b_this_page entries set to 0x01010101. 
  The funny part was that it was always the same number, not a memory 
address, or NULL, it was always 0x01010101!  The next time through the 
loop, "bh->b_end_io = NULL;" was blowing up. (no surprise there :)

I put some code into create_buffers() to look for that magic number, but 
stopped getting the oopses after I added a couple of if( XX == 0x0101010 
) panic("foo").  I now can't recreate the original oopses.  The 
disassembly of create_empty_buffers() looked screwy the first time I 
looked, so I'm guessing that I just encountered a transient gcc bug or 
something.  But, I'm still getting those damn "__block_prepare_write: 
zeroing uptodate buffer!" messages, in addition to the new BUG(), which 
never happened before.  The hardware is extremely stable, and 2.4 
doesn't show any of these problems.

Have you had anyone else try the dbench torture test on other SMP machines?

-- 
Dave Hansen
haveblue@us.ibm.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  1:08 ` Andrew Morton
  2002-03-08  2:54   ` Dave Hansen
@ 2002-03-08  2:55   ` Martin J. Bligh
  2002-03-08  3:02     ` Andrew Morton
  2002-03-08  3:04     ` Dave Hansen
  2002-03-08 21:14   ` Martin J. Bligh
  2 siblings, 2 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-03-08  2:55 UTC (permalink / raw)
  To: Andrew Morton, Dave Hansen; +Cc: linux-kernel

--On Thursday, March 07, 2002 5:08 PM -0800 Andrew Morton <akpm@zip.com.au> wrote:

> Dave Hansen wrote:
>> 
>> in truncate_list_pages()
>> 
>>    failed = TryLockPage(page);
>> 
>> So, the page is always locked here.
>> 
>>    truncate_complete_page(page);
>>          remove_inode_page(page);
>>               if (!PageLocked(page))
>>                  PAGE_BUG(page);
>> 
>>          page_cache_release(page);
>>              calls __free_pages_ok(page, 0);
>>                  if (PageLocked(page))
>>                     BUG();
>> 
>> It is a BUG if the page is not locked in remove_inode_page() and also a
>> bug if it IS locked in __free_pages_ok().  What am I missing?
> 
> the page_cache_release() in truncate_complete_page() is just dropping
> the reference count against the page which is due to that page's
> presence in the pagecache.  We just took it out, so we drop the reference.
> 
> Note that the caller of truncate_complete_page() also took a reference to
> the page, and undoes that reference *after* unlocking the page.  This
> additional reference will prevent __free_pages_ok() from being called
> by truncate_complete_page().
> 
>> ksymoopsed output follows:
>> 
>> kernel BUG at page_alloc.c:109!
> 
> Now how did you manage that?  Looks like someone re-locked
> the page after truncate_list_pages unlocked it.

Where did truncate_list_pages unlock it? It doesn't do that
until after it's called truncate_complete_page, which (indirectly)
frees the page ... ?

Dave & I must be missing something, as this could never work
the way we're reading it, but I can't see what ;-)

M.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  2:55   ` Martin J. Bligh
@ 2002-03-08  3:02     ` Andrew Morton
  2002-03-08  3:04     ` Dave Hansen
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2002-03-08  3:02 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel

"Martin J. Bligh" wrote:
> 
> Where did truncate_list_pages unlock it?

                                if (*partial && (offset + 1) == start) {
                                        truncate_partial_page(page, *partial);
                                        *partial = 0;
                                } else 
                                        truncate_complete_page(page);

                                UnlockPage(page);
                        } else
                                wait_on_page(page);

It's either unlocked directly, or we wait for whoever locked
it (presumably I/O) to unlock it.

-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  2:55   ` Martin J. Bligh
  2002-03-08  3:02     ` Andrew Morton
@ 2002-03-08  3:04     ` Dave Hansen
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2002-03-08  3:04 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel

Martin J. Bligh wrote:
> Where did truncate_list_pages unlock it? It doesn't do that
> until after it's called truncate_complete_page, which (indirectly)
> frees the page ... ?

Ahh, put_page_testzero
#define put_page_testzero(p)    atomic_dec_and_test(&(p)->count)

It will only try to free if the page count is zero.  The caller makes 
sure that this doesn't happen.

if (!PageReserved(page) && put_page_testzero(page)) {
	if (PageLRU(page))
         	lru_cache_del(page);
	__free_pages_ok(page, 0);
}


-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08  1:08 ` Andrew Morton
  2002-03-08  2:54   ` Dave Hansen
  2002-03-08  2:55   ` Martin J. Bligh
@ 2002-03-08 21:14   ` Martin J. Bligh
  2002-03-08 21:53     ` Andrew Morton
  2 siblings, 1 reply; 12+ messages in thread
From: Martin J. Bligh @ 2002-03-08 21:14 UTC (permalink / raw)
  To: Andrew Morton, Dave Hansen; +Cc: linux-kernel

OK, Dave and I went through this again, and we're both still confused.
Hopefully you still have the patience to keep explaining this to us ;-)
There must be something wrong here, otherwise we wouldn't get the
panic. It looks like the real problem here is to do with the page's count
rather than the locked stuff .... which fits in with the other panics Dave
has been seeing.

> the page_cache_release() in truncate_complete_page() is just dropping
> the reference count against the page which is due to that page's
> presence in the pagecache.  We just took it out, so we drop the reference.
> 
> Note that the caller of truncate_complete_page() also took a reference to
> the page, and undoes that reference *after* unlocking the page.  This
> additional reference will prevent __free_pages_ok() from being called
> by truncate_complete_page().

OK, I'd missed a whole bit here, which I now understand (or think I do ;-)). 
So truncate_list_pages does a page_cache_get(page), which increments 
the count. Nowhere does it decrement the count, or do an UnlockPage 
before it gets into page_cache_release, which looks like this:

void page_cache_release(struct page *page)
{
        if (!PageReserved(page) && put_page_testzero(page)) {
                if (PageLRU(page))
                        lru_cache_del(page);
                __free_pages_ok(page, 0);
        }
}

We enter page_cache_release with the supposedly locked, and its count
non-zero (we incremented it).  put_page_testzero does atomic_dec_and_test
on count which says it returns true if the result is 0, or false for all other cases.

So if nobody else was holding a reference to the page, we've decremented
it's count to 0, and put_page_testzero returns 1. We then try to free the page.
It's still locked. BUG.

John Stultz also pointed out that on return to the flow of truncate_list_pages
we call page_cache release again, which looks really odd. Something is 
wrong here .....

>> ksymoopsed output follows:
>> 
>> kernel BUG at page_alloc.c:109!
> 
> Now how did you manage that?  Looks like someone re-locked
> the page after truncate_list_pages unlocked it.> Where did truncate_list_pages unlock it?

> 
>                                 if (*partial && (offset + 1) == start) {
>                                         truncate_partial_page(page, *partial);
>                                         *partial = 0;
>                                 } else 
>                                         truncate_complete_page(page);
>  
>                                UnlockPage(page);
>                         } else
>                                 wait_on_page(page);
> 
> It's either unlocked directly, or we wait for whoever locked
> it (presumably I/O) to unlock it.

But that's all *after* we call truncate_complete_page, so it's
irrelevant, no ??

M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08 21:14   ` Martin J. Bligh
@ 2002-03-08 21:53     ` Andrew Morton
  2002-03-08 22:13       ` Dave Hansen
  2002-03-09  0:04       ` Martin J. Bligh
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2002-03-08 21:53 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel

"Martin J. Bligh" wrote:
> 
> ...
> So truncate_list_pages does a page_cache_get(page), which increments
> the count. Nowhere does it decrement the count, or do an UnlockPage
> before it gets into page_cache_release, which looks like this:

well...  truncate_list_pages _does_ unlock the page.  Let's
snip some bits:

truncate_list_pages()
{
	...
	/* Bump the refcount.  From now until the matching
	 * page_cache_release(), the page *cannot* be freed,
	 * because we hold a reference
	 */
	page_cache_get(page);

	lock_page(page);

	truncate_complete_page(page);
	-->	{
			remove_inode_page(page);	/* Remove from pagecache */
			/*
			 * We now drop the refcount which is due to
			 * presence in the pagecache.  This can't free
			 * the page, because truncate_list_pages() holds a reference
			 */
			page_cache_release(page);
		}

	<---
	UnlockPage();		/* Remember, wait_on_page() is an indirect unlock, too */

	/*
	 * Now truncate_list_pages() drops its temporary reference.
	 * The page is unlocked.  If this was the last reference
	 * (and it usually will be) then the page will now be freed.
	 */
	page_cache_release(page);
}

> void page_cache_release(struct page *page)
> {
>         if (!PageReserved(page) && put_page_testzero(page)) {
>                 if (PageLRU(page))
>                         lru_cache_del(page);
>                 __free_pages_ok(page, 0);
>         }
> }
> 
> We enter page_cache_release with the supposedly locked, and its count
> non-zero (we incremented it).  put_page_testzero does atomic_dec_and_test
> on count which says it returns true if the result is 0, or false for all other cases.
> 
> So if nobody else was holding a reference to the page, we've decremented
> it's count to 0, and put_page_testzero returns 1. We then try to free the page.
> It's still locked. BUG.

If the page_cache_release() in truncate_complete_page() is calling
__free_pages_ok() then something really horrid has happened.

Yes, it could be that the page has had its refcount incorrectly
decremented somewhere.

Or the page wasn't in the pagecache at all.

Is this happening with the dbench/ENOSPC/1k blocksize testcase?
In that case, something is clearly going wrong with the association
of buffers against the page.  And the presence of buffers at page->buffers
_also_ contributes to page->count.  So there's a commonality here.
Possibly we forgot to increment the page count when we added buffers,
or we bogusly thought the page had buffers when it didn't, and then
dropped the page refcount when we thought we removed the buffers
which weren't really there.

-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08 21:53     ` Andrew Morton
@ 2002-03-08 22:13       ` Dave Hansen
  2002-03-08 22:35         ` Andrew Morton
  2002-03-09  0:04       ` Martin J. Bligh
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2002-03-08 22:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

Andrew Morton wrote:
> If the page_cache_release() in truncate_complete_page() is calling
> __free_pages_ok() then something really horrid has happened.

Something horrid _IS_ happening:

kernel BUG at page_alloc.c:109!
CPU:    1
EIP:    0010:[<c012d9ac>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 01000005   ebx: c1471560   ecx: c1471560   edx: c1471560
esi: 00000000   edi: 00000000   ebp: 00000000   esp: da569ed0
ds: 0018   es: 0018   ss: 0018
Stack: c1471560 00000000 00000000 00000000 c1471560 c1471560 00000000 
c1471560
        00000000 00000000 00000000 c012574a c1471560 00000000 c012e235 
c1471560
        c0125792 c1471560 c1471560 c01258f6 c1471560 00000000 da569f60 
00000000
Call Trace: [<c012574a>] [<c012e235>] [<c0125792>] [<c01258f6>] 
[<c01259ab>]
    [<c0148df5>] [<c0146697>] [<c0140635>] [<c01070a3>]
Code: 0f 0b 6d 00 80 6d 23 c0 8b 4c 24 10 8b 41 18 a8 40 74 08 0f

 >>EIP; c012d9ac <__free_pages_ok+6c/29c>   <=====
Trace; c012574a <do_flushpage+26/2c>
Trace; c012e235 <page_cache_release+2d/30>
Trace; c0125792 <truncate_complete_page+42/48>
Trace; c01258f6 <truncate_list_pages+15e/1c4>
Trace; c01259ab <truncate_inode_pages+4f/80>
Trace; c0148df5 <iput+b5/1d8>
Trace; c0146697 <dput+e7/148>
Trace; c0140635 <sys_unlink+b1/124>
Trace; c01070a3 <syscall_call+7/b>
Code;  c012d9ac <__free_pages_ok+6c/29c>
00000000 <_EIP>:
Code;  c012d9ac <__free_pages_ok+6c/29c>   <=====
    0:   0f 0b                     ud2a      <=====
Code;  c012d9ae <__free_pages_ok+6e/29c>
    2:   6d                        insl   (%dx),%es:(%edi)
Code;  c012d9af <__free_pages_ok+6f/29c>
    3:   00 80 6d 23 c0 8b         add    %al,0x8bc0236d(%eax)
Code;  c012d9b5 <__free_pages_ok+75/29c>
    9:   4c                        dec    %esp
Code;  c012d9b6 <__free_pages_ok+76/29c>
    a:   24 10                     and    $0x10,%al
Code;  c012d9b8 <__free_pages_ok+78/29c>
    c:   8b 41 18                  mov    0x18(%ecx),%eax
Code;  c012d9bb <__free_pages_ok+7b/29c>
    f:   a8 40                     test   $0x40,%al
Code;  c012d9bd <__free_pages_ok+7d/29c>
   11:   74 08                     je     1b <_EIP+0x1b> c012d9c7 
<__free_pages_ok+87/29c>
Code;  c012d9bf <__free_pages_ok+7f/29c>
   13:   0f 00 00                  sldt   (%eax)

 > Is this happening with the dbench/ENOSPC/1k blocksize testcase?
yes

For this particular one, dbench ran without failure, but while rm'ing 
the leftover directories, it BUGged.

-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08 22:13       ` Dave Hansen
@ 2002-03-08 22:35         ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2002-03-08 22:35 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Martin J. Bligh, linux-kernel

Dave Hansen wrote:
> 
> Andrew Morton wrote:
> > If the page_cache_release() in truncate_complete_page() is calling
> > __free_pages_ok() then something really horrid has happened.
> 
> Something horrid _IS_ happening:
> 
> kernel BUG at page_alloc.c:109!
> ...
>  > Is this happening with the dbench/ENOSPC/1k blocksize testcase?
> yes
> 
> For this particular one, dbench ran without failure, but while rm'ing
> the leftover directories, it BUGged.

Are you also seeing the 'VFS: free of freed buffer' message?

There were some changes made in both 2.4 and 2.5 in this area.
These were to handle the situation where block allocation fails
when we're partway through mapping a page to disk.

Here's the 2.5 diff.  A `patch -R' of this will revert those
changes.

There are three independent parts to this - generic_file_write(),
block_write_full_page() and __block_prepare_write().  If reversion
of this entire patch makes the problem go away, then would you be
able to narrow it down to one of those three functions?


--- linux-2.5.5-pre1/fs/buffer.c	Wed Feb 13 15:22:00 2002
+++ 25/fs/buffer.c	Thu Feb 14 22:43:33 2002
@@ -1431,6 +1431,7 @@ static int __block_write_full_page(struc
 	int err, i;
 	unsigned long block;
 	struct buffer_head *bh, *head;
+	int need_unlock;
 
 	if (!PageLocked(page))
 		BUG();
@@ -1486,8 +1487,34 @@ static int __block_write_full_page(struc
 	return 0;
 
 out:
+	/*
+	 * ENOSPC, or some other error.  We may already have added some
+	 * blocks to the file, so we need to write these out to avoid
+	 * exposing stale data.
+	 */
 	ClearPageUptodate(page);
-	UnlockPage(page);
+	bh = head;
+	need_unlock = 1;
+	/* Recovery: lock and submit the mapped buffers */
+	do {
+		if (buffer_mapped(bh)) {
+			lock_buffer(bh);
+			set_buffer_async_io(bh);
+			need_unlock = 0;
+		}
+		bh = bh->b_this_page;
+	} while (bh != head);
+	do {
+		struct buffer_head *next = bh->b_this_page;
+		if (buffer_mapped(bh)) {
+			set_bit(BH_Uptodate, &bh->b_state);
+			clear_bit(BH_Dirty, &bh->b_state);
+			submit_bh(WRITE, bh);
+		}
+		bh = next;
+	} while (bh != head);
+	if (need_unlock)
+		UnlockPage(page);
 	return err;
 }
 
@@ -1518,6 +1545,7 @@ static int __block_prepare_write(struct 
 			continue;
 		if (block_start >= to)
 			break;
+		clear_bit(BH_New, &bh->b_state);
 		if (!buffer_mapped(bh)) {
 			err = get_block(inode, block, bh, 1);
 			if (err)
@@ -1552,12 +1580,35 @@ static int __block_prepare_write(struct 
 	 */
 	while(wait_bh > wait) {
 		wait_on_buffer(*--wait_bh);
-		err = -EIO;
 		if (!buffer_uptodate(*wait_bh))
-			goto out;
+			return -EIO;
 	}
 	return 0;
 out:
+	/*
+	 * Zero out any newly allocated blocks to avoid exposing stale
+	 * data.  If BH_New is set, we know that the block was newly
+	 * allocated in the above loop.
+	 */
+	bh = head;
+	block_start = 0;
+	do {
+		block_end = block_start+blocksize;
+		if (block_end <= from)
+			goto next_bh;
+		if (block_start >= to)
+			break;
+		if (buffer_new(bh)) {
+			if (buffer_uptodate(bh))
+				printk(KERN_ERR "%s: zeroing uptodate buffer!\n", __FUNCTION__);
+			memset(kaddr+block_start, 0, bh->b_size);
+			set_bit(BH_Uptodate, &bh->b_state);
+			mark_buffer_dirty(bh);
+		}
+next_bh:
+		block_start = block_end;
+		bh = bh->b_this_page;
+	} while (bh != head);
 	return err;
 }
 
--- linux-2.5.5-pre1/mm/filemap.c	Sun Feb 10 22:00:36 2002
+++ 25/mm/filemap.c	Thu Feb 14 22:42:15 2002
@@ -2999,7 +2999,7 @@ generic_file_write(struct file *file,con
 		kaddr = kmap(page);
 		status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (status)
-			goto unlock;
+			goto sync_failure;
 		page_fault = __copy_from_user(kaddr+offset, buf, bytes);
 		flush_dcache_page(page);
 		status = mapping->a_ops->commit_write(file, page, offset, offset+bytes);
@@ -3024,6 +3024,7 @@ unlock:
 		if (status < 0)
 			break;
 	} while (count);
+done:
 	*ppos = pos;
 
 	if (cached_page)
@@ -3045,6 +3046,18 @@ out:
 fail_write:
 	status = -EFAULT;
 	goto unlock;
+
+sync_failure:
+	/*
+	 * If blocksize < pagesize, prepare_write() may have instantiated a
+	 * few blocks outside i_size.  Trim these off again.
+	 */
+	kunmap(page);
+	UnlockPage(page);
+	page_cache_release(page);
+	if (pos + bytes > inode->i_size)
+		vmtruncate(inode, inode->i_size);
+	goto done;
 
 o_direct:
 	written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos);


-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-08 21:53     ` Andrew Morton
  2002-03-08 22:13       ` Dave Hansen
@ 2002-03-09  0:04       ` Martin J. Bligh
  2002-03-09  0:17         ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Martin J. Bligh @ 2002-03-09  0:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Hansen, linux-kernel

>> void page_cache_release(struct page *page)
>> {
>>         if (!PageReserved(page) && put_page_testzero(page)) {
>>                 if (PageLRU(page))
>>                         lru_cache_del(page);
>>                 __free_pages_ok(page, 0);
>>         }
>> }
>> 
>> We enter page_cache_release with the supposedly locked, and its count
>> non-zero (we incremented it).  put_page_testzero does atomic_dec_and_test
>> on count which says it returns true if the result is 0, or false for all other cases.
>> 
>> So if nobody else was holding a reference to the page, we've decremented
>> it's count to 0, and put_page_testzero returns 1. We then try to free the page.
>> It's still locked. BUG.
> 
> If the page_cache_release() in truncate_complete_page() is calling
> __free_pages_ok() then something really horrid has happened.

That's exactly what's happening.
 
> Yes, it could be that the page has had its refcount incorrectly
> decremented somewhere.

I don't see you need that to make this bug happen.
Say count is 0 when we enter truncate_list_pages. We increment it.
It's now 1 when we call page_cache_release. 
put_page_testzero dec's it back to 0, and returns true.
We do a __free_pages_ok. Page is still locked. BUG.

No other process, nothing funky happening, no races, no other
refcount decrements. Or that's the way I read it.

> Or the page wasn't in the pagecache at all.

The only thing I can think of was the pagecount shouldn't have been 0
to start with (or the code path we're reading is wrong ;-) )

M.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: truncate_list_pages()  BUG and confusion
  2002-03-09  0:04       ` Martin J. Bligh
@ 2002-03-09  0:17         ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2002-03-09  0:17 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel

"Martin J. Bligh" wrote:
> 
> ..
> > If the page_cache_release() in truncate_complete_page() is calling
> > __free_pages_ok() then something really horrid has happened.
> 
> That's exactly what's happening.
> 
> > Yes, it could be that the page has had its refcount incorrectly
> > decremented somewhere.
> 
> I don't see you need that to make this bug happen.
> Say count is 0 when we enter truncate_list_pages.

It mustn't be zero!  The page is *known* to be in the pagecache,
so its count must be at least 1.

> We increment it.
> It's now 1 when we call page_cache_release.
> put_page_testzero dec's it back to 0, and returns true.
> We do a __free_pages_ok. Page is still locked. BUG.
> 
> No other process, nothing funky happening, no races, no other
> refcount decrements. Or that's the way I read it.
> 
> > Or the page wasn't in the pagecache at all.
> 
> The only thing I can think of was the pagecount shouldn't have been 0
> to start with (or the code path we're reading is wrong ;-) )

yup.  You can stick an

	if (page_count(page) == 0)
		BUG();

at the top of truncate_list_pages...

Actually it might help to add some extra checks to
page_cache_release():

1: If the page is in the pagecache, that's one.
2: If the page has buffers, that's another.

so 

  void page_cache_release(struct page *page)  
  {
+	int expected = 0;
+
+	if (page->buffers)
+		expected++;
+	if (page->mapping) {
+		if (!list_empty(page->list))
+			BUG();
+		expected++;
+	}
+
+	expected++;	/* The caller has a ref too */
+
+	if (page_count(page) < expected)
+		BUG();
+
        if (!PageReserved(page) && put_page_testzero(page)) {
                if (PageLRU(page))
                        lru_cache_del(page);
                __free_pages_ok(page, 0);
        }
  }

The list_empty() check will require that remove_page_from_inode_queue()
use list_del_init() instead of list_del().

The above code (untested, and racy as hell on SMP and preempt) may
catch whoever is droppng the refcount at the wrong time.

-

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2002-03-09  0:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-03-08  0:45 truncate_list_pages() BUG and confusion Dave Hansen
2002-03-08  1:08 ` Andrew Morton
2002-03-08  2:54   ` Dave Hansen
2002-03-08  2:55   ` Martin J. Bligh
2002-03-08  3:02     ` Andrew Morton
2002-03-08  3:04     ` Dave Hansen
2002-03-08 21:14   ` Martin J. Bligh
2002-03-08 21:53     ` Andrew Morton
2002-03-08 22:13       ` Dave Hansen
2002-03-08 22:35         ` Andrew Morton
2002-03-09  0:04       ` Martin J. Bligh
2002-03-09  0:17         ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox