Message-ID: <3D0DAD69.5C667D63@zip.com.au>
Date: Mon, 17 Jun 2002 02:35:37 -0700
From: Andrew Morton
To: Linus Torvalds, lkml
Subject: Re: [patch 4/19] stack space reduction (remove MAX_BUF_PER_PAGE)
References: <3D0D86F0.5016809@zip.com.au>

Andrew Morton wrote:
>
> ..
> +	do {
> +		if (buffer_async_read(bh))
> +			submit_bh(READ, bh);
> +	} while ((bh = bh->b_this_page) != head);

That's a bug.  We cannot touch bh->b_this_page after submitting the
buffer: the I/O could complete synchronously, the page can come
unlocked and the buffers can be stripped by the VM.  The buffer at
`bh' could be dead memory.  This can happen on SMP with PIO-mode
IDE disks.

We cannot fix this with the usual

	do {
		next = bh->b_this_page;
		if (something)
			submit_bh(bh);
	} while ((bh = next) != head);

approach, because the final buffer on the page could be unlocked,
clean and uptodate: completion of an earlier buffer's I/O then
finishes the whole page, and the buffers can be freed before we
step onto that saved `next' pointer.

Pinning one of the buffers while we walk the ring will keep
try_to_free_buffers() away.

Here is an incremental patch.

--- 2.5.22/fs/ntfs/aops.c~ntfs-race-fix	Mon Jun 17 01:50:32 2002
+++ 2.5.22-akpm/fs/ntfs/aops.c	Mon Jun 17 02:24:00 2002
@@ -220,10 +220,12 @@ handle_zblock:
 	} while ((bh = bh->b_this_page) != head);
 
 	/* Finally, start i/o on the buffers. */
+	get_bh(head);	/* Pin the buffers while we walk the ring */
 	do {
 		if (buffer_async_read(bh))
 			submit_bh(READ, bh);
 	} while ((bh = bh->b_this_page) != head);
+	put_bh(head);
 	return 0;
 }
 /* No i/o was scheduled on any of the buffers. */
@@ -510,10 +512,12 @@ handle_zblock:
 	} while ((bh = bh->b_this_page) != head);
 
 	/* Finally, start i/o on the buffers. */
+	get_bh(head);	/* Pin the buffers while we walk the ring */
 	do {
 		if (buffer_async_read(bh))
 			submit_bh(READ, bh);
 	} while ((bh = bh->b_this_page) != head);
+	put_bh(head);
 	return 0;
 }
 /* No i/o was scheduled on any of the buffers. */
@@ -774,10 +778,12 @@ handle_zblock:
 	} while ((bh = bh->b_this_page) != head);
 
 	/* Finally, start i/o on the buffers. */
+	get_bh(head);	/* Pin the buffers while we walk the ring */
 	do {
 		if (buffer_async_read(bh))
 			submit_bh(READ, bh);
 	} while ((bh = bh->b_this_page) != head);
+	put_bh(head);
 	return 0;
 }
 /* No i/o was scheduled on any of the buffers. */
--- 2.5.22/fs/buffer.c~ntfs-race-fix	Mon Jun 17 01:55:55 2002
+++ 2.5.22-akpm/fs/buffer.c	Mon Jun 17 02:24:33 2002
@@ -2024,7 +2024,12 @@ int block_read_full_page(struct page *pa
 	 * Stage 3: start the IO.  Check for uptodateness
 	 * inside the buffer lock in case another process reading
 	 * the underlying blockdev brought it uptodate (the sct fix).
+	 *
+	 * Bump the refcount of one buffer while walking the ring so
+	 * that the VM cannot release the buffers while we're looking at
+	 * ->b_this_page.
 	 */
+	get_bh(head);
 	do {
 		if (buffer_async_read(bh)) {
 			if (buffer_uptodate(bh))
@@ -2033,6 +2038,7 @@ int block_read_full_page(struct page *pa
 			submit_bh(READ, bh);
 		}
 	} while ((bh = bh->b_this_page) != head);
+	put_bh(head);
 	return 0;
 }
-