* Segmentation Faults with 062508 ext4-patch-queue snapshot @ 2008-06-25 13:53 Gary Hawco 2008-06-26 4:14 ` Eric Sandeen 0 siblings, 1 reply; 13+ messages in thread From: Gary Hawco @ 2008-06-25 13:53 UTC (permalink / raw) To: Mingming, linux-ext4@vger.kernel.org I redid my tests on the last few ext-patch-queue snapshots. The snapshot: ext4-patch-queue-476b1104923c2228b48aee73070cddd5430b3b54.tar.gz from 062408 @2256hrs GMT works fine. No segfaults. The latest snapshot: ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz from 062508 @ 0019hrs GMT causes segmentation faults whenever I do extensive copying of small files (my Gentoo portage tree & metadata cache folders) to another ext4dev partition set up identically and then when I make a tarball from this copied data. I have been able to reproduce the segfaults consistently. Both partitions were formatted with flex_bg & meta_bg features enabled (obviously no resize_inodes) and are set for ordered data mode. I assume delalloc is enabled although ext4-patch-queue does not yet print message saying so on boot like mballoc does. Any suggestions would be most appreciated. Thanks, Gary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Segmentation Faults with 062508 ext4-patch-queue snapshot 2008-06-25 13:53 Segmentation Faults with 062508 ext4-patch-queue snapshot Gary Hawco @ 2008-06-26 4:14 ` Eric Sandeen 2008-06-26 22:12 ` Segmentation Faults with both 062608 snapshots Gary Hawco 0 siblings, 1 reply; 13+ messages in thread From: Eric Sandeen @ 2008-06-26 4:14 UTC (permalink / raw) To: Gary Hawco; +Cc: Mingming, linux-ext4@vger.kernel.org Gary Hawco wrote: > I redid my tests on the last few ext-patch-queue snapshots. > > The snapshot: > ext4-patch-queue-476b1104923c2228b48aee73070cddd5430b3b54.tar.gz from > 062408 @2256hrs GMT works fine. No segfaults. > > The latest snapshot: > ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz from > 062508 @ 0019hrs GMT causes segmentation faults whenever I do extensive > copying of small files (my Gentoo portage tree & metadata cache folders) to > another ext4dev partition set up identically and then when I make a tarball > from this copied data. > > I have been able to reproduce the segfaults consistently. Both partitions > were formatted with flex_bg & meta_bg features enabled (obviously no > resize_inodes) and are set for ordered data mode. > > I assume delalloc is enabled although ext4-patch-queue does not yet print > message saying so on boot like mballoc does. > > Any suggestions would be most appreciated. Do you get kernel messages when this happens? If so can you provide them? Thanks, -Eric ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Segmentation Faults with both 062608 snapshots 2008-06-26 4:14 ` Eric Sandeen @ 2008-06-26 22:12 ` Gary Hawco 2008-07-01 2:32 ` Theodore Tso 0 siblings, 1 reply; 13+ messages in thread From: Gary Hawco @ 2008-06-26 22:12 UTC (permalink / raw) To: Eric Sandeen, linux-ext4@vger.kernel.org More on my segmentation problems. Just looking at the end of each kernel message, they always end with: EIP [XXXXX] jbd2_journal_data_metadata-XXX This occurs when I try to copy data from my main partition to a backup partition or make a tarball. Both partitions are setup with flex_bg,meta_bg,uninit_bg with ordered data mode using noatime,nodiratime,journal_async_commit mount parameters. Again, the snapshot from 062508/0019hrs GMT -- ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz works fine patching a clean linux-2.6.26-rc6 source and exhibits no problems whatsoever. Starting with the next snapshot (062608/0042hrs GMT) is where my segfaults start. This continues using the newest snapshot from 062608/2251hrs GMT. I have patched the source from the individual snapshots mentioned as well as just pulling and updating my ext4-patch-queue folder with the same results. If I roll the kernel back to the 062508/0019 snapshot, the problems vanish. No errors noted during e2fsck -fvy on either partition. This is driving me crazy! Gary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Segmentation Faults with both 062608 snapshots 2008-06-26 22:12 ` Segmentation Faults with both 062608 snapshots Gary Hawco @ 2008-07-01 2:32 ` Theodore Tso 2008-07-01 0:00 ` More ext4dev snapshot weirdness Gary Hawco 0 siblings, 1 reply; 13+ messages in thread From: Theodore Tso @ 2008-07-01 2:32 UTC (permalink / raw) To: Gary Hawco; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org On Thu, Jun 26, 2008 at 10:12:27PM +0000, Gary Hawco wrote: > More on my segmentation problems. Just looking at the end of each kernel > message, they always end with: > EIP [XXXXX] jbd2_journal_data_metadata-XXX Hmm... any chance you can take a picture of the crash message with a digital camera and post the jpg? (I'm assuming you haven't been able to capture the OOPS stack trace in /var/log/kern.log or /var/log/messages or some such.) > This occurs when I try to copy data from my main partition to a backup > partition or make a tarball. I am currently using the ext4 patch queue comit id #555132eb from 2008-06-30 13:04:35 -0400, and I can't reproduce it. I just tried backing up by isync mail directory to a tar.gz file and then restored it, with no problems. Both the source and backup partition are ext4 filesystems with flex_bg, meta_bg, and uninit_bg, located on separate LVM logical volumes on my laptop.[1] [1] http://thunk.org/tytso/blog/2008/06/30/ext4-is-now-the-primary-filesystem-on-my-laptop/ The partition is mounted using mount options: noatime,errors=remount-ro,barrier=1,data=ordered according to /proc/mounts. The big difference between your mount options and mine is that I dont have journal_async_commit as a mount option. Which would be largely moot for me since LVM doesn't support barrier operations. :-( :-( :-( You seem to be able to reproduce the problem at will. Could you try removing te journal_sync_commit option and see if it makes the problem go away? That would be very interesting if it were the case... - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* More ext4dev snapshot weirdness 2008-07-01 2:32 ` Theodore Tso @ 2008-07-01 0:00 ` Gary Hawco 2008-07-01 16:02 ` Theodore Tso 0 siblings, 1 reply; 13+ messages in thread From: Gary Hawco @ 2008-07-01 0:00 UTC (permalink / raw) To: Theodore Tso, linux-ext4@vger.kernel.org Ted, Did some more testing before replying. Have two linux operating systems: Both were setup identically, i.e, formatted with flex_bg & meta_bg. Tune2fs used to enable uninit_bg. Ordered data mode. Grub used to pass boot parameter of rootflags=commit=15. Fstab mount options of noatime,nodiratime,journal_async_commit. Slackware 12.1--this one uses its BSD style startup scripts. No problems whatsoever with any of the snapshots including the newest from 30 June 2008/2219hrs. Gentoo --this is the system I am having trouble with. It will boot the snapshot from 062508/0019hrs fine. Everything works fine here including my script that updates the portage/metadata trees, copies them to another partition setup identically with ext4dev, i.e. flex_bg, meta_bg & uninit_bg, then creates a tarball. All of the snapshots after the 062508 timestamp through the 062708/2353hrs. snapshot would segfault running this script. (Looking for a digital camera, that uses flash media that windows or linux will recognize so I can save and post the jpeg image) Starting with today's snapshots rebased against 2.6.26-rc8 kernel a new problem surfaced, and the old segfault issues disappeared. That is, I could run my script without any problems. My Gentoo uses an updated baselayout/OpenRC start configuration. It has a /lib/rc folder as opposed to var/lib/init.d, containing several subfolders, such as /lib/rc/init.d which cache dependencies and make for a very fast boot and shutdown. With today's snapshots I get one clean start, then I get errors where the network interface eth0 does not initialize on startup but can manually be pulled in by "dhcpcd eth0". If I delete the contents of /lib/rc/init.d on the next restart the network interface initializes properly. If I roll back to the 062508 kernel snapshot, this new problem goes away. I tried removing journal_async_commit from fstab mount options without any difference. I was also passing rootflags=commit=15 during bootup with grub. I removed that parameter changing back to the default 5 second commit interval without any improvement. This new startup style uses parallel service starts. I changed back to series starting without any improvement. I even tried switching from ordered data mode to writeback mode without success. So if I haven't thoroughly confused you or put you to sleep, to summarize: Slackware loves all the snapshots. Gentoo was fine through 062508/00119hrs. No problems. From 062608/0042hrs - 062708/2353 snapshots boot sequence was fine, but segfaulting running my portage/metadata backup script (lots of small files). Today's updates rebased against 2.6.26-rc8 are NOT segfaulting running the backup script, but seem to be corrupting the /lib/rc/init.d/database files after the first start. I am willing to bet that Gentoo on the old baselayout/Non open-rc startup up scripts would have no problems ala Slackware, but it's curious everything was fine through the 062508/0019GMT snapshot. It seems that once delalloc was brought back in with ordered data mode problems started to arise. I tried to roll back the baselayout v2 to the older version 1.12, but I broke the os and had to quickly reinstall using a recent tarball. It's the only explanation why Gentoo is having problems, but Slackware is not. And now that today with the latest rc-8 snapshots the initialization of devices during startup is getting fubared, I am certain the Baselayout2/open-rc-2.5 does not like the latest iterations of the ext4-patch-queue kernel. Hope this expose sheds some light on things. Thanks again, Gary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: More ext4dev snapshot weirdness 2008-07-01 0:00 ` More ext4dev snapshot weirdness Gary Hawco @ 2008-07-01 16:02 ` Theodore Tso 2008-07-01 10:54 ` delalloc filesystem corruption Gary Hawco 0 siblings, 1 reply; 13+ messages in thread From: Theodore Tso @ 2008-07-01 16:02 UTC (permalink / raw) To: Gary Hawco; +Cc: linux-ext4@vger.kernel.org On Tue, Jul 01, 2008 at 12:00:46AM +0000, Gary Hawco wrote: > Gentoo was fine through 062508/00119hrs. No problems. From 062608/0042hrs - > 062708/2353 snapshots boot sequence was fine, but segfaulting running my > portage/metadata backup script (lots of small files). > > Today's updates rebased against 2.6.26-rc8 are NOT segfaulting running the > backup script, but seem to be corrupting the /lib/rc/init.d/database files > after the first start. > > I am willing to bet that Gentoo on the old baselayout/Non open-rc startup > up scripts would have no problems ala Slackware, but it's curious > everything was fine through the 062508/0019GMT snapshot. It seems that once > delalloc was brought back in with ordered data mode problems started to > arise. I tried to roll back the baselayout v2 to the older version 1.12, > but I broke the os and had to quickly reinstall using a recent tarball. > It's the only explanation why Gentoo is having problems, but Slackware is > not. And now that today with the latest rc-8 snapshots the initialization > of devices during startup is getting fubared, I am certain the > Baselayout2/open-rc-2.5 does not like the latest iterations of the > ext4-patch-queue kernel. Gary, It's definitely the case that kernel oops indicates kernel bugs. It might be the case that one distribution is better than another at exposing the bug and making it visible, but that doesn't mean the bug is in the distribution; in fact, if you can provoke a segfault, this is *good* because it can help us track down the bug more effectively/efficiently. Data corruption like you are seeing now is worse, since it's actually much harder to track down. Of course, in order to track down the segfault we really need the oops messages. In the bad old days I would laborously copy down all of the numbers and function names in the stack trace using pen and paper, and then transcribe it into e-mail. (Yes, I also walked uphill through the snow in the winter, in both directions. :-) Using a digital camera is more convenient, but if it's too hard for you to grab one, you can always fall back to the pen and paper model. So if you have data corruption in your init.d files, does it go away if you disable delalloc? What if you disable mballoc? Of course, do this on Gentoo if it's better at provoking the filesystem bug. Remember, from the point of view of filesystem developers, it's good if you can provoke the bug; since it's only then that you can squash it. :-) - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* delalloc filesystem corruption 2008-07-01 16:02 ` Theodore Tso @ 2008-07-01 10:54 ` Gary Hawco 2008-07-01 23:00 ` Mingming Cao 0 siblings, 1 reply; 13+ messages in thread From: Gary Hawco @ 2008-07-01 10:54 UTC (permalink / raw) To: Theodore Tso, linux-ext4@vger.kernel.org Ted, Thanks for your quick reply. With the newest rc8-based snapshots I am no longer getting segfaults, unless you wanted me to roll back to the rc6-based snapshots to reproduce the segfaults. Per your latest request I disabled delalloc, and voila, no more data corruption! Then I enabled delalloc and disabled mballoc and file corruption to /lib/rc/init.d/nettree returned. So delalloc is the culprit. I hope this will help figure this out. Thanks again, Gary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: delalloc filesystem corruption 2008-07-01 10:54 ` delalloc filesystem corruption Gary Hawco @ 2008-07-01 23:00 ` Mingming Cao 2008-07-01 17:50 ` Gentoo with ext4-patch-queue snapshots Gary Hawco 0 siblings, 1 reply; 13+ messages in thread From: Mingming Cao @ 2008-07-01 23:00 UTC (permalink / raw) To: Gary Hawco; +Cc: Theodore Tso, linux-ext4@vger.kernel.org On Tue, 2008-07-01 at 10:54 +0000, Gary Hawco wrote: > Ted, > > Thanks for your quick reply. With the newest rc8-based snapshots I am no > longer getting segfaults, unless you wanted me to roll back to the > rc6-based snapshots to reproduce the segfaults. > > Per your latest request I disabled delalloc, and voila, no more data > corruption! Then I enabled delalloc and disabled mballoc and file > corruption to /lib/rc/init.d/nettree returned. So delalloc is the culprit. > > I hope this will help figure this out. > Could you try the following patch? It fixes the problem of update on-disk size too early without block allocation, the problem is introduced unintentionally by another bug fix patch added to the patch queue yesterday. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8/fs/ext4/inode.c =================================================================== --- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-01 15:13:00.000000000 -0700 +++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-01 15:34:19.000000000 -0700 @@ -1892,6 +1892,31 @@ out: return ret; } +/* + * Check if we should update i_disksize + * when write to the end of file but not require block allocation + */ +static int ext4_da_should_update_i_disksize(struct page *page, + unsigned long offset) +{ + struct buffer_head *head, *bh; + unsigned int curr_off = 0; + + head = page_buffers(page); + bh = head; + do { + unsigned int next_off = curr_off + bh->b_size; + + if ((curr_off >= offset) && + (!buffer_mapped || (buffer_delay(bh))) { + return 0; + } + curr_off = next_off; + } while ((bh = bh->b_this_page) != head); + + return 1; +} + static int ext4_da_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1901,6 +1926,10 @@ static int ext4_da_write_end(struct file int ret = 0, ret2; handle_t *handle = ext4_journal_current_handle(); loff_t new_i_size; + unsigned long start, end; + + start = pos & (PAGE_CACHE_SIZE - 1); + end = start + copied; /* * generic_write_end() will run mark_inode_dirty() if i_size @@ -1910,8 +1939,7 @@ static int ext4_da_write_end(struct file new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (!walk_page_buffers(NULL, page_buffers(page), - 0, len, NULL, ext4_bh_unmapped_or_delay)){ + if (ext4_da_should_update_i_disksize(page, end)) /* * Updating i_disksize when extending file without * need block allocation > Thanks again, > Gary > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Gentoo with ext4-patch-queue snapshots 2008-07-01 23:00 ` Mingming Cao @ 2008-07-01 17:50 ` Gary Hawco 2008-07-02 17:19 ` Mingming Cao 0 siblings, 1 reply; 13+ messages in thread From: Gary Hawco @ 2008-07-01 17:50 UTC (permalink / raw) To: Mingming Cao, Aneesh Kumar K.V, Theodore Tso, linux-ext4@vger.kernel.org Mingming, Can you post that patch somewhere for download? I access my email using Windows Vista, not in linux, so it would be very laborious to hand copy this patch and recreate it in linux. Updated the 2.6.26-rc8 kernel with the latest snapshot from today at 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow the system to remount read/write on boot. But it worked fine in Slackware. Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does not like ext4. And for clarification, the patch you sent me is not yet in the queue? So to summarize my recent (as of the last seven days of snapshots) saga with ext4-queue-patch in Gentoo Through 062508/0019hrs no problems (although I do not believe delalloc was enabled) >From 062608/0042 - 062708/2353hrs had segfaulting when copying and tarring small files. No other detectable issues. >From 063008/1704 - 063008/2219hrs --no segfaulting but data corruption in one initiation file causing network interfaces to have to be manually activated. No data corruption if delalloc disabled latest snapshot 070108/1833hrs --system unable to be remounted read/write after fsck check. Manually booted to livecd with e2fsprogs-1.41WIP (17-June-2008). Efsck shows no filesystem problems. Rolling back to 063008/2219 snapshot fixes boot problems. No segfaulting. No data corruption IF delalloc disabled. Thanks, Gary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Gentoo with ext4-patch-queue snapshots 2008-07-01 17:50 ` Gentoo with ext4-patch-queue snapshots Gary Hawco @ 2008-07-02 17:19 ` Mingming Cao 2008-07-02 20:33 ` Mingming Cao 0 siblings, 1 reply; 13+ messages in thread From: Mingming Cao @ 2008-07-02 17:19 UTC (permalink / raw) To: Gary Hawco; +Cc: Aneesh Kumar K.V, Theodore Tso, linux-ext4@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1855 bytes --] On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote: > Mingming, > > Can you post that patch somewhere for download? I access my email using > Windows Vista, not in linux, so it would be very laborious to hand copy > this patch and recreate it in linux. > Patch attached. > Updated the 2.6.26-rc8 kernel with the latest snapshot from today at > 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow > the system to remount read/write on boot. But it worked fine in Slackware. > Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does > not like ext4. > The only commit made yesterday was add the Add ext4-fix-online-resize-with-mballoc.patch It does not seem to impact the boot. Was the filesystem corrupted before you reboot with new kernel? If so the next reboot will probably refuse to remount it as it need fsck. > And for clarification, the patch you sent me is not yet in the queue? > Just pushed into the queue today. Please check. > So to summarize my recent (as of the last seven days of snapshots) saga > with ext4-queue-patch in Gentoo > > Through 062508/0019hrs no problems (although I do not believe delalloc was > enabled) > From 062608/0042 - 062708/2353hrs had segfaulting when copying and tarring > small files. No other detectable issues. > From 063008/1704 - 063008/2219hrs --no segfaulting but data corruption in > one initiation file causing network interfaces to have to be manually > activated. No data corruption if delalloc disabled > latest snapshot 070108/1833hrs --system unable to be remounted read/write > after fsck check. Manually booted to livecd with e2fsprogs-1.41WIP > (17-June-2008). Efsck shows no filesystem problems. > Rolling back to 063008/2219 snapshot fixes boot problems. No segfaulting. > No data corruption IF delalloc disabled. > > Thanks, > Gary > [-- Attachment #2: delalloc_i_disksize_update-fix.patch --] [-- Type: text/x-patch, Size: 2658 bytes --] Ext4: fix delalloc i_disksize early update issue From: Mingming Cao <cmm@us.ibm.com> Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check if it extend the file size without need for allocation. But at that time the buffer has not being dirtied yet (done in code later in block_commit_write()), so it always return true and update i_disksize (before block allocation). we could fix that ext4_da_write_end() to not use this helper function. This also fixed another issue: The i_disksize is updated at ext4_da_write_end() time if writes to the end of file, and the buffers are all have blocks allocated. But if the page had one buffer marked as buffer_delay, and the write is at EOF and on a buffer has block already allocated, we certainly need to extend the i_disksize. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8/fs/ext4/inode.c =================================================================== --- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-02 09:53:42.000000000 -0700 +++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-02 10:01:09.000000000 -0700 @@ -1891,6 +1891,30 @@ out: return ret; } +/* + * Check if we should update i_disksize + * when write to the end of file but not require block allocation + */ +static int ext4_da_should_update_i_disksize(struct page *page, + unsigned long offset) +{ + struct buffer_head *head, *bh; + unsigned int curr_off = 0; + + head = page_buffers(page); + bh = head; + do { + unsigned int next_off = curr_off + bh->b_size; + + if (curr_off >= offset && + (!buffer_mapped(bh) || (buffer_delay(bh)))) + return 0; + curr_off = next_off; + } while ((bh = bh->b_this_page) != head); + + return 1; +} + static int ext4_da_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1900,6 +1924,10 @@ static int ext4_da_write_end(struct file int ret = 0, ret2; handle_t *handle = ext4_journal_current_handle(); loff_t new_i_size; + unsigned long start, end; + + start = pos & (PAGE_CACHE_SIZE - 1); + end = start + copied; /* * generic_write_end() will run mark_inode_dirty() if i_size @@ -1909,8 +1937,7 @@ static int ext4_da_write_end(struct file new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (!walk_page_buffers(NULL, page_buffers(page), - 0, len, NULL, ext4_bh_unmapped_or_delay)){ + if (ext4_da_should_update_i_disksize(page, end)) { /* * Updating i_disksize when extending file without * need block allocation ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Gentoo with ext4-patch-queue snapshots 2008-07-02 17:19 ` Mingming Cao @ 2008-07-02 20:33 ` Mingming Cao 2008-07-03 14:07 ` Aneesh Kumar 0 siblings, 1 reply; 13+ messages in thread From: Mingming Cao @ 2008-07-02 20:33 UTC (permalink / raw) To: Gary Hawco; +Cc: Aneesh Kumar K.V, Theodore Tso, linux-ext4@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3279 bytes --] On Wed, 2008-07-02 at 10:19 -0700, Mingming Cao wrote: > On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote: > > Mingming, > > > > Can you post that patch somewhere for download? I access my email using > > Windows Vista, not in linux, so it would be very laborious to hand copy > > this patch and recreate it in linux. > > > Patch attached. Please use this patch instead, after discuss with Ted, I found an issue with the patch I sent to the list. ext4 patch queue is also updated with latest patch. Ext4: fix delalloc i_disksize early update issue From: Mingming Cao <cmm@us.ibm.com> Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check if it extend the file size without need for allocation. But at that time the buffer has not being dirtied yet (done in code later in block_commit_write()), so it always return true and update i_disksize (before block allocation). we could fix that ext4_da_write_end() to not use this helper function. This also fixed another issue: The i_disksize is updated at ext4_da_write_end() time if writes to the end of file, and the buffers are all have blocks allocated. But in the case blocksize < pagesize, and if the page has, say, the first buffer marked as buffer_delay, and the write is to EOF and on the third buffer, which has block already allocated, we certainly need to extend the i_disksize. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/ext4/inode.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8/fs/ext4/inode.c =================================================================== --- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-02 09:53:42.000000000 -0700 +++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-02 13:22:52.000000000 -0700 @@ -1891,6 +1891,32 @@ out: return ret; } +/* + * Check if we should update i_disksize + * when write to the end of file but not require block allocation + */ +static int ext4_da_should_update_i_disksize(struct page *page, + unsigned long offset) +{ + struct buffer_head *head, *bh; + unsigned int curr_off = 0; + + head = page_buffers(page); + bh = head; + do { + unsigned int next_off = curr_off + bh->b_size; + + if (curr_off <= offset && offset < next_off) + if (!buffer_mapped(bh) || (buffer_delay(bh))) + return 0; + else + return 1; + curr_off = next_off; + } while ((bh = bh->b_this_page) != head); + + return 1; +} + static int ext4_da_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1900,6 +1926,10 @@ static int ext4_da_write_end(struct file int ret = 0, ret2; handle_t *handle = ext4_journal_current_handle(); loff_t new_i_size; + unsigned long start, end; + + start = pos & (PAGE_CACHE_SIZE - 1); + end = start + copied; /* * generic_write_end() will run mark_inode_dirty() if i_size @@ -1909,8 +1939,7 @@ static int ext4_da_write_end(struct file new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (!walk_page_buffers(NULL, page_buffers(page), - 0, len, NULL, ext4_bh_unmapped_or_delay)){ + if (ext4_da_should_update_i_disksize(page, end)) { /* * Updating i_disksize when extending file without * need block allocation [-- Attachment #2: delalloc_i_disksize_update-fix.patch --] [-- Type: text/x-patch, Size: 2703 bytes --] Ext4: fix delalloc i_disksize early update issue From: Mingming Cao <cmm@us.ibm.com> Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check if it extend the file size without need for allocation. But at that time the buffer has not being dirtied yet (done in code later in block_commit_write()), so it always return true and update i_disksize (before block allocation). we could fix that ext4_da_write_end() to not use this helper function. This also fixed another issue: The i_disksize is updated at ext4_da_write_end() time if writes to the end of file, and the buffers are all have blocks allocated. But if the page had one buffer marked as buffer_delay, and the write is at EOF and on a buffer has block already allocated, we certainly need to extend the i_disksize. Signed-off-by: Mingming Cao <cmm@us.ibm.com> --- fs/ext4/inode.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8/fs/ext4/inode.c =================================================================== --- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-02 09:53:42.000000000 -0700 +++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-02 13:22:52.000000000 -0700 @@ -1891,6 +1891,32 @@ out: return ret; } +/* + * Check if we should update i_disksize + * when write to the end of file but not require block allocation + */ +static int ext4_da_should_update_i_disksize(struct page *page, + unsigned long offset) +{ + struct buffer_head *head, *bh; + unsigned int curr_off = 0; + + head = page_buffers(page); + bh = head; + do { + unsigned int next_off = curr_off + bh->b_size; + + if (curr_off <= offset && offset < next_off) + if (!buffer_mapped(bh) || (buffer_delay(bh))) + return 0; + else + return 1; + curr_off = next_off; + } while ((bh = bh->b_this_page) != head); + + return 1; +} + static int ext4_da_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1900,6 +1926,10 @@ static int ext4_da_write_end(struct file int ret = 0, ret2; handle_t *handle = ext4_journal_current_handle(); loff_t new_i_size; + unsigned long start, end; + + start = pos & (PAGE_CACHE_SIZE - 1); + end = start + copied; /* * generic_write_end() will run mark_inode_dirty() if i_size @@ -1909,8 +1939,7 @@ static int ext4_da_write_end(struct file new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (!walk_page_buffers(NULL, page_buffers(page), - 0, len, NULL, ext4_bh_unmapped_or_delay)){ + if (ext4_da_should_update_i_disksize(page, end)) { /* * Updating i_disksize when extending file without * need block allocation ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Gentoo with ext4-patch-queue snapshots 2008-07-02 20:33 ` Mingming Cao @ 2008-07-03 14:07 ` Aneesh Kumar 2008-07-03 17:38 ` Mingming Cao 0 siblings, 1 reply; 13+ messages in thread From: Aneesh Kumar @ 2008-07-03 14:07 UTC (permalink / raw) To: linux-ext4 [sending via gmail ] On Thu, Jul 03, 2008 at 05:03:25PM +0530, Aneesh Kumar K.V wrote: > On Wed, Jul 02, 2008 at 10:19:54AM -0700, Mingming Cao wrote: > > > > On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote: > > > Mingming, > > > > > > Can you post that patch somewhere for download? I access my email using > > > Windows Vista, not in linux, so it would be very laborious to hand copy > > > this patch and recreate it in linux. > > > > > Patch attached. > > > > > Updated the 2.6.26-rc8 kernel with the latest snapshot from today at > > > 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow > > > the system to remount read/write on boot. But it worked fine in Slackware. > > > Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does > > > not like ext4. > > > > > > > I think we need to protect i_disksize update with i_data_sem. Otherwise > a parallel writepages and write_end can cause issues. I guess that is > what Gary is finding. I also did some cleanup for the patch > better one moving ext4_truncate i_disksize update under i_data_sem. ext4_ext_truncate is already doing this. diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index fcaafe4..05e9790 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1893,18 +1893,29 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, /* * Check if we should update i_disksize * when write to the end of file but not require block allocation + * We check only the buffer head mapping the offset. + * ex: File with blocksize 1K page size 4K + * block 1 and 2 are holes, block 3 is mapped and half filled + * seek to block 1 and write ( marked the buffer delay ) + * seek to block 3 and extent the end of file with end of file still + * falling within block 3. Here the writepages won't update the i_disksize + * properly because it allocate only block 1. So we need to update + * i_disksize in write_end checking only the offset + * */ -static int ext4_da_should_update_i_disksize(struct page *page, - unsigned long offset) +static int ext4_da_should_update_i_disksize(struct address_space *mapping, + struct page *page, unsigned long offset) { - struct buffer_head *bh; - unsigned int idx; int i; + unsigned int idx; + struct buffer_head *bh; + struct inode *inode = mapping->host; + unsigned blocksize = inode->i_sb->s_blocksize; bh = page_buffers(page); - idx = offset/bh->b_size; + idx = (offset + blocksize - 1)/blocksize; - for (i=0; i < idx; i++) + for (i = 1; i < idx; i++) bh = bh->b_this_page; if (!buffer_mapped(bh) || (buffer_delay(bh))) @@ -1934,15 +1945,20 @@ static int ext4_da_write_end(struct file *file, new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (ext4_da_should_update_i_disksize(page, end)) { + if (ext4_da_should_update_i_disksize(mapping, page, end)) { /* * Updating i_disksize when extending file without * need block allocation */ - if (ext4_should_order_data(inode)) - ret = ext4_jbd2_file_inode(handle, inode); + down_write(&EXT4_I(inode)->i_data_sem); + if (new_i_size > EXT4_I(inode)->i_disksize) { + if (ext4_should_order_data(inode)) + ret = ext4_jbd2_file_inode(handle, + inode); - EXT4_I(inode)->i_disksize = new_i_size; + EXT4_I(inode)->i_disksize = new_i_size; + } + up_write(&EXT4_I(inode)->i_data_sem); } ret2 = generic_write_end(file, mapping, pos, len, copied, page, fsdata); @@ -2987,6 +3003,11 @@ void ext4_truncate(struct inode *inode) */ if (ext4_orphan_add(handle, inode)) goto out_stop; + /* + * From here we block out all ext4_get_block() callers who want to + * modify the block allocation tree. + */ + down_write(&ei->i_data_sem); /* * The orphan list entry will now protect us from any crash which @@ -2997,12 +3018,6 @@ void ext4_truncate(struct inode *inode) */ ei->i_disksize = inode->i_size; - /* - * From here we block out all ext4_get_block() callers who want to - * modify the block allocation tree. - */ - down_write(&ei->i_data_sem); ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: Gentoo with ext4-patch-queue snapshots 2008-07-03 14:07 ` Aneesh Kumar @ 2008-07-03 17:38 ` Mingming Cao 0 siblings, 0 replies; 13+ messages in thread From: Mingming Cao @ 2008-07-03 17:38 UTC (permalink / raw) To: Aneesh Kumar; +Cc: linux-ext4 在 2008-07-03四的 19:37 +0530,Aneesh Kumar写道: > [sending via gmail ] > > On Thu, Jul 03, 2008 at 05:03:25PM +0530, Aneesh Kumar K.V wrote: > > On Wed, Jul 02, 2008 at 10:19:54AM -0700, Mingming Cao wrote: > > > > > > On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote: > > > > Mingming, > > > > > > > > Can you post that patch somewhere for download? I access my email using > > > > Windows Vista, not in linux, so it would be very laborious to hand copy > > > > this patch and recreate it in linux. > > > > > > > Patch attached. > > > > > > > Updated the 2.6.26-rc8 kernel with the latest snapshot from today at > > > > 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow > > > > the system to remount read/write on boot. But it worked fine in Slackware. > > > > Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does > > > > not like ext4. > > > > > > > > > > > I think we need to protect i_disksize update with i_data_sem. Otherwise > > a parallel writepages and write_end can cause issues. I guess that is > > what Gary is finding. I also did some cleanup for the patch > > > > better one moving ext4_truncate i_disksize update under i_data_sem. > ext4_ext_truncate is already doing this. > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index fcaafe4..05e9790 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1893,18 +1893,29 @@ static int ext4_da_write_begin(struct file > *file, struct address_space *mapping, > /* > * Check if we should update i_disksize > * when write to the end of file but not require block allocation > + * We check only the buffer head mapping the offset. > + * ex: File with blocksize 1K page size 4K > + * block 1 and 2 are holes, block 3 is mapped and half filled > + * seek to block 1 and write ( marked the buffer delay ) > + * seek to block 3 and extent the end of file with end of file still > + * falling within block 3. Here the writepages won't update the i_disksize > + * properly because it allocate only block 1. So we need to update > + * i_disksize in write_end checking only the offset > + * > */ Okay, Iwill add this comment > -static int ext4_da_should_update_i_disksize(struct page *page, > - unsigned long offset) > +static int ext4_da_should_update_i_disksize(struct address_space *mapping, > + struct page *page, unsigned long offset) > { > - struct buffer_head *bh; > - unsigned int idx; > int i; > + unsigned int idx; > + struct buffer_head *bh; > + struct inode *inode = mapping->host; > + unsigned blocksize = inode->i_sb->s_blocksize; > We could use page->mapping->host to get the inode pointer, so no need to pass the mapping pointer. But I am fine with this. > bh = page_buffers(page); > - idx = offset/bh->b_size; > + idx = (offset + blocksize - 1)/blocksize; > > - for (i=0; i < idx; i++) > + for (i = 1; i < idx; i++) > bh = bh->b_this_page; > Ack > if (!buffer_mapped(bh) || (buffer_delay(bh))) > @@ -1934,15 +1945,20 @@ static int ext4_da_write_end(struct file *file, > > new_i_size = pos + copied; > if (new_i_size > EXT4_I(inode)->i_disksize) > - if (ext4_da_should_update_i_disksize(page, end)) { > + if (ext4_da_should_update_i_disksize(mapping, page, end)) { > /* > * Updating i_disksize when extending file without > * need block allocation > */ > - if (ext4_should_order_data(inode)) > - ret = ext4_jbd2_file_inode(handle, inode); > + down_write(&EXT4_I(inode)->i_data_sem); > + if (new_i_size > EXT4_I(inode)->i_disksize) { > + if (ext4_should_order_data(inode)) > + ret = ext4_jbd2_file_inode(handle, > + inode); > > - EXT4_I(inode)->i_disksize = new_i_size; > + EXT4_I(inode)->i_disksize = new_i_size; > + } > + up_write(&EXT4_I(inode)->i_data_sem); > } > ret2 = generic_write_end(file, mapping, pos, len, copied, > page, fsdata); > @@ -2987,6 +3003,11 @@ void ext4_truncate(struct inode *inode) > */ > if (ext4_orphan_add(handle, inode)) > goto out_stop; > + /* > + * From here we block out all ext4_get_block() callers who want to > + * modify the block allocation tree. > + */ > + down_write(&ei->i_data_sem); > > /* > * The orphan list entry will now protect us from any crash which > @@ -2997,12 +3018,6 @@ void ext4_truncate(struct inode *inode) > */ > ei->i_disksize = inode->i_size; > > - /* > - * From here we block out all ext4_get_block() callers who want to > - * modify the block allocation tree. > - */ > - down_write(&ei->i_data_sem); > - > if (n == 1) { /* direct blocks */ > ext4_free_data(handle, inode, NULL, i_data+offsets[0], > i_data + EXT4_NDIR_BLOCKS); > Ack. I will merge the update to the parent patch, thanx Mingming > -aneesh > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-07-03 17:38 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-25 13:53 Segmentation Faults with 062508 ext4-patch-queue snapshot Gary Hawco 2008-06-26 4:14 ` Eric Sandeen 2008-06-26 22:12 ` Segmentation Faults with both 062608 snapshots Gary Hawco 2008-07-01 2:32 ` Theodore Tso 2008-07-01 0:00 ` More ext4dev snapshot weirdness Gary Hawco 2008-07-01 16:02 ` Theodore Tso 2008-07-01 10:54 ` delalloc filesystem corruption Gary Hawco 2008-07-01 23:00 ` Mingming Cao 2008-07-01 17:50 ` Gentoo with ext4-patch-queue snapshots Gary Hawco 2008-07-02 17:19 ` Mingming Cao 2008-07-02 20:33 ` Mingming Cao 2008-07-03 14:07 ` Aneesh Kumar 2008-07-03 17:38 ` Mingming Cao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).