From: Andrew Morton <akpm@linux-foundation.org>
To: Bernd Schubert <bs@q-leap.de>
Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>,
"Bernd Schubert" <bschubert@q-leap.de>,
linux-kernel@vger.kernel.org
Subject: Re: mkfs.ext2 triggered softlockup
Date: Wed, 16 May 2007 11:41:00 -0700
Message-ID: <20070516114100.9cd642b8.akpm@linux-foundation.org>
In-Reply-To: <200705161901.09072.bs@q-leap.de>
On Wed, 16 May 2007 19:01:08 +0200
Bernd Schubert <bs@q-leap.de> wrote:
> On Wednesday 16 May 2007 18:49:57 Michal Piotrowski wrote:
> > Hi Bernd,
> >
> > On 16/05/07, Bernd Schubert <bschubert@q-leap.de> wrote:
> > > Maybe you still remember my report about an mkfs.ext2 triggered ram disk
> > > corruption?
> > >
> > > http://lkml.org/lkml/2007/5/4/272
> > >
> > > Well, in principle I'm now doing the same stuff, only this time with
> > > another initrd, which mounts the root-fs over nfs.
> > >
> > > [ 1596.928552] BUG: soft lockup detected on CPU#2!
> > > [ 1596.933109]
> > > [ 1596.933110] Call Trace:
> > > [ 1596.933111] <IRQ> [<ffffffff8025167b>] softlockup_tick+0xd8/0xef
> > > [ 1596.933129] [<ffffffff802329f8>] run_local_timers+0x13/0x15
> > > [ 1596.933132] [<ffffffff80232a44>] update_process_times+0x4a/0x77
> > > [ 1596.933138] [<ffffffff8021434b>] smp_local_timer_interrupt+0x34/0x54
> > > [ 1596.933143] [<ffffffff802143cc>] smp_apic_timer_interrupt+0x61/0x78
> > > [ 1596.933147] [<ffffffff8020a29b>] apic_timer_interrupt+0x6b/0x70
> > > [ 1596.933151] <EOI> [<ffffffff80299dff>] free_buffer_head+0x24/0x3e
> > > [ 1596.933162] [<ffffffff80272a63>] kmem_cache_free+0x1f4/0x201
> > > [ 1596.933170] [<ffffffff80299dff>] free_buffer_head+0x24/0x3e
> > > [ 1596.933175] [<ffffffff80299ea1>] try_to_free_buffers+0x88/0x9f
> > > [ 1596.933181] [<ffffffff802565a9>] try_to_release_page+0x39/0x40
> > > [ 1596.933188] [<ffffffff8025b76d>] invalidate_mapping_pages+0x9d/0x121
> > > [ 1596.933196] [<ffffffff8025b800>] invalidate_inode_pages+0xf/0x11
> > > [ 1596.933200] [<ffffffff80299053>] invalidate_bdev+0x3b/0x3f
> > > [ 1596.933203] [<ffffffff8029c9ee>] kill_bdev+0x13/0x29
> > > [ 1596.933208] [<ffffffff8029d6e8>] __blkdev_put+0x62/0x141
> > > [ 1596.933213] [<ffffffff8029db62>] blkdev_put+0xb/0xd
> > > [ 1596.933218] [<ffffffff8029dbf7>] blkdev_close+0x2e/0x33
> > > [ 1596.933222] [<ffffffff8027a3c3>] __fput+0xc3/0x172
> > > [ 1596.933228] [<ffffffff8027a486>] fput+0x14/0x16
> > > [ 1596.933233] [<ffffffff80278c4f>] filp_close+0x61/0x6d
> > > [ 1596.933238] [<ffffffff80278ce7>] sys_close+0x8c/0xce
> > > [ 1596.933244] [<ffffffff8020965e>] system_call+0x7e/0x83
> > > [ 1596.933250]
> >
> > Can you tell me which kernel version you are using?
>
> Sorry, forgot that. I think 2.6.20.6 or 2.6.20.7 (I always rename them to .3;
> for some reason that's easier than changing our tftp-rembo config). The
> kernel is patched with Lustre patches; hmm, one of them also adds a read-only
> test to the block device layer.
> Probably I should test a vanilla kernel. Going to do that now...
>
Don't bother - it'll happen here too.
I assume the disk is large, and that the machine has a lot of RAM?
Root cause: I suck.
From: Andrew Morton <akpm@linux-foundation.org>
invalidate_mapping_pages() can sometimes take a long time (millions of pages
to free). Long enough for the softlockup detector to trigger.
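We used to have a cond_resched() in there but I took it out because the
drop_caches code calls invalidate_mapping_pages() under inode_lock, which is
a spinlock, so it must not sleep there.
The patch adds a nasty flag (be_atomic) which drop_caches sets to skip the
reschedule, and puts the cond_resched() back for everyone else.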
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 fs/drop_caches.c   |    2 +-
 include/linux/fs.h |    3 +++
 mm/truncate.c      |   38 +++++++++++++++++++++++---------------
 3 files changed, 27 insertions(+), 16 deletions(-)
diff -puN fs/drop_caches.c~invalidate_mapping_pages-add-cond_resched fs/drop_caches.c
--- a/fs/drop_caches.c~invalidate_mapping_pages-add-cond_resched
+++ a/fs/drop_caches.c
@@ -20,7 +20,7 @@ static void drop_pagecache_sb(struct sup
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
if (inode->i_state & (I_FREEING|I_WILL_FREE))
continue;
- invalidate_mapping_pages(inode->i_mapping, 0, -1);
+ __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
}
spin_unlock(&inode_lock);
}
diff -puN include/linux/fs.h~invalidate_mapping_pages-add-cond_resched include/linux/fs.h
--- a/include/linux/fs.h~invalidate_mapping_pages-add-cond_resched
+++ a/include/linux/fs.h
@@ -1583,6 +1583,9 @@ extern int __invalidate_device(struct bl
extern int invalidate_partition(struct gendisk *, int);
#endif
extern int invalidate_inodes(struct super_block *);
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+ pgoff_t start, pgoff_t end,
+ bool be_atomic);
unsigned long invalidate_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff -puN mm/truncate.c~invalidate_mapping_pages-add-cond_resched mm/truncate.c
--- a/mm/truncate.c~invalidate_mapping_pages-add-cond_resched
+++ a/mm/truncate.c
@@ -253,21 +253,8 @@ void truncate_inode_pages(struct address
}
EXPORT_SYMBOL(truncate_inode_pages);
-/**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
- * @mapping: the address_space which holds the pages to invalidate
- * @start: the offset 'from' which to invalidate
- * @end: the offset 'to' which to invalidate (inclusive)
- *
- * This function only removes the unlocked pages, if you want to
- * remove all the pages of one inode, you must call truncate_inode_pages.
- *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
- */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
- pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+ pgoff_t start, pgoff_t end, bool be_atomic)
{
struct pagevec pvec;
pgoff_t next = start;
@@ -308,9 +295,30 @@ unlock:
break;
}
pagevec_release(&pvec);
+ if (likely(!be_atomic))
+ cond_resched();
}
return ret;
}
+
+/**
+ * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * @mapping: the address_space which holds the pages to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function only removes the unlocked pages, if you want to
+ * remove all the pages of one inode, you must call truncate_inode_pages.
+ *
+ * invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback or mapped into
+ * pagetables.
+ */
+unsigned long invalidate_mapping_pages(struct address_space *mapping,
+ pgoff_t start, pgoff_t end)
+{
+ return __invalidate_mapping_pages(mapping, start, end, false);
+}
EXPORT_SYMBOL(invalidate_mapping_pages);
/*
_
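
For readers who want to see the shape of the fix outside the kernel, here is a
minimal userspace sketch of the same pattern: a batched loop that offers up the
CPU after each batch unless the caller says it is in atomic context. It is not
the kernel code. BATCH_SIZE, yield_count, maybe_yield(),
invalidate_items_common() and invalidate_items() are all hypothetical names
made up for illustration, and sched_yield() only stands in for cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 16	/* hypothetical batch size, plays the role of one pagevec */

/* Counts how many reschedule points were taken; illustration only. */
static unsigned long yield_count;

/* Userspace stand-in for cond_resched(): offer the CPU to other tasks. */
static void maybe_yield(void)
{
	yield_count++;
	sched_yield();
}

/*
 * Walk items [0, nr) in batches and "invalidate" every item that is not busy.
 * If be_atomic is true, the caller is assumed to hold a spinlock, so the
 * loop must never yield between batches.
 */
static unsigned long invalidate_items_common(const bool *busy, size_t nr,
					     bool be_atomic)
{
	unsigned long freed = 0;
	size_t next = 0;

	assert(busy != NULL || nr == 0);

	while (next < nr) {
		size_t end = next + BATCH_SIZE;

		if (end > nr)
			end = nr;
		for (; next < end; next++) {
			if (!busy[next])
				freed++;
		}
		/* One reschedule point per batch, skipped in atomic context. */
		if (!be_atomic)
			maybe_yield();
	}
	return freed;
}

/* Ordinary callers get the yielding variant, as in the patch's wrapper. */
static unsigned long invalidate_items(const bool *busy, size_t nr)
{
	return invalidate_items_common(busy, nr, false);
}
```

The wrapper keeps the existing call signature for ordinary callers, so only the
one caller that holds the lock needs to know about the flag.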