From: "Vladimir V. Saveliev" <vs@namesys.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: reiserfs-dev@namesys.com, "Malte Schröder" <MalteSch@gmx.de>,
"Adrian Bunk" <bunk@stusta.de>, "Andrew Morton" <akpm@osdl.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.20-rc4: known unfixed regressions (v2)
Date: Thu, 11 Jan 2007 03:24:42 +0300
Message-ID: <200701110324.42920.vs@namesys.com>
In-Reply-To: <Pine.LNX.4.64.0701091022180.3594@woody.osdl.org>
Hello
On Tuesday 09 January 2007 21:30, Linus Torvalds wrote:
>
> On Tue, 9 Jan 2007, Malte Schröder wrote:
> >
> > > So something interesting is definitely going on, but I don't know exactly
> > > what it is. Why does reiserfs do the truncate as part of a close, if the
> > > same inode is actually mapped somewhere else?
On file close, reiserfs tries to "pack" the content of the last incomplete page of the file into metadata blocks.
It should not do so if that page is still mapped somewhere.
It does not actually truncate; it calls the same function that truncate uses, but the file size does not change.
Please consider the patch below.
From: Vladimir Saveliev <vs@namesys.com>
This patch fixes a long-standing confusion in reiserfs.
On the release file operation, reiserfs used to try to pack the file data stored in the last incomplete page of some files
into metadata blocks. After packing, the page was cleaned with clear_page_dirty.
This did not take into account that the page may be mmapped into
another process's address space. cancel_dirty_page, the recent replacement for clear_page_dirty, exposed the confusion
through its sanity check that the page must not be mapped.
The patch fixes the confusion by making reiserfs avoid tail packing if an inode was ever mmapped.
reiserfs_file_mmap and reiserfs_file_release are serialized by a mutex in the reiserfs-specific inode.
reiserfs_file_mmap locks the mutex and sets a bit in the reiserfs-specific inode flags.
reiserfs_file_release checks the bit while holding the mutex. If the bit is set, tail packing is skipped.
This eliminates the possibility that a mmapped page is passed to cancel_dirty_page.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
diff -puN fs/reiserfs/file.c~reiserfs-dont-convert-if-tail-page-mapped fs/reiserfs/file.c
--- linux-2.6.20-rc4/fs/reiserfs/file.c~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/fs/reiserfs/file.c 2007-01-11 02:09:19.000000000 +0300
@@ -48,6 +48,11 @@ static int reiserfs_file_release(struct
}
mutex_lock(&inode->i_mutex);
+
+ mutex_lock(&(REISERFS_I(inode)->i_mmap));
+ if (REISERFS_I(inode)->i_flags & i_ever_mapped)
+ REISERFS_I(inode)->i_flags &= ~i_pack_on_close_mask;
+
reiserfs_write_lock(inode->i_sb);
/* freeing preallocation only involves relogging blocks that
* are already in the current transaction. preallocation gets
@@ -100,11 +105,24 @@ static int reiserfs_file_release(struct
err = reiserfs_truncate_file(inode, 0);
}
out:
+ mutex_unlock(&(REISERFS_I(inode)->i_mmap));
mutex_unlock(&inode->i_mutex);
reiserfs_write_unlock(inode->i_sb);
return err;
}
+static int reiserfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ inode = file->f_path.dentry->d_inode;
+ mutex_lock(&(REISERFS_I(inode)->i_mmap));
+ REISERFS_I(inode)->i_flags |= i_ever_mapped;
+ mutex_unlock(&(REISERFS_I(inode)->i_mmap));
+
+ return generic_file_mmap(file, vma);
+}
+
static void reiserfs_vfs_truncate_file(struct inode *inode)
{
reiserfs_truncate_file(inode, 1);
@@ -1527,7 +1545,7 @@ const struct file_operations reiserfs_fi
#ifdef CONFIG_COMPAT
.compat_ioctl = reiserfs_compat_ioctl,
#endif
- .mmap = generic_file_mmap,
+ .mmap = reiserfs_file_mmap,
.open = generic_file_open,
.release = reiserfs_file_release,
.fsync = reiserfs_sync_file,
diff -puN fs/reiserfs/inode.c~reiserfs-dont-convert-if-tail-page-mapped fs/reiserfs/inode.c
--- linux-2.6.20-rc4/fs/reiserfs/inode.c~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/fs/reiserfs/inode.c 2007-01-11 02:14:57.000000000 +0300
@@ -1125,6 +1125,7 @@ static void init_inode(struct inode *ino
REISERFS_I(inode)->i_prealloc_count = 0;
REISERFS_I(inode)->i_trans_id = 0;
REISERFS_I(inode)->i_jl = NULL;
+ mutex_init(&(REISERFS_I(inode)->i_mmap));
reiserfs_init_acl_access(inode);
reiserfs_init_acl_default(inode);
reiserfs_init_xattr_rwsem(inode);
@@ -1832,6 +1833,7 @@ int reiserfs_new_inode(struct reiserfs_t
REISERFS_I(inode)->i_attrs =
REISERFS_I(dir)->i_attrs & REISERFS_INHERIT_MASK;
sd_attrs_to_i_attrs(REISERFS_I(inode)->i_attrs, inode);
+ mutex_init(&(REISERFS_I(inode)->i_mmap));
reiserfs_init_acl_access(inode);
reiserfs_init_acl_default(inode);
reiserfs_init_xattr_rwsem(inode);
diff -puN include/linux/reiserfs_fs_i.h~reiserfs-dont-convert-if-tail-page-mapped include/linux/reiserfs_fs_i.h
--- linux-2.6.20-rc4/include/linux/reiserfs_fs_i.h~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/include/linux/reiserfs_fs_i.h 2007-01-11 02:09:19.000000000 +0300
@@ -25,6 +25,7 @@ typedef enum {
i_link_saved_truncate_mask = 0x0020,
i_has_xattr_dir = 0x0040,
i_data_log = 0x0080,
+ i_ever_mapped = 0x0100
} reiserfs_inode_flags;
struct reiserfs_inode_info {
@@ -52,6 +53,7 @@ struct reiserfs_inode_info {
** flushed */
unsigned long i_trans_id;
struct reiserfs_journal_list *i_jl;
+ struct mutex i_mmap;
#ifdef CONFIG_REISERFS_FS_POSIX_ACL
struct posix_acl *i_acl_access;
struct posix_acl *i_acl_default;
_
> > > And if it's a race with two
> > > different CPU's (one doing a "munmap()" and the other doing a "close()",
> > > then the unmap should _still_ have actually unmapped the pages before it
> > > actually did _its_ "release()" call.
> >
> > This was on a single core. But with CONFIG_PREEMPT_VOLUNTARY=y.
> > It didn't happen again since then.
>
> Yeah, PREEMPT would be able to show most races like this too. In fact,
> some races show up much better with preemption than they do with real SMP.
>
> But I haven't looked at what exactly reiserfs does. I did check that the
> VM layer definitely does the remove_vma() stuff (that actually closes the
> files) _after_ it has unmapped everything. It would have surprised me if
> we had had that kind of bug, but still..
>
> Linus
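The serialization described in the patch (the mmap path sets a "was ever mapped" bit under a per-inode mutex, and the release path checks that bit under the same mutex before deciding whether to pack the tail) can be modeled in user space. The following is only a sketch under stated assumptions: it uses POSIX threads instead of the kernel mutex API, and struct model_inode, model_inode_init, model_mmap, model_release and the MODEL_* constants are hypothetical names that do not exist in reiserfs. The 0x0100 value mirrors i_ever_mapped from the patch; the value chosen for the pack-on-close bit is illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits. MODEL_EVER_MAPPED mirrors i_ever_mapped (0x0100)
 * from the patch; the MODEL_PACK_ON_CLOSE value is only illustrative. */
enum {
	MODEL_PACK_ON_CLOSE = 0x0004,
	MODEL_EVER_MAPPED   = 0x0100
};

/* Hypothetical stand-in for the reiserfs-specific part of an inode. */
struct model_inode {
	pthread_mutex_t mmap_lock;	/* plays the role of i_mmap */
	unsigned int flags;		/* plays the role of i_flags */
};

static void model_inode_init(struct model_inode *inode, unsigned int flags)
{
	assert(inode != NULL);
	pthread_mutex_init(&inode->mmap_lock, NULL);
	inode->flags = flags;
}

/* mmap path: record, under the lock, that the file has been mapped. */
static void model_mmap(struct model_inode *inode)
{
	assert(inode != NULL);
	pthread_mutex_lock(&inode->mmap_lock);
	inode->flags |= MODEL_EVER_MAPPED;
	pthread_mutex_unlock(&inode->mmap_lock);
}

/* release path: returns true if tail packing would still be attempted.
 * The lock is held across the check, so a concurrent model_mmap() either
 * completes before the check or waits until the release path is done. */
static bool model_release(struct model_inode *inode)
{
	bool pack;

	assert(inode != NULL);
	pthread_mutex_lock(&inode->mmap_lock);
	if (inode->flags & MODEL_EVER_MAPPED)
		inode->flags &= ~(unsigned int)MODEL_PACK_ON_CLOSE;
	pack = (inode->flags & MODEL_PACK_ON_CLOSE) != 0;
	/* A real implementation would do the packing here, still locked. */
	pthread_mutex_unlock(&inode->mmap_lock);
	return pack;
}
```

In this model, an inode initialized with MODEL_PACK_ON_CLOSE reports that packing would be attempted on release, while the same inode after a single model_mmap() call does not, and it never will again because the bit is never cleared. That matches the "ever mmapped" wording of the patch description: the check is deliberately conservative rather than tracking whether a mapping still exists. Build with something like cc -pthread.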