linux-fsdevel.vger.kernel.org archive mirror
* [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
@ 2011-07-26 18:14 Jan Kara
  2011-07-26 18:36 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2011-07-26 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, linux-fsdevel


  Hello Linus,

  could you please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6.git for_linus

Interesting patches in the queue:
  Fix of a long standing data corruption in data=journal mode (or a symlink
corruption in all journalling modes).
  Fix of a NULL pointer dereference in ext3 block reservation code.
  Fix of a NULL pointer dereference in jbd.
The rest is mostly minor fixes and cleanups to ext2, ext3, jbd, and quota.

The full shortlog is:

Akinobu Mita (1):
      ext3: use proper little-endian bitops

Bernd Schubert (1):
      ext3: Fix compilation with -DDX_DEBUG

Ding Dinghua (1):
      jbd: fix a bug of leaking jh->b_jcount

H Hartley Sweeten (1):
      ext3/ioctl.c: silence sparse warnings about different address spaces

Jan Kara (7):
      ext3: Convert ext3 to new truncate calling convention
      jbd: remove dependency on __GFP_NOFAIL
      ext3: Fix oops in ext3_try_to_allocate_with_rsv()
      ext3: Improve truncate error handling
      jbd: Fix oops in journal_remove_journal_head()
      quota: Remove unused declaration
      ext3: Fix data corruption in inodes with journalled data

Lukas Czerner (4):
      ext3: Add fixed tracepoints
      jbd: Add fixed tracepoints
      ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
      ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()

Petr Uzel (1):
      ext2: include fs.h into ext2_fs.h

Tao Ma (1):
      jbd: Use WRITE_SYNC in journal checkpoint.

Wang Sheng-Hui (3):
      ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
      ext3.txt: update the links in the section "useful links" to the latest ones
      jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t

The diffstat is

 Documentation/filesystems/ext3.txt |   13 +-
 Documentation/filesystems/ext4.txt |   23 +-
 fs/ext2/xattr.c                    |   10 +-
 fs/ext3/balloc.c                   |   38 +-
 fs/ext3/file.c                     |    1 -
 fs/ext3/fsync.c                    |   15 +-
 fs/ext3/ialloc.c                   |    4 +
 fs/ext3/inode.c                    |  193 ++++++---
 fs/ext3/ioctl.c                    |    4 +-
 fs/ext3/namei.c                    |    7 +-
 fs/ext3/super.c                    |   13 +
 fs/ext3/xattr.c                    |   12 +-
 fs/jbd/checkpoint.c                |   37 +-
 fs/jbd/commit.c                    |   57 ++-
 fs/jbd/journal.c                   |   99 ++---
 fs/jbd/transaction.c               |   83 ++--
 include/linux/ext2_fs.h            |    1 +
 include/linux/ext3_fs.h            |    7 +-
 include/linux/jbd.h                |    1 -
 include/linux/journal-head.h       |    2 +-
 include/linux/quota.h              |    8 -
 include/trace/events/ext3.h        |  864 ++++++++++++++++++++++++++++++++++++
 include/trace/events/jbd.h         |  203 +++++++++
 23 files changed, 1423 insertions(+), 272 deletions(-)
 create mode 100644 include/trace/events/ext3.h
 create mode 100644 include/trace/events/jbd.h

							Thanks
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:14 [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1 Jan Kara
@ 2011-07-26 18:36 ` Linus Torvalds
  2011-07-26 18:52   ` Al Viro
  2011-07-26 20:10   ` Jan Kara
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2011-07-26 18:36 UTC (permalink / raw)
  To: Jan Kara, Al Viro, Josef Bacik; +Cc: LKML, linux-fsdevel

On Tue, Jul 26, 2011 at 11:14 AM, Jan Kara <jack@suse.cz> wrote:
>
>  could you please pull from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6.git for_linus

Ok, this clashed with the fsync mutex pushdown, and the whole addition
of fixed tracepoints.

Quite frankly, I think the fixed tracepoints are broken and make the
code unreadable (why have them?) but I fixed it up.

Somebody should really double-check the resolve. That's especially
true since the whole i_mutex thing is *also* rather dubious. The
comment that moved that down says:

+       /*
+        * Taking the mutex here just to keep consistent with how fsync was
+        * called previously, however it looks like we don't need to take
+        * i_mutex at all.
+        */

but in fact it is *not* consistent with how fsync() used to be called,
since we then drop the mutex *before* doing

    return ext3_force_commit(inode->i_sb);

for the should_journal_data case.

See commit 02c24a82187 ("fs: push i_mutex and filemap_write_and_wait
down into ->fsync() handlers").

I resolved it with the mutex still dropped early (especially since the
comment implies it may not matter at all), but quite frankly,
everything I did around that resolve made me go "that code is just
WRONG". Both wrt the tracepoints and wrt the i_mutex.

So I think my resolution is "correct" from a merge standpoint, but I
think the code is total crap. I also wonder whether you can really do
that

    J_ASSERT(ext3_journal_current_handle() == NULL);

without holding the i_mutex, so I moved that back down again.

So I *really* want people to take a look at that ext3_sync_file()
function. Please?

                      Linus
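
As an illustration of the locking difference described above, here is a
minimal userspace sketch. It is not the ext3 code: struct mock_inode,
mock_lock(), mock_unlock(), and mock_force_commit() are hypothetical
stand-ins, and the "mutex" is just a flag so that the example can record
whether the lock was held at the moment of the commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mutex: only tracks whether it is held. */
struct mock_mutex { bool held; };

static void mock_lock(struct mock_mutex *m)   { m->held = true; }
static void mock_unlock(struct mock_mutex *m) { m->held = false; }

/* Hypothetical miniature inode; not the kernel's struct inode. */
struct mock_inode {
	struct mock_mutex i_mutex;
	bool journals_data;        /* models the should_journal_data case */
	bool lock_held_at_commit;  /* recorded by mock_force_commit() */
};

/* Stand-in for the commit step; records the lock state it observes. */
static int mock_force_commit(struct mock_inode *inode)
{
	inode->lock_held_at_commit = inode->i_mutex.held;
	return 0;
}

/* Old convention: the caller holds the lock across the whole fsync. */
static int fsync_caller_locked(struct mock_inode *inode)
{
	int ret = 0;

	mock_lock(&inode->i_mutex);
	if (inode->journals_data)
		ret = mock_force_commit(inode);
	mock_unlock(&inode->i_mutex);
	return ret;
}

/* Shape described in the message: lock dropped before the commit. */
static int fsync_lock_dropped_early(struct mock_inode *inode)
{
	mock_lock(&inode->i_mutex);
	if (inode->journals_data) {
		mock_unlock(&inode->i_mutex);
		return mock_force_commit(inode);
	}
	mock_unlock(&inode->i_mutex);
	return 0;
}
```

In the first variant the commit runs with the lock held, in the second
it does not, which is why "consistent with how fsync was called
previously" does not hold for the journalled-data path.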
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:36 ` Linus Torvalds
@ 2011-07-26 18:52   ` Al Viro
  2011-07-26 18:59     ` Christoph Hellwig
  2011-07-26 20:16     ` Jan Kara
  2011-07-26 20:10   ` Jan Kara
  1 sibling, 2 replies; 8+ messages in thread
From: Al Viro @ 2011-07-26 18:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Kara, Josef Bacik, LKML, linux-fsdevel

On Tue, Jul 26, 2011 at 11:36:29AM -0700, Linus Torvalds wrote:

> So I *really* want people to take a look at that ext3_sync_file()
> function. Please?

While we are at it, could somebody please explain what the hell is ext4
doing in
static int ext4_sync_parent(struct inode *inode)
{
        struct writeback_control wbc;
        struct dentry *dentry = NULL;
        int ret = 0;

        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
                dentry = list_entry(inode->i_dentry.next,
                                    struct dentry, d_alias);
                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
                        break;
                inode = dentry->d_parent->d_inode;
                ret = sync_mapping_buffers(inode->i_mapping);
		...
Note that dentry obviously can't be NULL there.  dentry->d_parent is never
NULL.  And dentry->d_parent would better not be negative, for crying out
loud!  What's worse, there's no guarantees that dentry->d_parent will
remain our parent over that sync_mapping_buffers() *and* that inode won't
just be freed under us (after rename() and memory pressure leading to
eviction of what used to be our dentry->d_parent).  Moreover, even if
inode survives in icache, there is no promise that it will have an alias
in dcache by the time we get to the next iteration of the loop, so this
list_entry() next time around can bloody well happen to &inode->i_dentry,
dentry being a garbage address somewhere inside that struct inode (or a
bit above it - I hadn't compared offsets).

What the hell is going on there?  It appeared more than a year ago in commit
14ece1028b3ed53ffec1b1213ffc6acaf79ad77c
Author: Frank Mayhar <fmayhar@google.com>
Date:   Mon May 17 08:00:00 2010 -0400
ext4: Make fsync sync new parent directories in no-journal mode

and it had remained broken ever after...
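
The list_entry() hazard described above can be reproduced in userspace.
The following is a minimal sketch, assuming simplified re-implementations
of the kernel's list_head, container_of(), and list_entry(); the
mock_dentry and mock_inode structs and the first_alias() helpers are
hypothetical and carry only the list linkage. On an empty list,
head->next points back at the head itself, so list_entry() returns a
non-NULL pointer that does not refer to any dentry at all, and a NULL
check on the result cannot catch it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical miniature structs; only the list linkage matters here. */
struct mock_dentry { int d_flags; struct list_head d_alias; };
struct mock_inode  { long i_ino;  struct list_head i_dentry; };

/* Unchecked pattern: on an empty list this yields a bogus pointer. */
static struct mock_dentry *first_alias_unchecked(struct mock_inode *inode)
{
	return list_entry(inode->i_dentry.next, struct mock_dentry, d_alias);
}

/* Checked pattern: test for emptiness before calling list_entry(). */
static struct mock_dentry *first_alias(struct mock_inode *inode)
{
	if (list_is_empty(&inode->i_dentry))
		return NULL;
	return list_entry(inode->i_dentry.next, struct mock_dentry, d_alias);
}
```

With an empty i_dentry list, first_alias_unchecked() returns the address
of the list head minus offsetof(struct mock_dentry, d_alias), which is an
address computed from the inode rather than from any dentry. The
emptiness check only addresses that one symptom; it does nothing about
the lifetime and rename races raised in the message.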


* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:52   ` Al Viro
@ 2011-07-26 18:59     ` Christoph Hellwig
  2011-07-26 19:14       ` Linus Torvalds
  2011-07-26 20:16     ` Jan Kara
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2011-07-26 18:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Jan Kara, Josef Bacik, LKML, linux-fsdevel

On Tue, Jul 26, 2011 at 07:52:20PM +0100, Al Viro wrote:
> Note that dentry obviously can't be NULL there.  dentry->d_parent is never
> NULL.  And dentry->d_parent would better not be negative, for crying out
> loud!  What's worse, there's no guarantees that dentry->d_parent will
> remain our parent over that sync_mapping_buffers() *and* that inode won't
> just be freed under us (after rename() and memory pressure leading to
> eviction of what used to be our dentry->d_parent).  Moreover, even if
> inode survives in icache, there is no promise that it will have an alias
> in dcache by the time we get to the next iteration of the loop, so this
> list_entry() next time around can bloody well happen to &inode->i_dentry,
> dentry being a garbage address somewhere inside that struct inode (or a
> bit above it - I hadn't compared offsets).

In addition to being bogus the code also is useless.  fsync on a file
explicitly does not guarantee anything at all about the parent, and
never really has on Linux either.
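
For applications that do need a new file's directory entry to be
durable, the portable approach is to fsync() the parent directory
explicitly. The following is a minimal userspace sketch of that pattern;
the helper names fsync_parent_dir() and create_durably() are
hypothetical, and error handling is reduced to returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* fsync() the directory that contains path. Returns 0 or -1. */
static int fsync_parent_dir(const char *path)
{
	/* dirname() may modify its argument, so work on a copy. */
	char *copy = strdup(path);
	if (copy == NULL)
		return -1;

	int dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
	free(copy);
	if (dfd < 0)
		return -1;

	int ret = fsync(dfd);
	close(dfd);
	return ret;
}

/* Create path with the given contents, then sync file and parent. */
static int create_durably(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;

	/* The file's data is synced; now sync the entry that names it. */
	return fsync_parent_dir(path);
}
```

A call such as create_durably("/some/dir/file", data, len) leaves it to
the application, not to the filesystem's ->fsync() handler, to decide
that the directory entry matters.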



* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:59     ` Christoph Hellwig
@ 2011-07-26 19:14       ` Linus Torvalds
  2011-07-27  0:31         ` Ted Ts'o
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2011-07-26 19:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Al Viro, Jan Kara, Josef Bacik, LKML, linux-fsdevel

On Tue, Jul 26, 2011 at 11:59 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> In addition to being bogus the code also is useless.  fsync on a file
> explicitly does not guarantee anything at all about the parent, and
> never really has on Linux either.

Well, it may never have done that, but it might still be a case of
quality-of-implementation.

The data blocks and inode indirect blocks being stable on disk doesn't
help hugely if you cannot actually reach the inode itself.

But yeah, I suspect it's not worth the bother. For many common
situations, it's the rename that moves it to the final place that is
the critical one as far as parent directory is concerned, not the
fsync of the file itself.

                     Linus
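
The rename-based update mentioned above is commonly written as: write a
temporary file, fsync() it, rename() it over the final name, then fsync()
the directory. The following is a minimal userspace sketch of that
sequence; replace_file() and the ".NAME.tmp" naming scheme are
hypothetical choices made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace dir/name with new contents. Returns 0 or -1. */
static int replace_file(const char *dir, const char *name,
			const void *buf, size_t len)
{
	char tmp[4096], dst[4096];

	if (snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name) >= (int)sizeof tmp ||
	    snprintf(dst, sizeof dst, "%s/%s", dir, name) >= (int)sizeof dst)
		return -1;

	/* Step 1: write the new contents to a temporary file and sync it. */
	int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	int ok = write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0;
	if (close(fd) != 0)
		ok = 0;

	/* Step 2: move it into place; readers see old or new, never half. */
	if (!ok || rename(tmp, dst) != 0) {
		unlink(tmp);
		return -1;
	}

	/* Step 3: sync the directory so the rename itself is on disk. */
	int dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -1;
	int ret = fsync(dfd);
	close(dfd);
	return ret;
}
```

Here the directory update that matters is the one made by rename(), so
the directory fsync() comes after the rename rather than after the
file's own fsync().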

* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:36 ` Linus Torvalds
  2011-07-26 18:52   ` Al Viro
@ 2011-07-26 20:10   ` Jan Kara
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Kara @ 2011-07-26 20:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Kara, Al Viro, Josef Bacik, LKML, linux-fsdevel

On Tue 26-07-11 11:36:29, Linus Torvalds wrote:
> On Tue, Jul 26, 2011 at 11:14 AM, Jan Kara <jack@suse.cz> wrote:
> >
> >  could you please pull from
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6.git for_linus
> 
> Ok, this clashed with the fsync mutex pushdown, and the whole addition
> of fixed tracepoints.
> 
> Quite frankly, I think the fixed tracepoints are broken and make the
> code unreadable (why have them?) but I fixed it up.
> 
> Somebody should really double-check the resolve. That's especially
> true since the whole i_mutex thing is *also* rather dubious. The
> comment that moved that down says:
> 
> +       /*
> +        * Taking the mutex here just to keep consistent with how fsync was
> +        * called previously, however it looks like we don't need to take
> +        * i_mutex at all.
> +        */
> 
> but in fact it is *not* consistent with how fsync() used to be called,
> since we then drop the mutex *before* doing
> 
>     return ext3_force_commit(inode->i_sb);
> 
> for the should_journal_data case.
> 
> See commit 02c24a82187 ("fs: push i_mutex and filemap_write_and_wait
> down into ->fsync() handlers").
> 
> I resolved it with the mutex still dropped early (especially since the
> comment implies it may not matter at all), but quite frankly,
> everything I did around that resolve made me go "that code is just
> WRONG". Both wrt the tracepoints and wrt the i_mutex.
> 
> So I think my resolution is "correct" from a merge standpoint, but I
> think the code is total crap. I also wonder whether you can really do
> that
> 
>     J_ASSERT(ext3_journal_current_handle() == NULL);
> 
> without holding the i_mutex, so I moved that back down again.
> 
> So I *really* want people to take a look at that ext3_sync_file()
> function. Please?
  ext3_sync_file() really does not need i_mutex for anything. I've just
checked how you resolved the conflict and it looks nice. I'll queue a patch
which just removes i_mutex from that function altogether. Thanks for the
resolve.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 18:52   ` Al Viro
  2011-07-26 18:59     ` Christoph Hellwig
@ 2011-07-26 20:16     ` Jan Kara
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Kara @ 2011-07-26 20:16 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Jan Kara, Josef Bacik, LKML, linux-fsdevel,
	Ted Tso

On Tue 26-07-11 19:52:20, Al Viro wrote:
> On Tue, Jul 26, 2011 at 11:36:29AM -0700, Linus Torvalds wrote:
> 
> > So I *really* want people to take a look at that ext3_sync_file()
> > function. Please?
> 
> While we are at it, could somebody please explain what the hell is ext4
> doing in
> static int ext4_sync_parent(struct inode *inode)
> {
>         struct writeback_control wbc;
>         struct dentry *dentry = NULL;
>         int ret = 0;
> 
>         while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                 ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                 dentry = list_entry(inode->i_dentry.next,
>                                     struct dentry, d_alias);
>                 if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                         break;
>                 inode = dentry->d_parent->d_inode;
>                 ret = sync_mapping_buffers(inode->i_mapping);
> 		...
> Note that dentry obviously can't be NULL there.  dentry->d_parent is never
> NULL.  And dentry->d_parent would better not be negative, for crying out
> loud!  What's worse, there's no guarantees that dentry->d_parent will
> remain our parent over that sync_mapping_buffers() *and* that inode won't
> just be freed under us (after rename() and memory pressure leading to
> eviction of what used to be our dentry->d_parent).  Moreover, even if
> inode survives in icache, there is no promise that it will have an alias
> in dcache by the time we get to the next iteration of the loop, so this
> list_entry() next time around can bloody well happen to &inode->i_dentry,
> dentry being a garbage address somewhere inside that struct inode (or a
> bit above it - I hadn't compared offsets).
> 
> What the hell is going on there?  It appeared more than a year ago in commit
> 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c
> Author: Frank Mayhar <fmayhar@google.com>
> Date:   Mon May 17 08:00:00 2010 -0400
> ext4: Make fsync sync new parent directories in no-journal mode
> 
> and it had remained broken ever after...
  It really looks broken. Added Ted to CC...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PULL REQUEST] ext3, jbd, ext2, and quota fixes for 3.1-rc1
  2011-07-26 19:14       ` Linus Torvalds
@ 2011-07-27  0:31         ` Ted Ts'o
  0 siblings, 0 replies; 8+ messages in thread
From: Ted Ts'o @ 2011-07-27  0:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Al Viro, Jan Kara, Josef Bacik, LKML,
	linux-fsdevel

On Tue, Jul 26, 2011 at 12:14:28PM -0700, Linus Torvalds wrote:
> On Tue, Jul 26, 2011 at 11:59 AM, Christoph Hellwig <hch@infradead.org> wrote:
> >
> > In addition to being bogus the code also is useless.  fsync on a file
> > explicitly does not guarantee anything at all about the parent, and
> > never really has on Linux either.
> 
> Well, it may never have done that, but it might still be a case of
> quality-of-implementation.
> 
> The data blocks and inode indirect blocks being stable on disk doesn't
> help hugely if you cannot actually reach the inode itself.

Yeah, that's why it was done.  Frank found that with power-fail
testing, a large number of files that were freshly created and then
fsync()'ed would disappear.  If the data is supposed to be available
after a power failure, then you have to be able to get to it somehow.

I agree what's there isn't safe, and needs to be fixed.  I'll deal
with it.

						- Ted

