Re: [Patch] ext3_journal_stop inode access

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [Patch] ext3_journal_stop inode access
       [not found] <1048185825.2491.386.camel@sisko.scot.redhat.com>
@ 2003-03-20 21:15 ` Andrew Morton
  2003-03-20 21:36   ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2003-03-20 21:15 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: ext3-users, linux-kernel, linux-fsdevel

"Stephen C. Tweedie" <sct@redhat.com> wrote:
>
> Hi Andrew,
> 
> The patch below addresses the problem we were talking about earlier
> where ext3_writepage ends up accessing the inode after the page lock has
> been dropped (and hence at a point where it is possible for the inode to
> have been reclaimed.)  Tested minimally (it builds and boots.)

Burton has confirmed that removing the

	inode->i_sb->s_dirt = 1;

line makes the oopses go away, so this will fix it.

> It makes ext3_journal_stop take an sb, not an inode, as its final
> parameter.

argh.  I wrote and tested a patch too.  That patch puts the superblock
pointer into the new journal->j_private and removes the second arg to
ext3_journal_start altogether.

I went that way just to save a little text.  Because ext3_journal_start/stop
need to be uninlined - that saves 5.5 kbytes of text.

Which do you think is best?  If you're planning on patching 2.4 and if you
want to do that by passing the superblock pointer in, then we should go that
way in 2.5 too, keep things in sync.

> It also sets sb->s_need_sync_fs, not sb->s_dirt, as setting
> s_dirt was only ever a workaround for the lack of a proper sync-fs
> mechanism.
> 
> Btw, we clear s_need_sync_fs in sync_filesystems().  Don't we also need
> to do the same in fsync_super()?

The intent of s_need_sync_fs is to avoid livelock in sync_filesystems().

Imagine that two filesytems are being continually dirtied.  It would be very,
very easy for a sync_filesystems() caller to never terminate.  This is a
repeated problem with lists of dirty objects which need cleaning, in which
only the head-of-list is stable outside the lock.

So sync_filesystems() will tag all the filesystems on the first pass and then
only sync tagged filesytems on the second pass.  No livelock.  (This still
means that new sync() callers will cause older sync() callers to perform a
second round of syncing.  I guess I should slap a mutex around the whole
thing to prevent that).

So s_need_sync_fs is "private to sync_fileystems".  I should have commented
that, sorry.

sync_filesystems() will even call call ->sync_fs against non-s_dirt
filesystems, which seems a little odd.

The reason for this is that ext3_write_super() will clear s_dirt and _not_
wait on the writeout.  So if sync_filesystems() happened to be called against
a filesystem shortly after its ->write_super() had been called,
sync_filesystems() would incorrectly assume that the filesystem was fully
synced.

I shall comment that, too.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch] ext3_journal_stop inode access
  2003-03-20 21:15 ` [Patch] ext3_journal_stop inode access Andrew Morton
@ 2003-03-20 21:36   ` Stephen C. Tweedie
  2003-03-21  0:12     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 2003-03-20 21:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ext3 users list, linux-kernel, linux-fsdevel, Stephen Tweedie

Hi,

On Thu, 2003-03-20 at 21:15, Andrew Morton wrote:

> Burton has confirmed that removing the
> 
> 	inode->i_sb->s_dirt = 1;
> 
> line makes the oopses go away, so this will fix it.

Good.

> > It makes ext3_journal_stop take an sb, not an inode, as its final
> > parameter.
> 
> argh.  I wrote and tested a patch too.  That patch puts the superblock
> pointer into the new journal->j_private and removes the second arg to
> ext3_journal_start altogether.

Well, there's still the

	if (err)
		__ext3_std_error(inode->i_sb, where, err);

case in ext3_journal_stop() to worry about, so we still need it; and I'd
much rather not hack this via j_private, when what we're doing at this
point is most definitely a fs-specific, not journal-related, operation.

> I went that way just to save a little text.  Because ext3_journal_start/stop
> need to be uninlined - that saves 5.5 kbytes of text.

Agreed.

> Which do you think is best?  If you're planning on patching 2.4 and if you
> want to do that by passing the superblock pointer in, then we should go that
> way in 2.5 too, keep things in sync.

I was wondering why we've never seen this on 2.4, even with slab
poisoning enabled.  But I think the vulnerability exists on 2.4 too, so
yes, we ought to keep the two in sync.

> > It also sets sb->s_need_sync_fs, not sb->s_dirt, as setting
> > s_dirt was only ever a workaround for the lack of a proper sync-fs
> > mechanism.
> > 
> > Btw, we clear s_need_sync_fs in sync_filesystems().  Don't we also need
> > to do the same in fsync_super()?
> 
> The intent of s_need_sync_fs is to avoid livelock in sync_filesystems().
...

Well, the intent of the s_dirt was to force a call to ext3_write_super
when the fs was dirty, back before the days when we had a sync_fs()
method at all.  Now that we have the latter, it sounds like we should
actually just drop the line which sets s_dirt in ext3_journal_stop
entirely, because sync will always call the new sync_fs which will do
the commit that we need.

We still have the error handling path in ext3_journal_stop so we can't
avoid having to find the sb, so _some_ rejigging is still needed.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch] ext3_journal_stop inode access
  2003-03-20 21:36   ` Stephen C. Tweedie
@ 2003-03-21  0:12     ` Andrew Morton
  2003-03-20 22:18       ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2003-03-21  0:12 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: ext3-users, linux-kernel, linux-fsdevel

"Stephen C. Tweedie" <sct@redhat.com> wrote:
>
> Well, there's still the
> 
> 	if (err)
> 		__ext3_std_error(inode->i_sb, where, err);
> 
> case in ext3_journal_stop() to worry about

We already have that.

int __ext3_journal_stop(const char *where, handle_t *handle)
{
	struct super_block *sb;
	int err;
	int rc;

	sb = handle->h_transaction->t_journal->j_private;
	err = handle->h_err;
	rc = journal_stop(handle);

	sb->s_dirt = 1;
	if (!err)
		err = rc;
	if (err)
		__ext3_std_error(sb, where, err);
	return err;
}

> , so we still need it; and I'd
> much rather not hack this via j_private, when what we're doing at this
> point is most definitely a fs-specific, not journal-related, operation.

I don't think it's a hack at all.  ext3 owns the journal and there is plenty
of precedent for putting owner-private things into owned objects.

But I'm not particularly fussed either way - it will only be 100-200 bytes of
code saved.

> I was wondering why we've never seen this on 2.4, even with slab
> poisoning enabled.  But I think the vulnerability exists on 2.4 too, so
> yes, we ought to keep the two in sync.

It could be due to differences in inode reclaim.  If 2.4 sees an inode with
attached pages it will skip it.  2.5 will instead detach the pages and free
the inode.

And writepage() doesn't get used much in 2.4.

Plus this bug was hit on a preemptible kernel where the timing windows are
much wider.

> Well, the intent of the s_dirt was to force a call to ext3_write_super
> when the fs was dirty, back before the days when we had a sync_fs()
> method at all.  Now that we have the latter, it sounds like we should
> actually just drop the line which sets s_dirt in ext3_journal_stop
> entirely, because sync will always call the new sync_fs which will do
> the commit that we need.

OK.

> We still have the error handling path in ext3_journal_stop so we can't
> avoid having to find the sb, so _some_ rejigging is still needed.

That is available from the handle.  (And via
ext3_journal_current_handle()->j_private, even).

The journal and the superblock have a definite one-to-one relationship - I think the
backpointer makes sense.  But whatever - I'll let you flip that coin.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch] ext3_journal_stop inode access
  2003-03-21  0:12     ` Andrew Morton
@ 2003-03-20 22:18       ` Stephen C. Tweedie
  0 siblings, 0 replies; 4+ messages in thread
From: Stephen C. Tweedie @ 2003-03-20 22:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ext3 users list, linux-kernel, linux-fsdevel, Stephen Tweedie

Hi,On Fri, 2003-03-21 at 00:12, Andrew Morton wrote:

> > Well, there's still the
> > 	if (err)
> > 		__ext3_std_error(inode->i_sb, where, err);
> > case in ext3_journal_stop() to worry about
> 
> We already have that.

Only if we fix the underlying problem --- I was only pointing out that
even if we drop the setting of s_dirt entirely, which was what we were
trying to fix, we can't avoid having to find the sb.

> But I'm not particularly fussed either way - it will only be 100-200 bytes of
> code saved.

Yep, but there are probably other places we can find where we can avoid
passing the sb around too if we have the back-pointer. I guess it makes
sense to go ahead with that.

> The journal and the superblock have a definite one-to-one relationship - I think the
> backpointer makes sense.  But whatever - I'll let you flip that coin.

OK, go for it and I'll merge for 2.4.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-03-21  0:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1048185825.2491.386.camel@sisko.scot.redhat.com>
2003-03-20 21:15 ` [Patch] ext3_journal_stop inode access Andrew Morton
2003-03-20 21:36   ` Stephen C. Tweedie
2003-03-21  0:12     ` Andrew Morton
2003-03-20 22:18       ` Stephen C. Tweedie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).