From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com, Eric Sandeen <sandeen@sandeen.net>,
Mark Tinguely <tinguely@sgi.com>,
sekharan@us.ibm.com
Subject: Re: [PATCH] xfs: shutdown filesystem if xfs_perag_get fails
Date: Tue, 30 Apr 2013 08:30:01 +1000
Message-ID: <20130429223001.GC23072@dastard>
In-Reply-To: <20130426160704.GF29359@sgi.com>

On Fri, Apr 26, 2013 at 11:07:04AM -0500, Ben Myers wrote:
> Hi Mark and Chandra,
>
> On Fri, Apr 26, 2013 at 10:32:34AM -0500, Mark Tinguely wrote:
> > On 04/25/13 17:41, Chandra Seetharaman wrote:
> > >In which case something along the lines of
> > >
> > >---
> > >diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > >index 3806088..3fb2fa6 100644
> > >--- a/fs/xfs/xfs_mount.c
> > >+++ b/fs/xfs/xfs_mount.c
> > >@@ -203,7 +203,13 @@ xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
> > > if (pag) {
> > > ASSERT(atomic_read(&pag->pag_ref)>= 0);
> > > ref = atomic_inc_return(&pag->pag_ref);
> > >- }
> > >+ } else
> > >+ /*
> > >+ * xfs_perag_get() is called with invalid agno,
> > >+ * which cannot happen. This indicates a problem
> > >+ * in the calling code.
> > >+ */
> > >+ BUG();
> > > rcu_read_unlock();
> > > trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
> > > return pag;
> > >--------
> > >
> > >would be useful? Since we have a NULL pag, we will trip somewhere
> > >else. At least with this, there is a pointer for the debugger/sysadmin
> > >about where/what to look for (maybe with a more valuable/correct comment
> > >than above).
> > >
> >
> > We will have to make sure the callers of xfs_perag_get() handle the NULL
> > before dereferencing it. Sometimes the NULL is normal and just means the
> > perag structure has not been initialized yet.
> >
> > Properly handling the NULL from xfs_perag_get() in the caller will also
> > mean that the callers of the callers of xfs_perag_get() have to handle
> > the NULL returned to them. I will come back to this once the CRC stuff
> > has been put to rest.
>
> I agree that we want to address this. Our worst case should be a forced
> shutdown, rather than a NULL ptr deref, or a BUG(). Ideally one corrupted
> filesystem does not result in a full system outage, right? ;)

A BUG() or null pointer deref simply segv's the current process. It
doesn't cause a reboot, hang or crash dump unless the system is
configured to do so.

But, as I've already stated, checking the return of xfs_perag_get()
is not the answer to this problem - it's just a band-aid. Lots of them,
and mostly unnecessary, too. We need to address the input
validation problem prior to calling xfs_perag_get() to catch the
error at the source, not somewhere in the downstream call chain when
an invalid agno is tripped over.

Indeed, the design of the code is such that the agno is *trusted* to
be correct, just like we trust inode numbers coming from on-disk
structures to be correct. We validate inode numbers properly, but we
aren't validating block numbers returned from extent records
completely. That's the source of the problem we are seeing -
xfs_perag_get() returning NULL is just a symptom. Put simply:
no-one should *ever* pass an invalid agno to xfs_perag_get().

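To make "validate at the source" concrete, here is a minimal userspace
sketch, not the kernel code. It assumes a filesystem block number keeps
the AG number in its high bits and the AG-relative block in the low
agblklog bits. struct geo, struct perag, fsbno_is_valid(), perag_get()
and extent_to_perag() are all hypothetical names made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; not the kernel's superblock layout. */
struct geo {
	uint32_t agcount;	/* number of allocation groups */
	uint32_t agblocks;	/* blocks in each allocation group */
	uint8_t  agblklog;	/* log2(agblocks), rounded up */
};

/* Hypothetical per-AG structure with a plain reference count. */
struct perag {
	uint32_t agno;
	int      ref;
};

/* Validate a filesystem block number read from an on-disk extent record. */
static bool fsbno_is_valid(const struct geo *g, uint64_t fsbno)
{
	uint64_t agno  = fsbno >> g->agblklog;
	uint64_t agbno = fsbno & ((UINT64_C(1) << g->agblklog) - 1);

	return agno < g->agcount && agbno < g->agblocks;
}

/* Stand-in for the perag lookup: the agno is trusted, so only assert. */
static struct perag *perag_get(const struct geo *g, struct perag *pags,
			       uint32_t agno)
{
	assert(agno < g->agcount);
	pags[agno].ref++;
	return &pags[agno];
}

/* Returns 0 and sets *pagp, or -1 if the extent's start block is invalid. */
static int extent_to_perag(const struct geo *g, struct perag *pags,
			   uint64_t startblock, struct perag **pagp)
{
	if (!fsbno_is_valid(g, startblock))
		return -1;	/* corruption is caught here, at the source */
	*pagp = perag_get(g, pags, (uint32_t)(startblock >> g->agblklog));
	return 0;
}
```

With the check done where the block number enters from disk, the lookup
itself can keep trusting its argument and its callers need no NULL
checks.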
> There are some others like this. e.g. xfs_da_read_buf can return 0 with a
> null buffer pointer, and we rarely check for that before using bp.

I've also pointed out that the case where that can occur is handled
by the callers that trigger it. It does not need to be checked by
every caller, because most of them can't trigger this return case.
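
As a rough illustration of that calling convention, here is a userspace
sketch under stated assumptions: a read helper that returns success with
a NULL buffer only when the caller explicitly opts in to landing in a
hole. read_buf(), its allow_hole parameter and both callers are
hypothetical and do not reflect the real xfs_da_read_buf() signature.

```c
#include <stdbool.h>
#include <stddef.h>

struct buf { int blkno; };

/*
 * Hypothetical read helper. Returns 0 with *bpp == NULL only when the
 * block is a hole AND the caller passed allow_hole; otherwise a hole is
 * reported as an error (-1).
 */
static int read_buf(struct buf *bufs, const bool *is_hole, int blkno,
		    bool allow_hole, struct buf **bpp)
{
	*bpp = NULL;
	if (is_hole[blkno])
		return allow_hole ? 0 : -1;
	*bpp = &bufs[blkno];
	return 0;
}

/* Caller that cannot trigger the NULL case: no NULL check on success. */
static int read_mapped_blkno(struct buf *bufs, const bool *is_hole, int blkno)
{
	struct buf *bp;

	if (read_buf(bufs, is_hole, blkno, false, &bp))
		return -1;
	return bp->blkno;
}

/* Caller that opts in to holes: it alone has to handle the NULL buffer. */
static int probe_blkno(struct buf *bufs, const bool *is_hole, int blkno)
{
	struct buf *bp;

	if (read_buf(bufs, is_hole, blkno, true, &bp))
		return -1;
	return bp ? bp->blkno : -2;	/* -2: hole, nothing mapped here */
}
```

Only the caller that asks for the hole behaviour carries the extra
check; every other caller can use the buffer directly once the call
succeeds.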

Let's not make a mountain out of a molehill....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

Thread overview: 14+ messages
[not found] <20130419204102.736961610@sgi.com>
2013-04-21 17:41 ` [PATCH] xfs: shutdown filesystem if xfs_perag_get fails Mark Tinguely
2013-04-21 21:55 ` Eric Sandeen
2013-04-22 13:45 ` Mark Tinguely
2013-04-22 14:32 ` Eric Sandeen
2013-04-22 15:11 ` Mark Tinguely
2013-04-22 23:30 ` Dave Chinner
2013-04-23 13:48 ` Mark Tinguely
2013-04-23 15:54 ` Chandra Seetharaman
2013-04-23 20:49 ` Dave Chinner
2013-04-25 22:41 ` Chandra Seetharaman
2013-04-26 1:32 ` Dave Chinner
2013-04-26 15:32 ` Mark Tinguely
2013-04-26 16:07 ` Ben Myers
2013-04-29 22:30 ` Dave Chinner [this message]