linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: Alex Lyakas <alex@zadarastorage.com>, Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: xfs_growfs_data_private memory leak
Date: Mon, 04 Aug 2014 13:15:00 -0500	[thread overview]
Message-ID: <53DFCDA4.9000605@sandeen.net> (raw)
In-Reply-To: <D8356F34DAF14CF5AC6D204541C261B3@alyakaslap>

On 7/2/14, 7:27 AM, Alex Lyakas wrote:
> Hi Dave,
> Thank you for your comments.
> 
> I realize that secondary superblocks are needed mostly for repairing

s/mostly/only/

> a broken filesystem. However, I don't see that they get updated
> regularly, i.e., during normal operation they don't seem to get
> updated at all. I put a print in xfs_sb_write_verify, and it gets
> called only with: bp->b_bn==XFS_SB_DADDR.

See the comments above verify_sb(): not everything is validated, so
not everything needs to be constantly updated.  It's just the basic
fs geometry, not counters, etc.

/*
 * verify a superblock -- does not verify root inode #
 *      can only check that geometry info is internally
 *      consistent.  because of growfs, that's no guarantee
 *      of correctness (e.g. geometry may have changed)
 *
 * fields verified or consistency checked:
 *
 *                      sb_magicnum
 *
 *                      sb_versionnum
 *
 *                      sb_inprogress
 *
 *                      sb_blocksize    (as a group)
 *                      sb_blocklog
 *
 * geometry info -      sb_dblocks      (as a group)
 *                      sb_agcount
 *                      sb_agblocks
 *                      sb_agblklog
 *
 * inode info -         sb_inodesize    (x-checked with geo info)
 *                      sb_inopblock
 *
 * sector size info -
 *                      sb_sectsize
 *                      sb_sectlog
 *                      sb_logsectsize
 *                      sb_logsectlog
 *
 * not checked here -
 *                      sb_rootino
 *                      sb_fname
 *                      sb_fpack
 *                      sb_logstart
 *                      sb_uuid
 *
 *                      ALL real-time fields
 *                      final 4 summary counters
 */


> So do I understand correctly (also from comments in
> xfs_growfs_data_private), that it is safe to operate a filesystem
> while having broken secondary superblocks? For me, it appears to
> mount properly, and all the data seems to be there, but xfs_check
> complains like:

> bad sb magic # 0xc2a4baf2 in ag 6144
> bad sb version # 0x4b5d in ag 6144
> blocks 6144/65536..2192631388 out of range
> blocks 6144/65536..2192631388 claimed by block 6144/0
> bad sb magic # 0xb20f3079 in ag 6145
> bad sb version # 0x6505 in ag 6145
> blocks 6145/65536..3530010017 out of range
> blocks 6145/65536..3530010017 claimed by block 6145/0
> ...

some of that looks more serious than "just" bad backup sb's.
But the bad secondaries shouldn't cause runtime problems AFAIK.

> Also, if secondary superblocks do not get updated regularly, and
> there is no way to ask an operational XFS to update them, then during
> repair we may not find a good secondary superblock.

You seem to have 6144 (!) allocation groups; one would hope that a
majority of those supers would be "good" and the others will be
properly corrected by an xfs_repair.

> As for the patch, I cannot post a patch against the upstream kernel,
> because I am running an older kernel. Unfortunately, I cannot qualify
> an upstream patch properly in a reasonable time. Is there a value in
> posting a patch against 3.8.13? Otherwise, it's fine by me if
> somebody else posts it and takes the credit.

If the patch applies cleanly to both kernels, probably fine to
go ahead and post it, with that caveat.

-Eric

> Thanks,
> Alex.
> 
> 
> 
> -----Original Message----- From: Dave Chinner
> Sent: 02 July, 2014 12:56 AM
> To: Alex Lyakas
> Cc: xfs@oss.sgi.com
> Subject: Re: xfs_growfs_data_private memory leak
> 
> On Tue, Jul 01, 2014 at 06:06:38PM +0300, Alex Lyakas wrote:
>> Greetings,
>>
>> It appears that if xfs_growfs_data_private fails during the "new AG
>> headers" loop, it does not free all the per-AG structures for the
>> new AGs. When XFS is unmounted later, they are not freed as well,
>> because xfs_growfs_data_private did not update the "sb_agcount"
>> field, so xfs_free_perag will not free them. This happens on 3.8.13,
>> but looking at the latest master branch, it seems to have the same
>> issue.
>>
>> Code like [1] in xfs_growfs_data, seems to fix the issue.
> 
> Why not just do this in the appropriate error stack, like is
> done inside xfs_initialize_perag() on error?
> 
>        for (i = oagcount; i < nagcount; i++) {
>                pag = radix_tree_delete(&mp->m_perag_tree, index);
>                kmem_free(pag);
>        }
> 
> (though it might need RCU freeing)
> 
> When you have a fix, can you send a proper patch with a sign-off on
> it?
> 
>> A follow-up question: if xfs_grows_data_private fails during the
>> loop that updates all the secondary superblocks, what is the
>> consequence? (I am aware that in the latest master branch, the loop
>> is not broken on first error, but attempts to initialize whatever
>> possible). When these secondary superblocks will get updated? Is
>> there a way to force-update them? Otherwise, what can be the
>> consequence of leaving them not updated?
> 
> The consequence is documented in mainline tree - if we don't update
> them all, then repair will do the wrong thing.  Repair requires a
> majority iof identical secondaries to determine if the primary is
> correct or out of date. The old behaviour of not updating after the
> first error meant that the majority were old superblocks and so at
> some time in the future repair could decide your filesystem is
> smaller than it really is and hence truncate away the grown section
> of the filesystem. i.e. trigger catastrophic, unrecoverable data
> loss.
> 
> Hence it's far better to write every seconday we can than to leave
> a majority in a bad state....
> 
> Cheers,
> 
> Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2014-08-04 18:15 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18 18:37 Questions about XFS discard and xfs_free_extent() code (newbie) Alex Lyakas
2013-12-18 23:06 ` Dave Chinner
2013-12-19  9:24   ` Alex Lyakas
2013-12-19 10:55     ` Dave Chinner
2013-12-19 19:24       ` Alex Lyakas
2013-12-21 17:03         ` Chris Murphy
2013-12-24 18:21       ` Alex Lyakas
2013-12-26 23:00         ` Dave Chinner
2014-01-08 18:13           ` Alex Lyakas
2014-01-13  3:02             ` Dave Chinner
2014-01-13 17:44               ` Alex Lyakas
2014-01-13 20:43                 ` Dave Chinner
2014-01-14 13:48                   ` Alex Lyakas
2014-01-15  1:45                     ` Dave Chinner
2014-01-19  9:38                       ` Alex Lyakas
2014-01-19 23:17                         ` Dave Chinner
2014-07-01 15:06                           ` xfs_growfs_data_private memory leak Alex Lyakas
2014-07-01 21:56                             ` Dave Chinner
2014-07-02 12:27                               ` Alex Lyakas
2014-08-04 18:15                                 ` Eric Sandeen [this message]
2014-08-06  8:56                                   ` Alex Lyakas
2014-08-04 11:00                             ` use-after-free on log replay failure Alex Lyakas
2014-08-04 14:12                               ` Brian Foster
2014-08-04 23:07                               ` Dave Chinner
2014-08-06 10:05                                 ` Alex Lyakas
2014-08-06 12:32                                   ` Dave Chinner
2014-08-06 14:43                                     ` Alex Lyakas
2014-08-10 16:26                                     ` Alex Lyakas
2014-08-06 12:52                                 ` Alex Lyakas
2014-08-06 15:20                                   ` Brian Foster
2014-08-06 15:28                                     ` Alex Lyakas
2014-08-10 12:20                                     ` Alex Lyakas
2014-08-11 13:20                                       ` Brian Foster
2014-08-11 21:52                                         ` Dave Chinner
2014-08-12 12:03                                           ` Brian Foster
2014-08-12 12:39                                             ` Alex Lyakas
2014-08-12 19:31                                               ` Brian Foster
2014-08-12 23:56                                               ` Dave Chinner
2014-08-13 12:59                                                 ` Brian Foster
2014-08-13 20:59                                                   ` Dave Chinner
2014-08-13 23:21                                                     ` Brian Foster
2014-08-14  6:14                                                       ` Dave Chinner
2014-08-14 19:05                                                         ` Brian Foster
2014-08-14 22:27                                                           ` Dave Chinner
2014-08-13 17:07                                                 ` Alex Lyakas
2014-08-13  0:03                                               ` Dave Chinner
2014-08-13 13:11                                                 ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53DFCDA4.9000605@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=alex@zadarastorage.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).