All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: Alex Lyakas <alex@zadarastorage.com>, Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: xfs_growfs_data_private memory leak
Date: Mon, 04 Aug 2014 13:15:00 -0500	[thread overview]
Message-ID: <53DFCDA4.9000605@sandeen.net> (raw)
In-Reply-To: <D8356F34DAF14CF5AC6D204541C261B3@alyakaslap>

On 7/2/14, 7:27 AM, Alex Lyakas wrote:
> Hi Dave,
> Thank you for your comments.
> 
> I realize that secondary superblocks are needed mostly for repairing

s/mostly/only/

> a broken filesystem. However, I don't see that they get updated
> regularly, i.e., during normal operation they don't seem to get
> updated at all. I put a print in xfs_sb_write_verify, and it gets
> called only with: bp->b_bn==XFS_SB_DADDR.

See the comments above verify_sb(): not everything is validated, so
not everything needs to be constantly updated.  It's just the basic
fs geometry, not counters, etc.

/*
 * verify a superblock -- does not verify root inode #
 *      can only check that geometry info is internally
 *      consistent.  because of growfs, that's no guarantee
 *      of correctness (e.g. geometry may have changed)
 *
 * fields verified or consistency checked:
 *
 *                      sb_magicnum
 *
 *                      sb_versionnum
 *
 *                      sb_inprogress
 *
 *                      sb_blocksize    (as a group)
 *                      sb_blocklog
 *
 * geometry info -      sb_dblocks      (as a group)
 *                      sb_agcount
 *                      sb_agblocks
 *                      sb_agblklog
 *
 * inode info -         sb_inodesize    (x-checked with geo info)
 *                      sb_inopblock
 *
 * sector size info -
 *                      sb_sectsize
 *                      sb_sectlog
 *                      sb_logsectsize
 *                      sb_logsectlog
 *
 * not checked here -
 *                      sb_rootino
 *                      sb_fname
 *                      sb_fpack
 *                      sb_logstart
 *                      sb_uuid
 *
 *                      ALL real-time fields
 *                      final 4 summary counters
 */


> So do I understand correctly (also from comments in
> xfs_growfs_data_private), that it is safe to operate a filesystem
> while having broken secondary superblocks? For me, it appears to
> mount properly, and all the data seems to be there, but xfs_check
> complains like:

> bad sb magic # 0xc2a4baf2 in ag 6144
> bad sb version # 0x4b5d in ag 6144
> blocks 6144/65536..2192631388 out of range
> blocks 6144/65536..2192631388 claimed by block 6144/0
> bad sb magic # 0xb20f3079 in ag 6145
> bad sb version # 0x6505 in ag 6145
> blocks 6145/65536..3530010017 out of range
> blocks 6145/65536..3530010017 claimed by block 6145/0
> ...

some of that looks more serious than "just" bad backup sb's.
But the bad secondaries shouldn't cause runtime problems AFAIK.

> Also, if secondary superblocks do not get updated regularly, and
> there is no way to ask an operational XFS to update them, then during
> repair we may not find a good secondary superblock.

You seem to have 6144 (!) allocation groups; one would hope that a
majority of those supers would be "good" and the others will be
properly corrected by an xfs_repair.

> As for the patch, I cannot post a patch against the upstream kernel,
> because I am running an older kernel. Unfortunately, I cannot qualify
> an upstream patch properly in a reasonable time. Is there a value in
> posting a patch against 3.8.13? Otherwise, it's fine by me if
> somebody else posts it and takes the credit.

If the patch applies cleanly to both kernels, probably fine to
go ahead and post it, with that caveat.

-Eric

> Thanks,
> Alex.
> 
> 
> 
> -----Original Message----- From: Dave Chinner
> Sent: 02 July, 2014 12:56 AM
> To: Alex Lyakas
> Cc: xfs@oss.sgi.com
> Subject: Re: xfs_growfs_data_private memory leak
> 
> On Tue, Jul 01, 2014 at 06:06:38PM +0300, Alex Lyakas wrote:
>> Greetings,
>>
>> It appears that if xfs_growfs_data_private fails during the "new AG
>> headers" loop, it does not free all the per-AG structures for the
>> new AGs. When XFS is unmounted later, they are not freed as well,
>> because xfs_growfs_data_private did not update the "sb_agcount"
>> field, so xfs_free_perag will not free them. This happens on 3.8.13,
>> but looking at the latest master branch, it seems to have the same
>> issue.
>>
>> Code like [1] in xfs_growfs_data, seems to fix the issue.
> 
> Why not just do this in the appropriate error stack, like is
> done inside xfs_initialize_perag() on error?
> 
>        for (i = oagcount; i < nagcount; i++) {
>                pag = radix_tree_delete(&mp->m_perag_tree, index);
>                kmem_free(pag);
>        }
> 
> (though it might need RCU freeing)
> 
> When you have a fix, can you send a proper patch with a sign-off on
> it?
> 
>> A follow-up question: if xfs_grows_data_private fails during the
>> loop that updates all the secondary superblocks, what is the
>> consequence? (I am aware that in the latest master branch, the loop
>> is not broken on first error, but attempts to initialize whatever
>> possible). When these secondary superblocks will get updated? Is
>> there a way to force-update them? Otherwise, what can be the
>> consequence of leaving them not updated?
> 
> The consequence is documented in mainline tree - if we don't update
> them all, then repair will do the wrong thing.  Repair requires a
> majority iof identical secondaries to determine if the primary is
> correct or out of date. The old behaviour of not updating after the
> first error meant that the majority were old superblocks and so at
> some time in the future repair could decide your filesystem is
> smaller than it really is and hence truncate away the grown section
> of the filesystem. i.e. trigger catastrophic, unrecoverable data
> loss.
> 
> Hence it's far better to write every seconday we can than to leave
> a majority in a bad state....
> 
> Cheers,
> 
> Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2014-08-04 18:15 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18 18:37 Questions about XFS discard and xfs_free_extent() code (newbie) Alex Lyakas
2013-12-18 23:06 ` Dave Chinner
2013-12-19  9:24   ` Alex Lyakas
2013-12-19 10:55     ` Dave Chinner
2013-12-19 19:24       ` Alex Lyakas
2013-12-21 17:03         ` Chris Murphy
2013-12-24 18:21       ` Alex Lyakas
2013-12-26 23:00         ` Dave Chinner
2014-01-08 18:13           ` Alex Lyakas
2014-01-13  3:02             ` Dave Chinner
2014-01-13 17:44               ` Alex Lyakas
2014-01-13 20:43                 ` Dave Chinner
2014-01-14 13:48                   ` Alex Lyakas
2014-01-15  1:45                     ` Dave Chinner
2014-01-19  9:38                       ` Alex Lyakas
2014-01-19 23:17                         ` Dave Chinner
2014-07-01 15:06                           ` xfs_growfs_data_private memory leak Alex Lyakas
2014-07-01 21:56                             ` Dave Chinner
2014-07-02 12:27                               ` Alex Lyakas
2014-08-04 18:15                                 ` Eric Sandeen [this message]
2014-08-06  8:56                                   ` Alex Lyakas
2014-08-04 11:00                             ` use-after-free on log replay failure Alex Lyakas
2014-08-04 14:12                               ` Brian Foster
2014-08-04 23:07                               ` Dave Chinner
2014-08-06 10:05                                 ` Alex Lyakas
2014-08-06 12:32                                   ` Dave Chinner
2014-08-06 14:43                                     ` Alex Lyakas
2014-08-10 16:26                                     ` Alex Lyakas
2014-08-06 12:52                                 ` Alex Lyakas
2014-08-06 15:20                                   ` Brian Foster
2014-08-06 15:28                                     ` Alex Lyakas
2014-08-10 12:20                                     ` Alex Lyakas
2014-08-11 13:20                                       ` Brian Foster
2014-08-11 21:52                                         ` Dave Chinner
2014-08-12 12:03                                           ` Brian Foster
2014-08-12 12:39                                             ` Alex Lyakas
2014-08-12 19:31                                               ` Brian Foster
2014-08-12 23:56                                               ` Dave Chinner
2014-08-13 12:59                                                 ` Brian Foster
2014-08-13 20:59                                                   ` Dave Chinner
2014-08-13 23:21                                                     ` Brian Foster
2014-08-14  6:14                                                       ` Dave Chinner
2014-08-14 19:05                                                         ` Brian Foster
2014-08-14 22:27                                                           ` Dave Chinner
2014-08-13 17:07                                                 ` Alex Lyakas
2014-08-13  0:03                                               ` Dave Chinner
2014-08-13 13:11                                                 ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53DFCDA4.9000605@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=alex@zadarastorage.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.