From: Mark Mielke <mark.mielke@gmail.com>
To: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000.
Date: Sun, 13 Aug 2017 23:49:21 -0400
Message-ID: <CALm7yL0ePnQh04ztPDzTOoGgTpDLUbcfKY6tPh_gDF3jay3V5Q@mail.gmail.com>
In-Reply-To: <CALm7yL3gMJ6Aech3juPbP+5nK-nOZr1H6PZBdsf-hP6-vrzwqA@mail.gmail.com>
I opened this Bugzilla issue for tracking purposes:
https://bugzilla.redhat.com/show_bug.cgi?id=1481085
On Sun, Aug 13, 2017 at 8:05 AM, Mark Mielke <mark.mielke@gmail.com> wrote:
> I searched around for this a bit, and although other users may have hit
> this, I didn't find a good explanation offered. I suspect the users clean
> it up manually and then it disappears for another 2 years. I hope this
> message will get captured by Google, and help somebody else out. Also, I
> hope to have some discussion about this as it seems like an easily
> preventable problem.
>
> The archive file names are generated like:
>
>     if (dm_snprintf(archive_name, sizeof(archive_name),
>                     "%s/%s_%05u-%d.vg",
>                     dir, vg->name, ix, rnum) < 0) {
>
> The directory scanning code that loads the archive file names into memory
> recognizes a problem, although it isn't explicit about what the problem is:
>
>     /* Sort fails beyond 5-digit indexes */
>     if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
>         log_error("Couldn't scan the archive directory (%s).",
>                   dir);
>         return 0;
>     }
>
> The file names encode the index like "00000". The sorting code uses
> "alphasort", which only works properly as long as the index stays
> within 5 digits. As soon as it exceeds 5 digits, it sorts "100000" to
> the beginning and "99999" to the end. From then on, new archives seem
> to *all* be "100000". We had some 40,000 archives with index "100000"
> before we noticed. And, because the index is followed by a random
> number, it would only expire a few of the "100000" files before it hit
> one that was younger than the 30-day retention period set by default.
> When I reduced the retention period to 7 days, it expired only about 12
> of the 40,000 archive files. This behaviour is probably due to random
> number distribution ensuring that there are always some recent records
> near 0?
>
> This issue eventually affects everyone, although obviously the people who
> use features like snapshots more frequently (we use them every 15 minutes,
> across multiple volumes) will hit it sooner.
>
> There are a few fixes possible... Probably, "alphasort" should not be used
> at all. Instead, a context-aware sort should be used, one that can filter
> and sort as it goes, decoding the index correctly as a number and
> comparing it as a number. Then, if performance and scalability matter, it
> would be ideal if it worked in a single pass, buffering only the minimum
> needed to expire the correct archive files.
>
> We hit this on RHEL 7.2. I wasn't surprised to find it in RHEL 7.2, but I
> was surprised that it still exists on "master". "git blame" says this has
> been an issue since 2002:
>
> 5be981bab5 (Alasdair Kergon 2002-05-07 12:47:11 +0000 139) /* Sort fails beyond 5-digit indexes */
> 59d6420b9a (Joe Thornber 2002-02-08 11:58:18 +0000 140) if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
> b8f47d5f69 (Alasdair Kergon 2009-07-15 20:02:46 +0000 141) log_error("Couldn't scan the archive directory (%s).", dir);
> 952d12a5f5 (Alasdair Kergon 2002-01-09 19:16:48 +0000 142) return 0;
> 952d12a5f5 (Alasdair Kergon 2002-01-09 19:16:48 +0000 143) }
>
> Ouch... :-)
>
> For anybody who does hit this: pruning the archive files with index <
> 100000 is effective. The index then keeps counting up from 100000, and you
> now have 9X more life before it will happen again... :-)
>
> --
> Mark Mielke <mark.mielke@gmail.com>
>
>
--
Mark Mielke <mark.mielke@gmail.com>