From: Dave Anderson <anderson@redhat.com>
To: Cliff Wickman <cpw@sgi.com>
Cc: kexec@lists.infradead.org
Subject: Re: [PATCH 2/2] makedumpfile: exclude unused vmemmap pages > (Cliff Wickman)
Date: Thu, 2 Jan 2014 14:00:34 -0500 (EST)
Message-ID: <821191738.23743294.1388689234324.JavaMail.root@redhat.com>
In-Reply-To: <20140102182430.GA23662@sgi.com>
----- Original Message -----
> On Thu, Jan 02, 2014 at 11:50:14AM -0500, Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> > > Date: Tue, 31 Dec 2013 17:36:02 -0600
> > > From: Cliff Wickman <cpw@sgi.com>
> > > To: kexec@lists.infradead.org, d.hatayama@jp.fujitsu.com,
> > > kumagai-atsushi@mxc.nes.nec.co.jp
> > > Subject: [PATCH 2/2] makedumpfile: exclude unused vmemmap pages
> > > Message-ID: <20131231233602.GB18522@sgi.com>
> > > Content-Type: text/plain; charset=us-ascii
> > >
> > > On Tue, Dec 31, 2013 at 05:30:01PM -0600, cpw wrote:
> > >
> > > Exclude kernel pages that contain nothing but page structures for pages
> > > that are not being included in the dump.
> > > These can amount to 3.67 million pages per terabyte of system memory!
> > >
> > > The kernel's page table, starting at virtual address 0xffffea0000000000,
> > > is searched to find the actual pages containing the vmemmap page
> > > structures.
> > >
> > > Bitmap1 is a map of dumpable (i.e. existing) pages. Bitmap2 is a map
> > > of pages not to be excluded.
> > > To speed the search of bitmaps only whole 64-bit words of 1's in
> > > bitmap1 and 0's in bitmap2 are tested to see if they are vmemmap pages.
> > >
> > > The list of vmemmap pfn's to be excluded is written to a small file in
> > > order to conserve crash kernel memory.
> > >
> > > In practice, this whole procedure only takes about 10 seconds on a
> > > 16TB machine.
> > >
> > > Omitting unused page structures from the dump has only one, minimal
> > > side effect that I can find: the crash command "kmem -f" will fail
> > > when attempting to walk through free pages. This seems to me to be
> > > a trivial negative when weighed against the enabling and acceleration
> > > of dumps on large systems.
> > >
> > > This patch includes -e and -N options to exclude or include unneeded
> > > vmemmap pages regardless of system size (see flag_includevm and
> > > flag_excludvm). By default the exclusion of such pages is only
> > > done on a system of a terabyte or more.
> >
> > Hi Cliff,
> >
> > I understand the reason behind this, but the default exclusion
> > (even @ 1TB) makes me a little nervous.
> >
> > Although I'm sure you tested this, I find it amazing that
> > "kmem -[fF]" is the only command option that is affected?
>
> Hi Dave,
>
> Maybe I missed some kmem options that walk free page lists.
> If a crash command is walking a page freelist it would use the
> list_head named 'lru' would it not? I only find lru references
> in crash's memory.c unwind.c gdb-7.6/sim/frv/cache.c gdb-7.6/bfd/cache.c
> I didn't do extensive tests of crash, but the kmem command was
> all I found.
Right, but look at all of the other page struct offsets in addition to
page.lru that are used. The page.flags usage comes to mind, and for
example, what would "kmem -p" display for the missing pages?
Or "kmem <address>"? And would "kmem -i" display invalid data?
I'm just speculating off the top of my head, but the page structure is
such a fundamental data structure, with several of its fields being used.
Just enter "help -o page_" to see all of its potential member usages.
> >
> > If I'm not mistaken, this would be the first time that legitimate
> > kernel data would be excluded from the dump, and the user would
> > have no obvious way of knowing that it had been done, correct?
> > If it were encoded in the header somewhere, at least a
> > warning message could be printed during crash initialization.
>
> Agreed, it is legitimate kernel data. But it is data that represents
> memory that we are not capturing. So it would seem to me to be of
> little use. And on the other hand if we do capture that data the time
> to take the dump would be so long as to make the whole notion of doing
> a dump prohibitive.
> (Even with this patch it took 40 minutes to dump a system of 16TB.
> Without the patch that might be 5 hours. And soon there will be
> 64TB systems.)
>
> When kmem -f fails it does say that a needed page has been excluded
> from the dump.
> But an up-front message would be reasonable.
Perhaps the disk_dump_header.status field could be used? Currently only
the 3 DUMP_DH_COMPRESSED_xxx bits are used.
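Purely as a sketch of the idea, and not taken from any posted patch:
makedumpfile could set one more bit in the status field, and crash could
test it during initialization and print the up-front warning. The bit name
EXAMPLE_DH_EXCLUDED_VMEMMAP, its value, and both helper functions below are
hypothetical; the values shown for the three compression bits are assumed
here for illustration only.

```c
#include <stdio.h>

/* Compression bits in disk_dump_header.status (values assumed for this sketch). */
#define DUMP_DH_COMPRESSED_ZLIB     0x1
#define DUMP_DH_COMPRESSED_LZO      0x2
#define DUMP_DH_COMPRESSED_SNAPPY   0x4

/* Hypothetical new bit: unused vmemmap pages were left out of the dump. */
#define EXAMPLE_DH_EXCLUDED_VMEMMAP 0x8

/* Return 1 if the (hypothetical) exclusion bit is set in status, else 0. */
static int vmemmap_excluded(int status)
{
	return (status & EXAMPLE_DH_EXCLUDED_VMEMMAP) != 0;
}

/*
 * What an initialization-time check might look like: print a warning to
 * 'out' when the bit is set. Returns 1 if a warning was printed, else 0.
 */
static int warn_if_vmemmap_excluded(int status, FILE *out)
{
	if (!vmemmap_excluded(status))
		return 0;
	fprintf(out, "WARNING: unused vmemmap pages were excluded from this dump; "
		     "page structure data may be unavailable\n");
	return 1;
}
```

Since the new bit would not overlap the compression bits, a header with
status set to (DUMP_DH_COMPRESSED_LZO | EXAMPLE_DH_EXCLUDED_VMEMMAP) would
still be read as LZO-compressed by code that only masks the compression
bits.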
> >
> > In any case, given that this can change traditional behavior,
> > I would prefer that the full set of pages be copied by default,
> > and only be excluded if the user configures it to do so.
>
> That could be easily done. It's not unreasonable to make the very large
> system require the special option. I just thought that the check of system
> size would be doing the system administrator a favor.
Yeah, I understand, but we don't impose any other kind of restriction unless
it is purposefully specified with the -d arguments. IMHO it just seems to be
heading down a slippery slope that presumes makedumpfile "knows better"
than the administrator.
Dave