Re: Segmentation fault with drgn + libkdumpfile

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Omar Sandoval <osandov@osandov.com>, linux-debuggers@vger.kernel.org
Subject: Re: Segmentation fault with drgn + libkdumpfile
Date: Tue, 09 Jan 2024 17:40:15 -0800	[thread overview]
Message-ID: <87sf36yni8.fsf@oracle.com> (raw)
In-Reply-To: <20240109100609.4e956beb@meshulam.tesarici.cz>

Petr Tesařík <petr@tesarici.cz> writes:

> On Mon, 8 Jan 2024 21:40:08 +0100
> Petr Tesařík <petr@tesarici.cz> wrote:
>
>> On Fri, 05 Jan 2024 13:53:15 -0800
>> Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> 
>> > Petr Tesařík <petr@tesarici.cz> writes:  
>> > > On Fri, 05 Jan 2024 10:38:16 -0800
>> > > Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> > >    
>> > >> Hi Petr,
>> > >> 
>> > >> I recently encountered a segmentation fault with libkdumpfile & drgn
>> > >> which appears to be related to the cache implementation. I've included
>> > >> the stack trace at the end of this message, since it's a bit of a longer
>> > >> one. The exact issue occurred with a test vmcore that I could probably
>> > >> share with you privately if you'd like. In any case, the reproducer is
>> > >> fairly straightforward in drgn code:
>> > >> 
>> > >> for t in for_each_task(prog):
>> > >>     prog.stack_trace(t)
>> > >> for t in for_each_task(prog):
>> > >>     prog.stack_trace(t)
>> > >> 
>> > >> The repetition is required, the segfault only occurs on the second
>> > >> iteration of the loop. Which, in hindsight, is a textbook sign that the
>> > >> issue has to do with caching. I'd expect that the issue is specific to
>> > >> this vmcore, it doesn't reproduce on others.
>> > >> 
>> > >> I stuck that into a git bisect script and bisected the libkdumpfile
>> > >> commit that introduced it:
>> > >> 
>> > >> commit 487a8042ea5da580e1fdb5b8f91c8bd7cad05cd6
>> > >> Author: Petr Tesarik <petr@tesarici.cz>
>> > >> Date:   Wed Jan 11 22:53:01 2023 +0100
>> > >> 
>> > >>     Cache: Calculate eprobe in reinit_entry()
>> > >> 
>> > >>     If this function is called to reuse a ghost entry, the probe list
>> > >>     has not been walked yet, so eprobe is left uninitialized.
>> > >> 
>> > >>     This passed the test case, because the correct old value was left
>> > >>     on stack. Modify the test case to poison the stack.
>> > >> 
>> > >>     Signed-off-by: Petr Tesarik <petr@tesarici.cz>
>> > >> 
>> > >>  src/kdumpfile/cache.c      |  6 +++++-
>> > >>  src/kdumpfile/test-cache.c | 13 +++++++++++++
>> > >>  2 files changed, 18 insertions(+), 1 deletion(-)    
>> > >
>> > > This looks like a red herring to me. The cache most likely continues in
>> > > a corrupted state without this commit, which may mask the issue (until
>> > > it resurfaces later).    
>> > 
>> > I see, that makes a lot of sense.
>> >   
>> > >> I haven't yet tried to debug the logic of the cache implementation and
>> > >> create a patch. I'm totally willing to try that, but I figured I would
>> > >> send this report to you first, to see if there's something obvious that
>> > >> sticks out to your eyes.    
>> > >
>> > > No, but I should be able to recreate the issue if I get a log of the
>> > > cache API calls:
>> > >
>> > > - cache_alloc() - to know the number of elements
>> > > - cache_get_entry()
>> > > - cache_put_entry()
>> > > - cache_insert()
>> > > - cache_discard()
>> > > - cache_flush() - not likely after initialization, but...    
>> > 
>> > I went ahead and logged each of these calls as you suggested, I tried to
>> > log them at the beginning of the function call and always include the
>> > cache pointer, cache_entry, and the key. I took the resulting log and
>> > filtered it to just contain the most recently logged cache prior to the
>> > crash, compressed it, and attached it. For completeness, the patch
>> > I used is below (applies to tip branch 8254897 ("Merge pull request #78
>> > from fweimer-rh/c99")).
>> > 
>> > I'll also see if I can reproduce it based on the log.  
>> 
>> Thank you for the log. I haven't had much time to look at it, but the
>> first line is a good hint already:
>> 
>> 0x56098b68c4c0: cache_alloc(1024, 0)
>> 
>> Zero size means the data pointers are managed by the caller, so this
>> must be the cache of mmap()'ed segments. That's the only cache which
>> installs a cleanup callback with set_cache_entry_cleanup(). There is
>> only one call to the cleanup callback for evicted entries in cache.c:
>> 
>> 		/* Get an unused cached entry. */
>> 		if (cs->nuprobe != 0 &&
>> 		    (cs->nuprec == 0 || cache->nprobe + bias > cache->dprobe))
>> 			evict = evict_probe(cache, cs);
>> 		else
>> 			evict = evict_prec(cache, cs);
>> 		if (cache->entry_cleanup)
>> 			cache->entry_cleanup(cache->cleanup_data, evict);
>> 
>> The entries can be evicted from the probe partition or from the precious
>> partition. This might be relevant. Please, can you re-run and log where
>> the evict entry comes from?
>
> I found some time this morning, and it wouldn't help. Because of a bug
> in fcache_new(), the number of elements in the cache is big enough that
> cache entries are never evicted in your case. It's quite weird to hit a
> cache metadata bug after elements have been inserted. FWIW I am not
> able to reproduce the bug by replaying the logged file read pattern.
>
> Since you have a reliable reproducer, it cannot be a Heisenbug. But it
> could be caused by the other cache - the cache of decompressed pages.
> Do you know for sure that lzo1x_decompress_safe() crashes while trying
> to _read_ from the input buffer, and not while trying to _write_ to the
> output buffer?

Hi Petr,

Sorry for the delay here, I got pulled into other issues and am trying
to attend to all my work in a round-robin fashion :)

The fault is definitely in lzo1x_decompress_safe() *writing* to address
0. I fetched debuginfo for all the necessary libraries and we see the
following stack trace:

%<-----------------------
#0  0x00007fcd9adddef3 in lzo1x_decompress_safe (in=<optimized out>,
    in_len=<optimized out>, out=0x0, out_len=0x7ffdee2c1388, wrkmem=<optimized out>)
    at src/lzo1x_d.ch:120
#1  0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
#2  0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
    fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
#3  0x00007fcd9ae25e44 in diskdump_get_page (pio=0x7ffdee2c1590) at diskdump.c:647
#4  0x00007fcd9ae32be0 in get_page (pio=0x7ffdee2c1590)
    at /home/stepbren/repos/libkdumpfile/src/kdumpfile/kdumpfile-priv.h:1512
#5  0x00007fcd9ae32ed4 in get_page_xlat (pio=0x7ffdee2c1590) at read.c:126
#6  0x00007fcd9ae32f22 in get_page_maybe_xlat (pio=0x7ffdee2c1590) at read.c:137
#7  0x00007fcd9ae32fb1 in _kdumpfile_priv_read_locked (ctx=0x55745bfca8f0,
    as=KDUMP_KVADDR, addr=18446612133360081960, buffer=0x7ffdee2c17df,
    plength=0x7ffdee2c1698) at read.c:169
#8  0x00007fcd9ae330dd in kdump_read (ctx=0x55745bfca8f0, as=KDUMP_KVADDR,
    addr=18446612133360081960, buffer=0x7ffdee2c17df, plength=0x7ffdee2c1698)
    at read.c:196
#9  0x00007fcd9afb0cc4 in drgn_read_kdump (buf=0x7ffdee2c17df,
    address=18446612133360081960, count=4, offset=18446612133360081960,
    arg=0x55745bfca8f0, physical=false) at ../../libdrgn/kdump.c:73
%<-----------------------

In frame 1 where we are calling the decompressor:

%<-----------------------
(gdb) frame 1
#1  0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
584                     int ret = lzo1x_decompress_safe(fch.data, pd.size,
(gdb) list
579                     if (ret != KDUMP_OK)
580                             return ret;
581             } else if (pd.flags & DUMP_DH_COMPRESSED_LZO) {
582     #if USE_LZO
583                     lzo_uint retlen = get_page_size(ctx);
584                     int ret = lzo1x_decompress_safe(fch.data, pd.size,
585                                                     pio->chunk.data,
586                                                     &retlen,
587                                                     LZO1X_MEM_DECOMPRESS);
588                     fcache_put_chunk(&fch);
(gdb) p retlen
$7 = 0
(gdb) p pio->chunk.data
$8 = (void *) 0x0
(gdb) p fch.data
$9 = (void *) 0x7fcd7cc33da4
(gdb) p pd.size
$10 = 816
%<-----------------------

As far as I can tell, pio->chunk.data comes directly from the
cache_get_page() function in frame 2:

%<-----------------------
(gdb) up
#2  0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
    fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
69              ret = fn(pio);
(gdb) list
64              pio->chunk.data = entry->data;
65              pio->chunk.embed_fces->ce = entry;
66              if (cache_entry_valid(entry))
67                      return KDUMP_OK;
68
69              ret = fn(pio);
70              mutex_lock(&ctx->shared->cache_lock);
71              if (ret == KDUMP_OK)
72                      cache_insert(pio->chunk.embed_fces->cache, entry);
73              else
(gdb) p *entry
$11 = {key = 1045860353, state = cs_precious, next = 626, prev = 626, refcnt = 1,
  data = 0x0}
(gdb) p *pio
$12 = {ctx = 0x55745bfca8f0, addr = {addr = 1045860352, as = ADDRXLAT_MACHPHYSADDR},
  chunk = {data = 0x0, nent = 1, {embed_fces = {{data = 0xffff880ff1470788,
          len = 140728599320032, ce = 0x55745c1003d8, cache = 0x55745c0fb540}, {
          data = 0x55745bfd42f0, len = 140728599320112,
          ce = 0x7fcd9ae330ef <kdump_read+102>, cache = 0xffff88003e569c28}},
      fces = 0xffff880ff1470788}}}
%<-----------------------

And here is the cache structure:

%<-----------------------
(gdb) p *pio->chunk.embed_fces->cache
$16 = {split = 487, nprec = 1020, ngprec = 248, nprobe = 3, ngprobe = 239, dprobe = 2,
  cap = 1024, inflight = 626, ninflight = 1, hits = {number = 168473, address = 168473,
    string = 0x29219 <error: Cannot access memory at address 0x29219>,
    bitmap = 0x29219, blob = 0x29219}, misses = {number = 1913, address = 1913,
    string = 0x779 <error: Cannot access memory at address 0x779>, bitmap = 0x779,
    blob = 0x779}, elemsize = 4096, data = 0x7fcd997fe010, entry_cleanup = 0x0,
  cleanup_data = 0x0, ce = 0x55745c0fb598}
%<-----------------------

Thanks for looking into this! I'll continue investigating more as well.

Stephen

next prev parent reply	other threads:[~2024-01-10  1:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-05 18:38 Segmentation fault with drgn + libkdumpfile Stephen Brennan
2024-01-05 19:23 ` Petr Tesařík
2024-01-05 21:53   ` Stephen Brennan
2024-01-08 20:40     ` Petr Tesařík
2024-01-09  9:06       ` Petr Tesařík
2024-01-10  1:40         ` Stephen Brennan [this message]
2024-01-10  8:36           ` Petr Tesařík
2024-01-10 13:49             ` Petr Tesařík
2024-01-10 18:03               ` Petr Tesařík
2024-01-10 19:48                 ` Stephen Brennan
2024-01-10 19:58                   ` Petr Tesařík

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sf36yni8.fsf@oracle.com \
    --to=stephen.s.brennan@oracle.com \
    --cc=linux-debuggers@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=petr@tesarici.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.