From: Laurence Oberman <loberman@redhat.com>
To: Jaco Kroon <jaco@uls.co.za>, Bart Van Assche <bvanassche@acm.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: LVM kernel lockup scenario during lvcreate
Date: Thu, 24 Aug 2023 16:19:51 -0400
Message-ID: <69227e4091f3d9b05e739f900340f11afacdd91f.camel@redhat.com>
In-Reply-To: <977a1223-a543-a6ca-4a6c-0cf0fc6f84a0@uls.co.za>
On Thu, 2023-08-24 at 22:01 +0200, Jaco Kroon wrote:
> Hi,
>
> On 2023/08/24 19:29, Laurence Oberman wrote:
>
> > On Mon, 2023-06-12 at 11:40 -0700, Bart Van Assche wrote:
> > > On 6/9/23 00:29, Jaco Kroon wrote:
> > > > > I'm attaching dmesg -T and ps axf. dmesg in particular may
> > > > > provide clues, as it provides a number of stack traces
> > > > > indicating stalling at IO time.
> > > > >
> > > > > Once this has triggered, even commands such as "lvs" go into
> > > > > uninterruptible wait. I unfortunately didn't test "dmsetup ls"
> > > > > now and triggered a reboot already (system needs to be up).
> > > > To me the call traces suggest that an I/O request got stuck.
> > > > Unfortunately call traces are not sufficient to identify the root
> > > > cause in case I/O gets stuck. Has debugfs been mounted? If so, how
> > > > about dumping the contents of /sys/kernel/debug/block/ into a tar
> > > > file after the lockup has been reproduced and sharing that
> > > > information?
> > >
> > > tar -czf- -C /sys/kernel/debug/block . >block.tgz
> > >
> > > Thanks,
> > >
> > > Bart.
> > >
> > One I am aware of is this
> > commit 106397376c0369fcc01c58dd189ff925a2724a57
> > Author: David Jeffery <djeffery@redhat.com>
> >
> > > Can we try to get a vmcore (assuming it's not a secure site)?
>
> Certainly. Obviously on any host handling any kind of sensitive data
> there is a likelihood that sensitive data may be present in the
> vmcore; as such I am more than happy to create a vmcore. I'm assuming
> this will create a kernel version of a core dump ... with 256GB of RAM
> (most of which goes towards disk caches) I'm further assuming this
> file can be potentially large. Where will this get stored should the
> capture be made? (I need to ensure that the filesystem has sufficient
> storage available)
>
> >
> > Add these to /etc/sysctl.conf
> >
> > kernel.panic_on_io_nmi = 1
> > kernel.panic_on_unrecovered_nmi = 1
> > kernel.unknown_nmi_panic = 1
> >
> > Run sysctl -p
> > Ensure kdump is running and can capture a vmcore
> Done. Had to enable a few extra kernel options to get all the other
> requirements, so scheduled a reboot to activate the new kernel. This
> will happen on Saturday morning very early.
> >
> > When it locks up again
> > send an NMI via the SuperMicro Web Management interface
>
> Possible to send from sysrq at the keyboard? Otherwise I'll just need
> to set up the RMI, will just be easier to do this from the keyboard if
> possible, it's not always if it's left too late.
>
> >
> > Share the vmcore, or we can have you capture some specifics from
> > it to triage.
>
> I'd prefer you let me know what you need ... security concerns and
> all ... frankly, I highly doubt there is any data that is really so
> sensitive that it can be classified as "top secret" but we do have
> NDAs in place prohibiting me from sharing anything that may
> potentially contain customer related data ...
>
> Kind regards,
> Jaco
>
Hello,

This would usually need an NMI sent from a management interface: with
the system locked up, there is no guarantee that a sysrq-c from the
keyboard will get through.
You could try, though.
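
For the keyboard attempt, here is a rough sketch of checking in advance
whether Alt-SysRq-c is permitted at all. The helper name
sysrq_crash_allowed is invented for illustration, and the sketch assumes
the crash key is gated by the "debugging dumps" bit (8) of kernel.sysrq,
with a value of 1 enabling every function:

```sh
# Hypothetical helper: succeeds if the given kernel.sysrq value permits
# the crash key (Alt-SysRq-c) from the keyboard.
sysrq_crash_allowed() {
    value=$1
    # A value of 1 enables every sysrq function.
    if [ "$value" -eq 1 ]; then
        return 0
    fi
    # Otherwise the value is a bitmask; bit 8 (debugging dumps) is
    # assumed here to cover the 'c' (crash) key.
    if [ $((value & 8)) -ne 0 ]; then
        return 0
    fi
    return 1
}

# Report on the running system, if the sysctl is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_crash_allowed "$current"; then
        echo "kernel.sysrq=$current: keyboard sysrq-c is permitted"
    else
        echo "kernel.sysrq=$current: keyboard sysrq-c is masked off"
    fi
fi
```

As root, "echo c > /proc/sysrq-trigger" is not subject to this mask, but
it needs a shell that still responds, which a hung system may not give
you.
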
As long as you have the following in /etc/kdump.conf:

path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31

this will capture kernel pages only, so the vmcore would not be very big.
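
To illustrate why -d 31 keeps the vmcore small: per makedumpfile(8), the
dump level is a bitmask of page types to leave out, and 31 sets all five
bits. A minimal sketch that decodes a level follows; the function name
describe_dump_level is made up for this example:

```sh
# Hypothetical helper: list the page types that a makedumpfile dump
# level (the -d argument) excludes from the vmcore.
describe_dump_level() {
    level=$1
    if [ $((level & 1)) -ne 0 ]; then echo "zero pages"; fi
    if [ $((level & 2)) -ne 0 ]; then echo "non-private cache pages"; fi
    if [ $((level & 4)) -ne 0 ]; then echo "private cache pages"; fi
    if [ $((level & 8)) -ne 0 ]; then echo "user process data pages"; fi
    if [ $((level & 16)) -ne 0 ]; then echo "free pages"; fi
}

# The level used in the kdump.conf line above: prints all five types.
describe_dump_level 31
```

With every bit set, page cache and user data are skipped, so 256GB of RAM
that is mostly disk cache should not translate into a dump of that size.
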
I could work with you privately to get what we need out of the vmcore
and we would avoid transferring it.
Thanks
Laurence