All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jürgen Groß" <jgross@suse.com>
To: "Jason Andryuk" <jason.andryuk@amd.com>,
	"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
	"Jason Andryuk" <jandryuk@gmail.com>
Cc: Julien Grall <julien@xen.org>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: NULL pointer dereference in xenbus_thread->...
Date: Wed, 30 Apr 2025 17:43:58 +0200	[thread overview]
Message-ID: <ea422f7c-64fb-4b19-b953-cb3d0404222a@suse.com> (raw)
In-Reply-To: <641aef98-5dde-4618-9fa4-7ccfa2e1989d@amd.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 3473 bytes --]

On 30.04.25 16:29, Jason Andryuk wrote:
> On 2025-04-30 06:56, Marek Marczykowski-Górecki wrote:
>> On Tue, Apr 29, 2025 at 08:59:45PM -0400, Jason Andryuk wrote:
>>> Hi Marek,
>>>
>>> On Wed, Apr 23, 2025 at 8:42 AM Marek Marczykowski-Górecki
>>> <marmarek@invisiblethingslab.com> wrote:
> 
>>>> I've got some more report confirming it's still happening on Linux
>>>> 6.12.18. Is there anything I can do to help fixing this? Maybe ask users
>>>> to enable some extra logging?
>>>
>>> Have you been able to capture a crash with debug symbols and run it
>>> through scripts/decode_stacktrace.sh?
>>
>> Not really, as I don't have debug symbols for this kernel. And I can't
>> reliably reproduce it myself (for me it happens about once in a
>> month...). I can try reproducing debug symbols, theoretically I should
>> have all ingredients for it.
>>
>>> I'm curious what process_msg+0x18e/0x2f0 is.  process_writes() has a
>>> direct call to wake_up(), but process_msg() calling req->cb(req) may
>>> be xs_wake_up() which is a thin wrapper over wake_up().
>>
>> There is a code dump in the crash message, does it help?
> 
> That's a little deeper in the call chain.  If you have a vmlinux or bzImage with 
> a matching stacktrace, that would work to look up the address in the 
> disassembly.  So if you don't have a matching pair, maybe try to catch it the 
> next time.
> 
>>> They make me wonder if req has been free()ed and at least partially
>>> zero-ed, but it still has wake_up() called.  The call stack here is
>>> reminiscent of the one here
>>> https://lore.kernel.org/xen-devel/Z_lJTyVipJJEpWg2@mail-itl/ and the
>>> unexpected value there is 0.
>>
>> That's interesting idea, the one above I've seen only on 6.15-rc1 (and
>> no latter rc). But maybe?
> 
> I am guessing, so I could be wrong.  NULL pointer and unexpected zero value are 
> both 0 at least.  Also Whonix looks like it may use init_on_free=1 to zero 
> memory at free time.

I have looked at this issue multiple times now.

Just some remarks what IMO could go wrong (I didn't find any proof that
this really happened, though), in case someone wants to double check:

The most probably candidate for something going wrong is a use-after-free
of a struct xb_req_data element (normally named "req" in the related code).

Some words about the not really obvious locking scheme used for those
elements: A "req" is owned by a thread as long as it isn't in any of the
lists it can live in (xs_reply_list or xb_write_list). Putting it into one
of the lists or removing it again requires to hold the xb_write_mutex.

A "req" needs to be in a certain state when either in one of the lists or
when being owned by a worker thread.

I'm wondering whether it could happen that a thread waiting for a "req"
could be woken up and the "req" is being freed before the waiting thread
can react. Normally this shouldn't be possible, but "never say never".
What catched my eye today is the test of req->state == xb_req_state_wait_reply
in process_msg() just after dropping the xb_write_mutex. This looks a little
bit fishy, but OTOH the request has been just removed from the xs_reply_list,
so no mutex should be needed for that test.

Possible candidates for such an "impossible" scenario include a wrap of
xs_request_id (not very probable, though, as having 4 billion Xenstore
requests "in flight" is rather unlikely IMHO).


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

  reply	other threads:[~2025-04-30 15:44 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-28 21:50 NULL pointer dereference in xenbus_thread-> Marek Marczykowski-Górecki
2023-10-22 14:14 ` Marek Marczykowski-Górecki
2024-03-25 16:17   ` Marek Marczykowski-Górecki
2024-03-26 11:00     ` Julien Grall
2024-05-31 22:48       ` Marek Marczykowski-Górecki
2025-04-23 12:41         ` Marek Marczykowski-Górecki
2025-04-30  0:59           ` Jason Andryuk
2025-04-30 10:56             ` Marek Marczykowski-Górecki
2025-04-30 14:29               ` Jason Andryuk
2025-04-30 15:43                 ` Jürgen Groß [this message]
2025-05-03 15:08                   ` Jason Andryuk
2025-05-02  0:01             ` Marek Marczykowski-Górecki
2025-05-03 14:59               ` Jason Andryuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea422f7c-64fb-4b19-b953-cb3d0404222a@suse.com \
    --to=jgross@suse.com \
    --cc=jandryuk@gmail.com \
    --cc=jason.andryuk@amd.com \
    --cc=julien@xen.org \
    --cc=marmarek@invisiblethingslab.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.