All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Parav Pandit <parav@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	Maher Sanalla <msanalla@nvidia.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] RDMA/core: fix a NULL-pointer dereference in hw_stat_device_show()
Date: Sat, 22 Feb 2025 20:50:41 +0000	[thread overview]
Message-ID: <Z7o4ofdXZ-6w-jO0@google.com> (raw)
In-Reply-To: <CY8PR12MB7195D4AD9844C91313955AFCDCC62@CY8PR12MB7195.namprd12.prod.outlook.com>

On Sat, Feb 22, 2025 at 06:36:46PM +0000, Parav Pandit wrote:
> Hi Roman,
> 
> > From: Roman Gushchin <roman.gushchin@linux.dev>
> > Sent: Saturday, February 22, 2025 1:33 AM
> > 
> > On Fri, Feb 21, 2025 at 08:03:33AM +0000, Parav Pandit wrote:
> > > > From: Roman Gushchin <roman.gushchin@linux.dev>
> > > > Sent: Friday, February 21, 2025 10:20 AM
> > > >
> > > > On Fri, Feb 21, 2025 at 04:34:25AM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Roman Gushchin <roman.gushchin@linux.dev>
> > > > > > Sent: Friday, February 21, 2025 9:56 AM
> > > > > >
> > > > > > On Fri, Feb 21, 2025 at 03:14:16AM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > > > From: Roman Gushchin <roman.gushchin@linux.dev>
> > > > > > > > Sent: Friday, February 21, 2025 7:36 AM
> > > > > > > >
> > > > > > > > Commit 54747231150f ("RDMA: Introduce and use
> > > > > > > > rdma_device_to_ibdev()") introduced rdma_device_to_ibdev()
> > > > > > > > helper which has to be used to obtain an ib_device pointer
> > > > > > > > from a
> > > > device pointer.
> > > > > > > >
> > > > > > >
> > > > > > > > hw_stat_device_show() and hw_stat_device_store() were missed.
> > > > > > > >
> > > > > > > > It causes a NULL pointer dereference panic on an attempt to
> > > > > > > > read hw counters from a namespace, when the device structure
> > > > > > > > is not embedded into the ib_device structure.
> > > > > > > Do you mean net namespace other than default init_net?
> > > > > > > Assuming the answer is yes, some question below.
> > > > > > >
> > > > > > > > In this case casting the device pointer into the ib_device
> > > > > > > > pointer using container_of() is wrong.
> > > > > > > > Instead, rdma_device_to_ibdev() should be used, which uses
> > > > > > > > the
> > > > > > > > back- reference (container_of(device, struct ib_core_device,
> > > > > > > > dev))-
> > > > >owner.
> > > > > > > >
> > > > > > > > [42021.807566] BUG: kernel NULL pointer dereference, address:
> > > > > > > > 0000000000000028 [42021.814463] #PF: supervisor read access
> > > > > > > > in kernel mode [42021.819549] #PF: error_code(0x0000) -
> > > > > > > > not-present page [42021.824636] PGD 0 P4D 0 [42021.827145]
> > > > > > > > Oops: 0000 [#1] SMP PTI [42021.830598] CPU: 82 PID: 2843922
> > > > > > > > Comm: switchto-
> > > > defaul Kdump:
> > > > > > loaded
> > > > > > > > Tainted: G S      W I        XXX
> > > > > > > > [42021.841697] Hardware name: XXX [42021.849619] RIP:
> > > > > > > > 0010:hw_stat_device_show+0x1e/0x40 [ib_core] [42021.855362]
> > > > > > > > Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f
> > > > > > > > 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7
> > > > > > > > f0 fa ff ff <48> 8b
> > > > > > > > 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48
> > > > > > > > [42021.873931]
> > > > > > > > RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287 [42021.879108]
> > RAX:
> > > > > > > > ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 0000000000000000
> > > > > > > > [42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI:
> > > > > > > > ffff940c7517aef0 [42021.893230] RBP: ffff97fe90f03e70 R08:
> > > > > > > > ffff94085f1aa000 R09: 0000000000000000 [42021.900294] R10:
> > > > > > > > ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87ca2530
> > > > > > > > [42021.907355]
> > > > > > > > R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15:
> > > > > > > > ffff94085f1aa000 [42021.914418] FS:  00007fda1a3b9700(0000)
> > > > > > GS:ffff94453fb80000(0000)
> > > > > > > > knlGS:0000000000000000 [42021.922423] CS:  0010 DS: 0000 ES:
> > > > > > > > 0000
> > > > > > CR0:
> > > > > > > > 0000000080050033 [42021.928130] CR2: 0000000000000028
> > CR3:
> > > > > > > > 00000042dcfb8003 CR4: 00000000003726f0 [42021.935194] DR0:
> > > > > > > > 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > > > > > [42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > DR7:
> > > > > > > > 0000000000000400 [42021.949324] Call Trace:
> > > > > > > > [42021.951756]  <TASK>
> > > > > > > > [42021.953842]  [<ffffffff86c58674>] ? show_regs+0x64/0x70
> > > > > > > > [42021.959030] [<ffffffff86c58468>] ? __die+0x78/0xc0
> > > > > > > > [42021.963874]
> > > > > > [<ffffffff86c9ef75>] ?
> > > > > > > > page_fault_oops+0x2b5/0x3b0 [42021.969749]
> > [<ffffffff87674b92>] ?
> > > > > > > > exc_page_fault+0x1a2/0x3c0 [42021.975549]  [<ffffffff87801326>]
> > ?
> > > > > > > > asm_exc_page_fault+0x26/0x30 [42021.981517]
> > > > > > > > [<ffffffffc0775680>]
> > > > ?
> > > > > > > > __pfx_show_hw_stats+0x10/0x10 [ib_core] [42021.988482]
> > > > > > > > [<ffffffffc077564e>] ? hw_stat_device_show+0x1e/0x40
> > > > > > > > [ib_core] [42021.995438]  [<ffffffff86ac7f8e>]
> > > > > > > > dev_attr_show+0x1e/0x50 [42022.000803]  [<ffffffff86a3eeb1>]
> > > > > > > > sysfs_kf_seq_show+0x81/0xe0 [42022.006508]
> > > > > > > > [<ffffffff86a11134>] seq_read_iter+0xf4/0x410 [42022.011954]
> > > > > > > > [<ffffffff869f4b2e>] vfs_read+0x16e/0x2f0 [42022.017058]
> > > > > > > > [<ffffffff869f50ee>] ksys_read+0x6e/0xe0 [42022.022073]
> > > > > > > > [<ffffffff8766f1ca>]
> > > > > > > > do_syscall_64+0x6a/0xa0 [42022.027441]  [<ffffffff8780013b>]
> > > > > > > > entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > > > > > >
> > > > > > > > Fixes: 54747231150f ("RDMA: Introduce and use
> > > > > > > > rdma_device_to_ibdev()")
> > > > > > > Commit eb15c78b05bd9 eliminated hw_counters sysfs directory
> > > > > > > into the
> > > > > > net namespace.
> > > > > > > I don't see it created in any other net ns other than init_net
> > > > > > > with kernel
> > > > > > 6.12+.
> > > > > > >
> > > > > > > I am puzzled. Can you please explain/share the reproduction
> > > > > > > steps for
> > > > > > generating above call trace?
> > > > > >
> > > > > > Hi Parav!
> > > > > >
> > > > > > This bug was spotted in the production on a small number of
> > > > > > machines. They were running a 6.6-based kernel (with no changes
> > > > > > around this code). I don't have a reproducer (and there is no
> > > > > > simple way for me to reproduce the problem), but I've several
> > > > > > core dumps and from inspecting them it was clear that a
> > > > > > ib_device pointer obtained in hw_stat_device_show() was wrong.
> > > > > > At the same time the ib_pointer obtained in the way
> > > > > > rdma_device_to_ibdev() works was
> > > > correct.
> > > > > >
> > > > > I just tried reproducing now on 6.12+ kernel manually.
> > > >
> > > > Can you, please, share your steps? Or try the 6.6 kernel?
> > > >
> > > $ rdma system show to display 'netns shared'.
> > > $ ip netns add foo
> > > $ ip netns exec foo bash
> > > $ attempt to access the hw counters from the foo net namespace.
> > 
> > Ok, it worked well. The following commands
> > 
> >   $ ip netns add foo
> >   $ ip netns exec foo bash
> >   $ cat /sys/class/infiniband/mlx4_0/hw_counters/*
> > 
> > cause a panic on a vanilla v6.12.9 without my changes and work perfectly fine
> > with my patch.
> > 
> I dig further and I see the issue. Its not the per port counter.
> It's the per device counter which got broken by commit 467f432a521a2.
> It introduced the device counter unintentionally in non init net.
> And we need to fix that (instead of allowing it and opening more holes).
> I replied with more details to Jason G in previous reply.
> Please take a look.

I can prepare a patch like this, but I'm slightly worried about hiding
previously exposed counters. I don't think it's a problem for us (even though
I'm not 100% sure), but technically it can be seen as breaking the API.

How about merging 2 patches: my original patch which fixes the memory corruption
and a separate patch which hides those counters from non-init namespace?
The first can be safely propagated towards stable trees.

I'll prepate the second patch soon.

Thanks!

  reply	other threads:[~2025-02-22 20:50 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21  2:05 [PATCH] RDMA/core: fix a NULL-pointer dereference in hw_stat_device_show() Roman Gushchin
2025-02-21  3:14 ` Parav Pandit
2025-02-21  4:25   ` Roman Gushchin
2025-02-21  4:34     ` Parav Pandit
2025-02-21  4:49       ` Roman Gushchin
2025-02-21  8:03         ` Parav Pandit
2025-02-21 20:02           ` Roman Gushchin
2025-02-22 18:36             ` Parav Pandit
2025-02-22 20:50               ` Roman Gushchin [this message]
2025-02-21 17:43       ` Jason Gunthorpe
2025-02-22 18:34         ` Parav Pandit
2025-02-24 15:11           ` Jason Gunthorpe
2025-02-24 15:16             ` Parav Pandit
2025-02-24 23:22               ` Roman Gushchin
2025-02-24 23:30                 ` Jason Gunthorpe
2025-02-25  3:42                   ` Roman Gushchin
2025-02-25  4:34                     ` Parav Pandit
2025-02-26  3:41                       ` Roman Gushchin
2025-02-25 13:16                     ` Jason Gunthorpe
2025-02-26  3:37                       ` Roman Gushchin
2025-02-24 23:31               ` Jason Gunthorpe
2025-02-25 10:38                 ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z7o4ofdXZ-6w-jO0@google.com \
    --to=roman.gushchin@linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=msanalla@nvidia.com \
    --cc=parav@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.