From: Leon Romanovsky <leon@kernel.org>
To: Thierry Reding <thierry.reding@gmail.com>
Cc: Yishai Hadas <yishaih@nvidia.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
linux-rdma@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>,
linux-kernel@vger.kernel.org,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>,
Doug Ledford <dledford@redhat.com>,
Zhu Yanjun <zyjzyj2000@gmail.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Maor Gottlieb <maorg@nvidia.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents
Date: Wed, 30 Jun 2021 20:49:41 +0300
Message-ID: <YNyutTbAWRcK7Bgp@unreal>
In-Reply-To: <YNytdbEG9OSHOT1z@orome.fritz.box>

On Wed, Jun 30, 2021 at 07:44:21PM +0200, Thierry Reding wrote:
> On Wed, Jun 30, 2021 at 01:12:26PM +0200, Marek Szyprowski wrote:
> > Hi Leon,
> >
> > On 29.06.2021 10:40, Leon Romanovsky wrote:
> > > From: Maor Gottlieb <maorg@nvidia.com>
> > >
> > > orig_nents should represent the number of entries with pages,
> > > but __sg_alloc_table_from_pages sets orig_nents as the number of
> > > total entries in the table. This is wrong when the API is used for
> > > dynamic allocation where not all the table entries are mapped with
> > > pages. It wasn't observed until now, since RDMA umem who uses this
> > > API in the dynamic form doesn't use orig_nents implicit or explicit
> > > by the scatterlist APIs.
> > >
> > > Fix it by:
> > > 1. Set orig_nents as number of entries with pages also in
> > > __sg_alloc_table_from_pages.
> > > 2. Add a new field total_nents to reflect the total number of entries
> > > in the table. This is required for the release flow (sg_free_table).
> > > This field should be used internally only by scatterlist.
> > >
> > > Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
> > > Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >
> > This patch landed in linux-next 20210630 as commit a52724456928
> > ("lib/scatterlist: Fix wrong update of orig_nents"). It causes serious
> > regression in DMA-IOMMU integration, which can be observed for example
> > on ARM Juno board during boot:
> >
> > Unable to handle kernel paging request at virtual address 00376f42a6e40454
> > Mem abort info:
> > ESR = 0x96000004
> > EC = 0x25: DABT (current EL), IL = 32 bits
> > SET = 0, FnV = 0
> > EA = 0, S1PTW = 0
> > FSC = 0x04: level 0 translation fault
> > Data abort info:
> > ISV = 0, ISS = 0x00000004
> > CM = 0, WnR = 0
> > [00376f42a6e40454] address between user and kernel address ranges
> > Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.13.0-next-20210630+ #3585
> > Hardware name: ARM Juno development board (r1) (DT)
> > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > pc : __sg_free_table+0x60/0xa0
> > lr : __sg_free_table+0x7c/0xa0
> > ..
> > Call trace:
> > __sg_free_table+0x60/0xa0
> > sg_free_table+0x1c/0x28
> > iommu_dma_alloc+0xc8/0x388
> > dma_alloc_attrs+0xcc/0xf0
> > dmam_alloc_attrs+0x68/0xb8
> > sil24_port_start+0x60/0xe0
> > ata_host_start.part.32+0xbc/0x208
> > ata_host_activate+0x64/0x150
> > sil24_init_one+0x1e8/0x268
> > local_pci_probe+0x3c/0xa0
> > pci_device_probe+0x128/0x1c8
> > really_probe+0x138/0x2d0
> > __driver_probe_device+0x78/0xd8
> > driver_probe_device+0x40/0x110
> > __driver_attach+0xcc/0x118
> > bus_for_each_dev+0x68/0xc8
> > driver_attach+0x20/0x28
> > bus_add_driver+0x168/0x1f8
> > driver_register+0x60/0x110
> > __pci_register_driver+0x5c/0x68
> > sil24_pci_driver_init+0x20/0x28
> > do_one_initcall+0x84/0x450
> > kernel_init_freeable+0x31c/0x38c
> > kernel_init+0x20/0x120
> > ret_from_fork+0x10/0x18
> > Code: d37be885 6b01007f 52800004 540000a2 (f8656813)
> > ---[ end trace 4ba4f0c9c48711a1 ]---
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> >
> > It looks that some changes to the scatterlist structures are missing
> > outside of the lib/scatterlist.c.
> >
> > For now I would suggest to revert this change.
>
> I see a very similar crash on Tegra during the HDA driver's probe.
>
> Leon, let me know if you need a tester for a replacement patch.

With great pleasure. I'll contact you offline once we have prepared it.
For now, this patch will be dropped.

Thanks

>
> Thanks,
> Thierry
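
For readers following the thread, the distinction the quoted commit message draws between orig_nents (entries that carry pages) and total_nents (entries allocated in the table) can be sketched with a small userspace model. This is only an illustration under stated assumptions: toy_entry, toy_table, toy_alloc, toy_add_page and toy_free are hypothetical names, not the kernel's struct sg_table, not the scatterlist API, and not the patch under discussion. It also does not attempt to reproduce the crash reported above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one scatterlist entry. */
struct toy_entry {
	void *page;	/* NULL while no page is attached to this entry */
};

/* Hypothetical stand-in for an SG table; NOT the kernel definition. */
struct toy_table {
	struct toy_entry *ents;
	unsigned int orig_nents;	/* entries that actually carry a page */
	unsigned int total_nents;	/* entries allocated in the table */
};

/* Allocate room for 'total' entries; none of them has a page yet. */
static int toy_alloc(struct toy_table *t, unsigned int total)
{
	t->ents = calloc(total, sizeof(*t->ents));
	if (!t->ents)
		return -1;
	t->orig_nents = 0;
	t->total_nents = total;
	return 0;
}

/* Attach a page to the next unused entry (the "dynamic" usage). */
static int toy_add_page(struct toy_table *t, void *page)
{
	if (t->orig_nents == t->total_nents)
		return -1;	/* table is full */
	t->ents[t->orig_nents++].page = page;
	return 0;
}

/*
 * Release the table. The release path has to account for every
 * allocated entry, so it uses total_nents rather than orig_nents.
 * Returns the number of entries that were released.
 */
static unsigned int toy_free(struct toy_table *t)
{
	unsigned int released = t->total_nents;

	assert(t->orig_nents <= t->total_nents);
	free(t->ents);
	t->ents = NULL;
	t->orig_nents = 0;
	t->total_nents = 0;
	return released;
}
```

In this model, allocating four entries and attaching pages to two of them leaves orig_nents at 2 and total_nents at 4. Code that iterates over populated entries would use the first counter, while the release path needs the second.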
Thread overview: 18+ messages
2021-06-29 8:40 [PATCH rdma-next v1 0/2] SG fix together with update to RDMA umem Leon Romanovsky
2021-06-29 8:40 ` [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents Leon Romanovsky
2021-06-30 5:59 ` Christoph Hellwig
2021-06-30 6:29 ` Leon Romanovsky
2021-06-30 6:33 ` Christoph Hellwig
2021-06-30 7:02 ` Leon Romanovsky
2021-06-30 7:16 ` Christoph Hellwig
2021-06-30 9:14 ` Maor Gottlieb
2021-06-30 11:12 ` Marek Szyprowski
2021-06-30 11:12 ` Marek Szyprowski
2021-06-30 11:16 ` Leon Romanovsky
2021-06-30 11:16 ` Leon Romanovsky
2021-06-30 17:44 ` Thierry Reding
2021-06-30 17:44 ` Thierry Reding
2021-06-30 17:49 ` Leon Romanovsky [this message]
2021-06-30 17:49 ` Leon Romanovsky
2021-06-29 8:40 ` [PATCH rdma-next v1 2/2] RDMA: Use dma_map_sgtable for map umem pages Leon Romanovsky
2021-06-29 23:08 ` [PATCH rdma-next v1 0/2] SG fix together with update to RDMA umem Jason Gunthorpe