All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vasant Hegde <vasant.hegde@amd.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
	iommu@lists.linux.dev, joro@8bytes.org,
	suravee.suthikulpanit@amd.com,
	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Subject: Re: [PATCH v2 iommu/next] iommu: Fix default domain setup
Date: Fri, 23 Jun 2023 10:08:39 +0530	[thread overview]
Message-ID: <b69af01e-d2ee-c8cc-4e59-9ca890e9fe4a@amd.com> (raw)
In-Reply-To: <ZJRSuBsVBVSi8Swn@ziepe.ca>

Jason,


On 6/22/2023 7:25 PM, Jason Gunthorpe wrote:
> On Thu, Jun 22, 2023 at 10:29:58AM +0530, Vasant Hegde wrote:
>> Hi Baolu, Jason,
>>
>>
>> On 6/20/2023 5:12 PM, Jason Gunthorpe wrote:
>>> On Tue, Jun 20, 2023 at 02:31:43PM +0800, Baolu Lu wrote:
>>>>>   err_restore:
>>>>>   	if (old_dom) {
>>>>>   		__iommu_group_set_domain_internal(
>>>>>   			group, old_dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
>>>>> +		group->default_domain = old_dom;
>>>>>   		iommu_domain_free(dom);
>>>>>   		old_dom = NULL;
>>>>>   	}
>>>>
>>>> The err_restore branch doesn't work if old_dom is NULL. We have no means
>>>> to restore a group from a successful first-time attaching to NULL
>>>> attaching.
>>>
>>> Yes, this is what I fixed in my alternative version
>>
>> Not really. Even your version has
> 
> I though what Baolu means is that it crashes in some of the cases.
> 
>> +err_restore_domain:
>> +	if (old_dom)
>>
>>
>> Only first time we will have old_dom is NULL and this functions returns error
>> code. iommu_probe_device()/bus_iommu_probe() handler error path properly and
>> frees resources. So I think this is fine.
> 
> Yes, the design is to leave it as-is and know that the error unwind
> will fail to attach the device and free the domain after
> release. There is a comment explaining this.
> 
>> @Jason,
>>   I'm fine with your version of patch as well. Are you going to send proper
>> patch -OR- do you want me to respin?
> 
> I was hoping you'd test it and can you share the oops message?

This has been tested. Sorry. I should have mentioned it explicitly.


One of the call trace without this fix:

[  350.334395] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  350.341356] #PF: supervisor read access in kernel mode
[  350.346494] #PF: error_code(0x0000) - not-present page
[  350.351625] PGD 0 P4D 0
[  350.354164] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  350.358523] CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3
[  350.366003] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6
07/06/2021
[  350.373567] RIP: 0010:__iommu_attach_device+0xc/0xa0
[  350.378533] Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48
85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74
[  350.397269] RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246
[  350.402488] RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001
[  350.409620] RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410
[  350.416751] RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc
[  350.423875] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700
[  350.431001] R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001
[  350.438133] FS:  00007f02e30ae000(0000) GS:ffff9afeb2440000(0000)
knlGS:0000000000000000
[  350.446217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  350.451956] CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0
[  350.459086] PKRU: 55555554
[  350.461792] Call Trace:
[  350.464235]  <TASK>
[  350.466335]  ? __die+0x24/0x70
[  350.469392]  ? page_fault_oops+0x82/0x150
[  350.473404]  ? __iommu_queue_command_sync+0x80/0xc0
[  350.478283]  ? exc_page_fault+0x69/0x150
[  350.482210]  ? asm_exc_page_fault+0x26/0x30
[  350.486396]  ? __iommu_attach_device+0xc/0xa0
[  350.490756]  ? __iommu_attach_device+0x1c/0xa0
[  350.495201]  __iommu_device_set_domain+0x42/0x80
[  350.499820]  __iommu_group_set_domain_internal+0x5d/0x160
[  350.505220]  iommu_setup_default_domain+0x318/0x400
[  350.510097]  iommu_group_store_type+0xb1/0x200
[  350.514536]  kernfs_fop_write_iter+0x12f/0x1c0
[  350.518982]  vfs_write+0x2a2/0x3b0
[  350.522389]  ksys_write+0x63/0xe0
[  350.525707]  do_syscall_64+0x3f/0x90
[  350.529284]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  350.534338] RIP: 0033:0x7f02e2f14a6f
[  350.537918] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 19 c0 f7 ff 48 8b
54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0
ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c c0 f7 ff 48
[  350.556663] RSP: 002b:00007ffe5451b6e0 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[  350.564226] RAX: ffffffffffffffda RBX: 000055f381c20a20 RCX: 00007f02e2f14a6f
[  350.571352] RDX: 0000000000000008 RSI: 000055f382f36220 RDI: 000000000000000d
[  350.578485] RBP: 000055f3829e0340 R08: 0000000000000000 R09: 0000000000000000
[  350.585615] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
[  350.592740] R13: 00007f02e30adf80 R14: 000000000000000d R15: 000055f382f36220
[  350.599865]  </TASK>
[  350.602057] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack
ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc binfmt_misc
ipmi_ssif vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd
kvm_amd kvm irqbypass acpi_ipmi rapl wmi_bmof acpi_cpufreq ccp sg k10temp
ipmi_si acpi_power_meter squashfs loop dm_multipath ipmi_devintf ipmi_msghandler
ramoops reed_solomon fuse ip_tables ext4 mbcache jbd2 raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
libcrc32c raid1 raid0 linear dm_mod sd_mod t10_pi crc64_rocksoft_generic
crc64_rocksoft crc64 mgag200 i2c_algo_bit drm_shmem_helper crct10dif_pclmul
crc32_pclmul ahci drm_kms_helper crc32c_intel libahci ghash_clmulni_intel
mpt3sas sha512_ssse3 drm tg3 raid_class libata i2c_piix4 scsi_transport_sas wmi
[  350.679856] CR2: 0000000000000000
[  350.683174] ---[ end trace 0000000000000000 ]---
[  350.687791] RIP: 0010:__iommu_attach_device+0xc/0xa0
[  350.692758] Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48
85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74
[  350.711495] RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246
[  350.716721] RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001
[  350.723844] RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410
[  350.730968] RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc
[  350.738092] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700
[  350.745215] R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001
[  350.752339] FS:  00007f02e30ae000(0000) GS:ffff9afeb2440000(0000)
knlGS:0000000000000000
[  350.760417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  350.766154] CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0
[  350.773279] PKRU: 55555554
[  350.775989] Kernel panic - not syncing: Fatal exception
[  351.399507] Kernel Offset: 0x2fe00000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)
[  351.410284] ---[ end Kernel panic - not syncing: Fatal exception ]---


-Vasant

      parent reply	other threads:[~2023-06-23  4:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-19  8:49 [PATCH v2 iommu/next] iommu: Fix default domain setup Vasant Hegde
2023-06-19 11:31 ` Jason Gunthorpe
2023-06-20  5:11   ` Vasant Hegde
2023-06-20  6:31 ` Baolu Lu
2023-06-20 11:42   ` Jason Gunthorpe
2023-06-22  4:59     ` Vasant Hegde
2023-06-22 13:55       ` Jason Gunthorpe
2023-06-23  3:08         ` Baolu Lu
2023-06-23  4:38         ` Vasant Hegde [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b69af01e-d2ee-c8cc-4e59-9ca890e9fe4a@amd.com \
    --to=vasant.hegde@amd.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dheerajkumar.srivastava@amd.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=suravee.suthikulpanit@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.