Linux IOMMU Development
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Kunkun Jiang <jiangkunkun@huawei.com>, Will Deacon <will@kernel.org>
Cc: wanghaibin.wang@huawei.com, iommu@lists.linux-foundation.org,
	zhukeqian1@huawei.com
Subject: Re: Some questions about arm_lpae_install_table
Date: Wed, 4 Nov 2020 11:15:19 +0000	[thread overview]
Message-ID: <4fbe4ad5-2eef-c5a1-b60c-d00a274b5e35@arm.com> (raw)
In-Reply-To: <6ad9732c-caa4-0049-3fa2-c193e5dc6a88@huawei.com>

On 2020-11-04 07:17, Kunkun Jiang wrote:
> Hi Will and Robin,
> 
> Sorry for the late reply.
> 
> On 2020/11/3 18:21, Robin Murphy wrote:
>> On 2020-11-03 09:11, Will Deacon wrote:
>>> On Tue, Nov 03, 2020 at 11:00:27AM +0800, Kunkun Jiang wrote:
>>>> Recently, I have read and learned the code related to io-pgtable-arm.c.
>>>> There
>>>> are two question on arm_lpae_install_table.
>>>>
>>>> 1、the first
>>>>
>>>>> static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>>>>>                                               arm_lpae_iopte *ptep,
>>>>>                                               arm_lpae_iopte curr,
>>>>>                                               struct io_pgtable_cfg 
>>>>> *cfg)
>>>>> {
>>>>>          arm_lpae_iopte old, new;
>>>>>
>>>>>          new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>>>>>          if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>>>>>                  new |= ARM_LPAE_PTE_NSTABLE;
>>>>>
>>>>>         /*
>>>>>           * Ensure the table itself is visible before its PTE can be.
>>>>>           * Whilst we could get away with cmpxchg64_release below, 
>>>>> this
>>>>>           * doesn't have any ordering semantics when !CONFIG_SMP.
>>>>>           */
>>>>>          dma_wmb();
>>>>>
>>>>>          old = cmpxchg64_relaxed(ptep, curr, new);
>>>>>
>>>>>          if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
>>>>>                  return old;
>>>>>
>>>>>          /* Even if it's not ours, there's no point waiting; just 
>>>>> kick it
>>>>> */
>>>>>          __arm_lpae_sync_pte(ptep, cfg);
>>>>>          if (old == curr)
>>>>>                  WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
>>>>>
>>>>>          return old;
>>>>> }
>>>>
>>>> If another thread changes the ptep between cmpxchg64_relaxed and
>>>> WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
>>>> WRITE_ONCE will overwrite the change.
>>>
>>> Can you please provide an example of a code path where this happens? The
>>> idea is that CPUs can race on the cmpxchg(), but there will only be one
>>> winner.
>>
>> The only way a table entry can suddenly disappear is in a race that 
>> involves mapping or unmapping a whole block/table-sized region, while 
>> simultaneously mapping a page *within* that region. Yes, if someone 
>> uses the API in a nonsensical and completely invalid way that cannot 
>> have a predictable outcome, they get an unpredictable outcome. Whoop 
>> de do...
>>
> Yes, as Robin mentioned, there will be an unpredictable outcome in extreme
> cases. Now, we have two APIs mapping and unmapping to alter a block/page
> descriptor. I worry about that it may be  increasingly difficult to add 
> API in
> the future.

I still don't understand your concern. If two threads try to 
simultaneously create a block mapping and a page mapping *for the same 
virtual address*, the result is that one of those mappings will end up 
in place, depending on who managed to write to the PTE last. This code 
is self-consistent - it doesn't crash, nor end up with any mapping that 
*wasn't* requested - it just doesn't go far out of its way to 
accommodate obviously invalid usage. By comparison, say instead those 
two threads simultaneously call kfree() on the same pointer; the outcome 
of that is considerably worse.

What kind of additional new API are you imagining where such 
deliberately racy behaviour would be anything other than broken nonsense?

>>>> 2、the second
>>>>
>>>>> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>>>>>                  /* Unmap! */
>>>>>                  if (i == unmap_idx)
>>>>> continue;
>>>>>
>>>>>                  __arm_lpae_init_pte(data, blk_paddr, pte, lvl,
>>>>> &tablep[i]);
>>>>> }
>>>>>
>>>>> pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
>>>>
>>>> When altering a translation table descriptor include split a block into
>>>> constituent granules, the Armv8-A and SMMUv3 architectures require
>>>> a break-before-make procedure. But in the function 
>>>> arm_lpae_split_blk_unmap,
>>>> it changes a block descriptor to an equivalent span of page 
>>>> translations
>>>> directly. Is it appropriate to do so?
>>>
>>> Break-before-make doesn't really work for the SMMU because faults are
>>> generally fatal.
>>>
>>> Are you seeing problems in practice with this code?
>>
>> TBH I do still wonder if we shouldn't just get rid of split_blk_unmap 
>> and all its complexity. Other drivers treat an unmap of a page from a 
>> block entry as simply unmapping the whole block, and that's the 
>> behaviour VFIO seems to expect. My only worry is that it's been around 
>> long enough that there might be some horrible out-of-tree code that 
>> *is* relying on it, despite the fact that it's impossible to implement 
>> in a way that's definitely 100% safe :/
>>
>> Robin.
>> .
> I have not encountered a problem. I mean SMMU has supported BBML feature
> in SMMUv3.2. Maybe we can use this feature to update translation table 
> without
> break-before-make.

We could. But then what do we do for all the existing hardware without 
BBML, or on x86 or other drivers which don't split blocks anyway?

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

      reply	other threads:[~2020-11-04 11:15 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-03  3:00 Some questions about arm_lpae_install_table Kunkun Jiang
2020-11-03  9:11 ` Will Deacon
2020-11-03 10:21   ` Robin Murphy
2020-11-04  7:17     ` Kunkun Jiang
2020-11-04 11:15       ` Robin Murphy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4fbe4ad5-2eef-c5a1-b60c-d00a274b5e35@arm.com \
    --to=robin.murphy@arm.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jiangkunkun@huawei.com \
    --cc=wanghaibin.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox