linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Cc: "joro@8bytes.org" <joro@8bytes.org>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
	"mshavit@google.com" <mshavit@google.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	jiangkunkun <jiangkunkun@huawei.com>,
	zhukeqian <zhukeqian1@huawei.com>, Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH v2 2/4] iommu/io-pgtable-arm: Add read_and_clear_dirty() support
Date: Wed, 24 Apr 2024 08:01:50 +0000	[thread overview]
Message-ID: <6449dc6823944d58b8713683fcc88a0e@huawei.com> (raw)
In-Reply-To: <dd475ac8-ad0f-4aae-8032-1218cafc7590@arm.com>



> -----Original Message-----
> From: Ryan Roberts <ryan.roberts@arm.com>
> Sent: Tuesday, April 23, 2024 4:56 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> iommu@lists.linux.dev; linux-arm-kernel@lists.infradead.org
> Cc: joro@8bytes.org; jgg@nvidia.com; kevin.tian@intel.com;
> nicolinc@nvidia.com; mshavit@google.com; robin.murphy@arm.com;
> will@kernel.org; joao.m.martins@oracle.com; jiangkunkun
> <jiangkunkun@huawei.com>; zhukeqian <zhukeqian1@huawei.com>;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Add
> read_and_clear_dirty() support
> 
> On 22/02/2024 09:49, Shameer Kolothum wrote:
> > .read_and_clear_dirty() IOMMU domain op takes care of reading the
> > dirty bits (i.e. PTE has both DBM and AP[2] set) and marshalling into
> > a bitmap of
> 
> Err... a PTE is HW-dirty if DBM is set and AP[2] is *clear*, right? (AP[2] is
> "RDONLY" bit, and is initially set, then the HW clears it on first write. See
> pte_hw_dirty() in linux/arch/arm64/include/asm/pgtable.h).

Oops..yes, the comment here is indeed wrong.

I will update and add something like below to make it clear: 

                                    DBM bit             AP[2]("RDONLY" bit)
1. writeable_clean         1                       1
2. writeable_dirty          1                       0
3. read-only                     0                      1

> 
> > a given page size.
> >
> > While reading the dirty bits we also clear the PTE AP[2] bit to mark
> > it as writable-clean depending on read_and_clear_dirty() flags.
> 
> You would need to *set* AP[2] to mark it clean.
> 
> >
> > Structure it in a way that the IOPTE walker is generic, and so we pass
> > a function pointer over what to do on a per-PTE basis.
> >
> > Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 128
> > ++++++++++++++++++++++++++++++++-
> >  1 file changed, 127 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c
> > b/drivers/iommu/io-pgtable-arm.c index f7828a7aad41..1ce7b7a3c1e8
> > 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -75,6 +75,7 @@
> >
> >  #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
> >  #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
> > +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
> >  #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
> >  #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
> >  #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
> > @@ -84,7 +85,7 @@
> >
> >  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) <<
> 2)
> >  /* Ignore the contiguous bit for block splitting */
> > -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
> > +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)0xd) <<
> 51)
> 
> Perhaps this is easier to understand:
> 
> #define ARM_LPAE_PTE_ATTR_HI_MASK	(ARM_LPAE_PTE_XN |
> ARM_LPAE_PTE_DBM)

Ok.

> 
> >  #define ARM_LPAE_PTE_ATTR_MASK
> 	(ARM_LPAE_PTE_ATTR_LO_MASK |	\
> >  					 ARM_LPAE_PTE_ATTR_HI_MASK)
> >  /* Software bit for solving coherency races */ @@ -93,6 +94,9 @@
> >  /* Stage-1 PTE */
> >  #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
> >  #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2)
> << 6)
> > +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
> 
> Perhaps:
> 
> #define ARM_LPAE_PTE_AP_RDONLY_BIT	7
> #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)1) <<
> ARM_LPAE_PTE_AP_RDONLY_BIT)

Ok.

> 
> > +#define ARM_LPAE_PTE_AP_WRITABLE	(ARM_LPAE_PTE_AP_RDONLY
> | \
> > +					 ARM_LPAE_PTE_DBM)
> 
> Perhaps: ARM_LPAE_PTE_AP_WRITABLE_CLEAN ?

Sure. That’s more clear.

 > 
> >  #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
> >  #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
> >
> > @@ -729,6 +733,127 @@ static phys_addr_t
> arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> >  	return iopte_to_paddr(pte, data) | iova;  }
> >
> > +struct arm_lpae_iopte_read_dirty {
> > +	unsigned long flags;
> > +	struct iommu_dirty_bitmap *dirty;
> > +};
> > +
> > +typedef int (*io_pgtable_visit_fn_t)(unsigned long iova, size_t size,
> > +				     arm_lpae_iopte *ptep, void *opaque);
> struct
> > +io_pgtable_walk_data {
> > +	const io_pgtable_visit_fn_t	fn;
> > +	void				*opaque;> +	u64
> 		addr;
> > +	const u64			end;
> > +};
> > +
> > +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
> > +				 struct io_pgtable_walk_data *walk_data,
> > +				 arm_lpae_iopte *ptep,
> > +				 int lvl);
> > +
> > +static int __arm_lpae_read_and_clear_dirty(unsigned long iova, size_t
> size,
> > +					   arm_lpae_iopte *ptep, void
> *opaque) {
> > +	struct arm_lpae_iopte_read_dirty *arg = opaque;
> > +	struct iommu_dirty_bitmap *dirty = arg->dirty;
> > +	arm_lpae_iopte pte;
> > +
> > +	pte = READ_ONCE(*ptep);
> > +	if (WARN_ON(!pte))
> > +		return -EINVAL;
> 
> You've already read and checked its not zero in io_pgtable_visit(). Either the
> walker is considered generic and probably shouldn't care if the pte is 0, (so
> just do check here). Or the walker is specific for this use case, in which case
> there is no need for the function pointer callback inefficiencies (and you've
> already READ_ONCE() the ptep so no need to repeat. (Multiple ptep_get()s
> does make a meaningful performance impact in the core-mm).

Yes, that check is a duplication. Will remove that. I do have plans to support block
page split/merge functionality during migration start. And with that in mind tried
to make it generic with callback ptr. May be it is not worth at this time and I will
revisit it when we add split/merge functionality.

 > > +
> > +	if ((pte & ARM_LPAE_PTE_AP_WRITABLE) ==
> ARM_LPAE_PTE_AP_WRITABLE)
> > +		return 0;
> 
> What about RO ptes? This check only checks that it is writable-clean. So you
> have 2 remaining options; writable-dirty and read-only. Suggest:

Ok. Got it.

> 
> if (pte_hw_dirty(pte)) {
> 	iommu_dirty_bitmap_record(dirty, iova, size);
> 	if (!(arg->flags & IOMMU_DIRTY_NO_CLEAR))
> 		set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long
> *)ptep); }
> 
> > +
> > +	iommu_dirty_bitmap_record(dirty, iova, size);
> > +	if (!(arg->flags & IOMMU_DIRTY_NO_CLEAR))
> > +		set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long
> *)ptep);
> > +	return 0;
> > +}
> > +
> > +static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
> > +			    struct io_pgtable_walk_data *walk_data,
> > +			    arm_lpae_iopte *ptep, int lvl) {
> > +	struct io_pgtable *iop = &data->iop;
> > +	arm_lpae_iopte pte = READ_ONCE(*ptep);
> > +
> > +	if (WARN_ON(!pte))
> > +		return -EINVAL;
> > +
> > +	if (iopte_leaf(pte, lvl, iop->fmt)) {
> > +		size_t size = ARM_LPAE_BLOCK_SIZE(lvl, data);
> > +
> > +		walk_data->fn(walk_data->addr, size, ptep, walk_data-
> >opaque);
> > +		walk_data->addr += size;
> > +		return 0;
> > +	}
> > +
> > +	ptep = iopte_deref(pte, data);
> > +	return __arm_lpae_iopte_walk(data, walk_data, ptep, lvl + 1); }
> > +
> > +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
> > +				 struct io_pgtable_walk_data *walk_data,
> > +				 arm_lpae_iopte *ptep,
> > +				 int lvl)
> > +{
> > +	u32 idx;
> > +	int max_entries, ret;
> > +
> > +	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
> > +		return -EINVAL;
> > +
> > +	if (lvl == data->start_level)
> > +		max_entries = ARM_LPAE_PGD_SIZE(data) /
> sizeof(arm_lpae_iopte);
> > +	else
> > +		max_entries = ARM_LPAE_PTES_PER_TABLE(data);
> > +
> > +	for (idx = ARM_LPAE_LVL_IDX(walk_data->addr, lvl, data);
> > +	     (idx < max_entries) && (walk_data->addr < walk_data->end);
> ++idx) {
> > +		arm_lpae_iopte *pteref = &ptep[idx];
> 
> nit: Personally I would get rid of this and just pass `ptep + idx` to
> io_pgtable_visit()

Ack.

> > +
> > +		ret = io_pgtable_visit(data, walk_data, pteref, lvl);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops,
> > +					 unsigned long iova, size_t size,
> > +					 unsigned long flags,
> > +					 struct iommu_dirty_bitmap *dirty) {
> > +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +	struct arm_lpae_iopte_read_dirty arg = {
> > +		.flags = flags, .dirty = dirty,
> > +	};
> > +	struct io_pgtable_walk_data walk_data = {
> > +		.fn = __arm_lpae_read_and_clear_dirty,
> > +		.opaque = &arg,
> 
> Do you have plans to reuse the walker? If not, then I wonder if separating
> the argument and callback function are valuable? It will certainly be more
> efficient without the per-pte indirect call.

Ok. I will remove indirection for now and may revisit it when we add split/merge
functionality.

> 
> > +		.addr = iova,
> > +		.end = iova + size - 1,
> 
> Bug?: You are initializing end to be inclusive (i.e. last valid address in range).
> But __arm_lpae_iopte_walk() is using it as exclusive (i.e. first address after
> range) - see "walk_data->addr < walk_data->end".
> 
> mm code usually treats end as exclusive, so suggest removing the "- 1".

I will double check this.

> 
> > +	};
> > +	arm_lpae_iopte *ptep = data->pgd;
> > +	int lvl = data->start_level;
> > +	long iaext = (s64)iova >> cfg->ias;
> 
> Why cast this to signed? And why is the cast s64 and the result long? Suggest
> they should both be consistent at least. But probably clearer to just do:
> 
> WARN_ON((iova + size - 1) & (BIT(cfg->ias) - 1))
> 
> in the below if()? That way you are checking the full range too.

Ok.

Thanks,
Shameer
 
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-24  8:02 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-22  9:49 [PATCH v2 0/4] iommu/smmuv3: Add IOMMUFD dirty tracking support for SMMUv3 Shameer Kolothum
2024-02-22  9:49 ` [PATCH v2 1/4] iommu/arm-smmu-v3: Add feature detection for HTTU Shameer Kolothum
2024-04-23 14:41   ` Ryan Roberts
2024-04-23 14:52     ` Jason Gunthorpe
2024-04-24 10:04       ` Ryan Roberts
2024-04-24 12:23         ` Jason Gunthorpe
2024-04-24 12:59           ` Ryan Roberts
2024-04-24 13:20           ` Shameerali Kolothum Thodi
2024-04-24 13:32             ` Jason Gunthorpe
2024-04-24 13:43               ` Shameerali Kolothum Thodi
2024-04-24 14:21                 ` Jason Gunthorpe
2024-04-24  8:01     ` Shameerali Kolothum Thodi
2024-04-24  8:28       ` Ryan Roberts
2024-02-22  9:49 ` [PATCH v2 2/4] iommu/io-pgtable-arm: Add read_and_clear_dirty() support Shameer Kolothum
2024-04-23 15:56   ` Ryan Roberts
2024-04-24  8:01     ` Shameerali Kolothum Thodi [this message]
2024-04-24  8:36       ` Ryan Roberts
2024-02-22  9:49 ` [PATCH v2 3/4] iommu/arm-smmu-v3: Add support for dirty tracking in domain alloc Shameer Kolothum
2024-02-22 11:04   ` Joao Martins
2024-02-22 11:31     ` Shameerali Kolothum Thodi
2024-02-22 11:37       ` Joao Martins
2024-02-22 12:24         ` Shameerali Kolothum Thodi
2024-02-22 13:11           ` Jason Gunthorpe
2024-02-22 13:23           ` Joao Martins
2024-03-08 14:31   ` Jason Gunthorpe
2024-04-23 16:27   ` Ryan Roberts
2024-04-23 16:39     ` Jason Gunthorpe
2024-04-23 16:50       ` Ryan Roberts
2024-04-24  8:27     ` Shameerali Kolothum Thodi
2024-02-22  9:49 ` [PATCH v2 4/4] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Shameer Kolothum
2024-03-08 14:32   ` Jason Gunthorpe
2024-04-23 16:45   ` Ryan Roberts
2024-04-23 17:32     ` Jason Gunthorpe
2024-04-24  7:58       ` Ryan Roberts
2024-04-24 12:15         ` Jason Gunthorpe
2024-04-24 12:45           ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6449dc6823944d58b8713683fcc88a0e@huawei.com \
    --to=shameerali.kolothum.thodi@huawei.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=jiangkunkun@huawei.com \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linuxarm@huawei.com \
    --cc=mshavit@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).