Linux-ARM-Kernel Archive on lore.kernel.org
* Re: [PATCH 1/7] [RFC] ARM: remove Intel iop33x and iop13xx support
From: Aaro Koskinen @ 2019-08-16 15:42 UTC
  To: Linus Walleij
  Cc: Vinod Koul, Peter Teichmann, Arnd Bergmann, Bartosz Golaszewski,
	Linux Kernel Mailing List, Russell King, open list:GPIO SUBSYSTEM,
	soc, linux-i2c, dmaengine, Dan Williams, Martin Michlmayr,
	Linux ARM
In-Reply-To: <CACRpkdao8LF8g5qi_h+9BT9cHwmB4OadabkdGfP0sEFeLbmiLw@mail.gmail.com>

Hi,

On Wed, Aug 14, 2019 at 10:36:01AM +0200, Linus Walleij wrote:
> On Mon, Aug 12, 2019 at 11:45 AM Martin Michlmayr <tbm@cyrius.com> wrote:
> > As Arnd points out, Debian used to have support for various iop32x
> > devices.  While Debian hasn't supported iop32x in a number of years,
> > these devices are still usable and in use (RMK being a prime example).
> 
> I suppose it could be a good idea to add support for iop32x to
> OpenWrt and/or OpenEmbedded, both of which support some
> pretty constrained systems.

This platform is not really too constrained... E.g. on N2100 you have
512 MB RAM, SATA disks and gigabit ethernet. Not that different from
mvebu that Debian currently (?) supports. Maybe with multiplatform they
could support iop32x again.

A.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* RE: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC
From: Xiaowei Bao @ 2019-08-16 15:11 UTC
  To: Andrew Murray
  Cc: mark.rutland@arm.com, Roy Zang, lorenzo.pieralisi@arm.com,
	arnd@arndb.de, gregkh@linuxfoundation.org, jingoohan1@gmail.com,
	Z.q. Hou, linuxppc-dev@lists.ozlabs.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	kishon@ti.com, M.h. Lian, devicetree@vger.kernel.org,
	gustavo.pimentel@synopsys.com, Leo Li, shawnguo@kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190816123524.GE14111@e119886-lin.cambridge.arm.com>



> -----Original Message-----
> From: Andrew Murray <andrew.murray@arm.com>
> Sent: August 16, 2019 20:35
> To: Xiaowei Bao <xiaowei.bao@nxp.com>
> Cc: jingoohan1@gmail.com; gustavo.pimentel@synopsys.com;
> mark.rutland@arm.com; shawnguo@kernel.org; Leo Li
> <leoyang.li@nxp.com>; kishon@ti.com; lorenzo.pieralisi@arm.com;
> arnd@arndb.de; gregkh@linuxfoundation.org; M.h. Lian
> <minghuan.lian@nxp.com>; Roy Zang <roy.zang@nxp.com>;
> linux-pci@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; Z.q. Hou <zhiqiang.hou@nxp.com>
> Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for
> DWC
> 
> On Fri, Aug 16, 2019 at 11:00:01AM +0000, Xiaowei Bao wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Murray <andrew.murray@arm.com>
> > > Sent: August 16, 2019 17:45
> > > To: Xiaowei Bao <xiaowei.bao@nxp.com>
> > > Cc: jingoohan1@gmail.com; gustavo.pimentel@synopsys.com;
> > > mark.rutland@arm.com; shawnguo@kernel.org; Leo Li
> > > <leoyang.li@nxp.com>; kishon@ti.com; lorenzo.pieralisi@arm.com;
> > > arnd@arndb.de; gregkh@linuxfoundation.org; M.h. Lian
> > > <minghuan.lian@nxp.com>; Roy Zang <roy.zang@nxp.com>;
> > > linux-pci@vger.kernel.org; devicetree@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> > > linuxppc-dev@lists.ozlabs.org; Z.q. Hou <zhiqiang.hou@nxp.com>
> > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > support for DWC
> > >
> > > On Fri, Aug 16, 2019 at 02:55:41AM +0000, Xiaowei Bao wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Murray <andrew.murray@arm.com>
> > > > > Sent: August 15, 2019 19:32
> > > > > To: Xiaowei Bao <xiaowei.bao@nxp.com>
> > > > > Cc: jingoohan1@gmail.com; gustavo.pimentel@synopsys.com;
> > > > > bhelgaas@google.com; robh+dt@kernel.org; mark.rutland@arm.com;
> > > > > shawnguo@kernel.org; Leo Li <leoyang.li@nxp.com>; kishon@ti.com;
> > > > > lorenzo.pieralisi@arm.com; arnd@arndb.de;
> > > > > gregkh@linuxfoundation.org; M.h. Lian <minghuan.lian@nxp.com>;
> > > > > Mingkai Hu <mingkai.hu@nxp.com>; Roy Zang <roy.zang@nxp.com>;
> > > > > linux-pci@vger.kernel.org; devicetree@vger.kernel.org;
> > > > > linux-kernel@vger.kernel.org;
> > > > > linux-arm-kernel@lists.infradead.org;
> > > > > linuxppc-dev@lists.ozlabs.org
> > > > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > > > support for DWC
> > > > >
> > > > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote:
> > > > > > Add multiple PF support for DWC. Different PFs have different
> > > > > > config spaces, so use the pf-offset property read from the DTS
> > > > > > to access each PF's config space.
> > > > >
> > > > > Thanks for the patch. I haven't seen a cover letter for this
> > > > > series, is there one missing?
> > > > Maybe I missed it; I will add you to the review next time. Thanks a
> > > > lot for your comments.
> > > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> > > > > > ---
> > > > > >  drivers/pci/controller/dwc/pcie-designware-ep.c |  97
> > > > > +++++++++++++---------
> > > > > >  drivers/pci/controller/dwc/pcie-designware.c    | 105
> > > > > ++++++++++++++++++++++--
> > > > > >  drivers/pci/controller/dwc/pcie-designware.h    |  10 ++-
> > > > > >  include/linux/pci-epc.h                         |   1 +
> > > > > >  4 files changed, 164 insertions(+), 49 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > index 2bf5a35..75e2955 100644
> > > > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep
> > > *ep)
> > > > > >  	pci_epc_linkup(epc);
> > > > > >  }
> > > > > >
> > > > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > > > > > pci_barno
> > > > > bar,
> > > > > > -				   int flags)
> > > > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8
> func_no,
> > > > > > +				   enum pci_barno bar, int flags)
> > > > > >  {
> > > > > >  	u32 reg;
> > > > > > +	struct pci_epc *epc = pci->ep.epc;
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >
> > > > > > -	reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > > > +	reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > >
> > > > > I think I'd rather see this arithmetic (and the one for
> > > > > determining pf_base) inside a macro or inline header function.
> > > > > This would make this code more readable and reduce the chances
> > > > > of an error by avoiding duplication of code.
> > > > >
> > > > > For example look at cdns_pcie_ep_fn_writeb and
> > > > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that
> > > > > do this.
> > > > Agreed, this looks fine. Thanks a lot for your comments; I will
> > > > access the registers this way in the next version of the patch.
> > > > >
> > > > >
> > > > > >  	dw_pcie_dbi_ro_wr_en(pci);
> > > > > >  	dw_pcie_writel_dbi2(pci, reg, 0x0);
> > > > > >  	dw_pcie_writel_dbi(pci, reg, 0x0); @@ -37,7 +39,12 @@ static
> > > > > > void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > > > > > pci_barno bar,
> > > > > >
> > > > > >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> > > > > > bar)
> > > {
> > > > > > -	__dw_pcie_ep_reset_bar(pci, bar, 0);
> > > > > > +	u8 func_no, funcs;
> > > > > > +
> > > > > > +	funcs = pci->ep.epc->max_functions;
> > > > > > +
> > > > > > +	for (func_no = 0; func_no < funcs; func_no++)
> > > > > > +		__dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
> > > > > >  }
> > > > > >
> > > > > >  static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8
> > > > > > cap_ptr, @@ -78,28 +85,29 @@ static int
> > > > > > dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,  {
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >
> > > > > >  	dw_pcie_dbi_ro_wr_en(pci);
> > > > > > -	dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
> > > > > > -	dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid);
> > > > > > -	dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid);
> > > > > > -	dw_pcie_writeb_dbi(pci, PCI_CLASS_PROG, hdr->progif_code);
> > > > > > -	dw_pcie_writew_dbi(pci, PCI_CLASS_DEVICE,
> > > > > > +	dw_pcie_writew_dbi(pci, pf_base + PCI_VENDOR_ID,
> > > hdr->vendorid);
> > > > > > +	dw_pcie_writew_dbi(pci, pf_base + PCI_DEVICE_ID,
> hdr->deviceid);
> > > > > > +	dw_pcie_writeb_dbi(pci, pf_base + PCI_REVISION_ID,
> hdr->revid);
> > > > > > +	dw_pcie_writeb_dbi(pci, pf_base + PCI_CLASS_PROG,
> > > hdr->progif_code);
> > > > > > +	dw_pcie_writew_dbi(pci, pf_base + PCI_CLASS_DEVICE,
> > > > > >  			   hdr->subclass_code | hdr->baseclass_code << 8);
> > > > > > -	dw_pcie_writeb_dbi(pci, PCI_CACHE_LINE_SIZE,
> > > > > > +	dw_pcie_writeb_dbi(pci, pf_base + PCI_CACHE_LINE_SIZE,
> > > > > >  			   hdr->cache_line_size);
> > > > > > -	dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_VENDOR_ID,
> > > > > > +	dw_pcie_writew_dbi(pci, pf_base +
> PCI_SUBSYSTEM_VENDOR_ID,
> > > > > >  			   hdr->subsys_vendor_id);
> > > > > > -	dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_ID,
> hdr->subsys_id);
> > > > > > -	dw_pcie_writeb_dbi(pci, PCI_INTERRUPT_PIN,
> > > > > > +	dw_pcie_writew_dbi(pci, pf_base + PCI_SUBSYSTEM_ID,
> > > > > hdr->subsys_id);
> > > > > > +	dw_pcie_writeb_dbi(pci, pf_base + PCI_INTERRUPT_PIN,
> > > > > >  			   hdr->interrupt_pin);
> > > > > >  	dw_pcie_dbi_ro_wr_dis(pci);
> > > > > >
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > -static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, enum
> > > > > pci_barno bar,
> > > > > > -				  dma_addr_t cpu_addr,
> > > > > > +static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, u8
> > > func_no,
> > > > > > +				  enum pci_barno bar, dma_addr_t cpu_addr,
> > > > > >  				  enum dw_pcie_as_type as_type)  {
> > > > > >  	int ret;
> > > > > > @@ -112,7 +120,7 @@ static int dw_pcie_ep_inbound_atu(struct
> > > > > dw_pcie_ep *ep, enum pci_barno bar,
> > > > > >  		return -EINVAL;
> > > > > >  	}
> > > > > >
> > > > > > -	ret = dw_pcie_prog_inbound_atu(pci, free_win, bar, cpu_addr,
> > > > > > +	ret = dw_pcie_prog_inbound_atu(pci, func_no, free_win, bar,
> > > > > > +cpu_addr,
> > > > > >  				       as_type);
> > > > > >  	if (ret < 0) {
> > > > > >  		dev_err(pci->dev, "Failed to program IB window\n"); @@
> > > -125,7
> > > > > > +133,8 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep
> > > > > > +*ep,
> > > > > enum pci_barno bar,
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > -static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep,
> > > > > > phys_addr_t phys_addr,
> > > > > > +static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, u8
> > > func_no,
> > > > > > +				   phys_addr_t phys_addr,
> > > > > >  				   u64 pci_addr, size_t size)  {
> > > > > >  	u32 free_win;
> > > > > > @@ -137,8 +146,8 @@ static int dw_pcie_ep_outbound_atu(struct
> > > > > dw_pcie_ep *ep, phys_addr_t phys_addr,
> > > > > >  		return -EINVAL;
> > > > > >  	}
> > > > > >
> > > > > > -	dw_pcie_prog_outbound_atu(pci, free_win,
> PCIE_ATU_TYPE_MEM,
> > > > > > -				  phys_addr, pci_addr, size);
> > > > > > +	dw_pcie_prog_ep_outbound_atu(pci, func_no, free_win,
> > > > > PCIE_ATU_TYPE_MEM,
> > > > > > +				     phys_addr, pci_addr, size);
> > > > > >
> > > > > >  	set_bit(free_win, ep->ob_window_map);
> > > > > >  	ep->outbound_addr[free_win] = phys_addr; @@ -154,7 +163,7
> > > @@
> > > > > static
> > > > > > void dw_pcie_ep_clear_bar(struct pci_epc *epc, u8 func_no,
> > > > > >  	enum pci_barno bar = epf_bar->barno;
> > > > > >  	u32 atu_index = ep->bar_to_atu[bar];
> > > > > >
> > > > > > -	__dw_pcie_ep_reset_bar(pci, bar, epf_bar->flags);
> > > > > > +	__dw_pcie_ep_reset_bar(pci, func_no, bar, epf_bar->flags);
> > > > > >
> > > > > >  	dw_pcie_disable_atu(pci, atu_index,
> > > DW_PCIE_REGION_INBOUND);
> > > > > >  	clear_bit(atu_index, ep->ib_window_map); @@ -170,14
> +179,16
> > > @@
> > > > > > static int dw_pcie_ep_set_bar(struct pci_epc *epc, u8 func_no,
> > > > > >  	size_t size = epf_bar->size;
> > > > > >  	int flags = epf_bar->flags;
> > > > > >  	enum dw_pcie_as_type as_type;
> > > > > > -	u32 reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > > +	u32 reg = PCI_BASE_ADDRESS_0 + (4 * bar) + pf_base;
> > > > > >
> > > > > >  	if (!(flags & PCI_BASE_ADDRESS_SPACE))
> > > > > >  		as_type = DW_PCIE_AS_MEM;
> > > > > >  	else
> > > > > >  		as_type = DW_PCIE_AS_IO;
> > > > > >
> > > > > > -	ret = dw_pcie_ep_inbound_atu(ep, bar, epf_bar->phys_addr,
> > > as_type);
> > > > > > +	ret = dw_pcie_ep_inbound_atu(ep, func_no, bar,
> > > > > > +				     epf_bar->phys_addr, as_type);
> > > > > >  	if (ret)
> > > > > >  		return ret;
> > > > > >
> > > > > > @@ -235,7 +246,7 @@ static int dw_pcie_ep_map_addr(struct
> > > > > > pci_epc
> > > > > *epc, u8 func_no,
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > >
> > > > > > -	ret = dw_pcie_ep_outbound_atu(ep, addr, pci_addr, size);
> > > > > > +	ret = dw_pcie_ep_outbound_atu(ep, func_no, addr, pci_addr,
> > > > > > +size);
> > > > > >  	if (ret) {
> > > > > >  		dev_err(pci->dev, "Failed to enable address\n");
> > > > > >  		return ret;
> > > > > > @@ -248,12 +259,13 @@ static int dw_pcie_ep_get_msi(struct
> > > > > > pci_epc *epc, u8 func_no)  {
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u32 val, reg;
> > > > > >
> > > > > >  	if (!ep->msi_cap)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > -	reg = ep->msi_cap + PCI_MSI_FLAGS;
> > > > > > +	reg = ep->msi_cap + pf_base + PCI_MSI_FLAGS;
> > > > > >  	val = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	if (!(val & PCI_MSI_FLAGS_ENABLE))
> > > > > >  		return -EINVAL;
> > > > > > @@ -267,12 +279,13 @@ static int dw_pcie_ep_set_msi(struct
> > > > > > pci_epc *epc, u8 func_no, u8 interrupts)  {
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u32 val, reg;
> > > > > >
> > > > > >  	if (!ep->msi_cap)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > -	reg = ep->msi_cap + PCI_MSI_FLAGS;
> > > > > > +	reg = ep->msi_cap + pf_base + PCI_MSI_FLAGS;
> > > > > >  	val = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	val &= ~PCI_MSI_FLAGS_QMASK;
> > > > > >  	val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK; @@
> -287,12
> > > > > > +300,13 @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc,
> > > > > > +u8
> > > func_no)  {
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u32 val, reg;
> > > > > >
> > > > > >  	if (!ep->msix_cap)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > -	reg = ep->msix_cap + PCI_MSIX_FLAGS;
> > > > > > +	reg = ep->msix_cap + pf_base + PCI_MSIX_FLAGS;
> > > > > >  	val = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	if (!(val & PCI_MSIX_FLAGS_ENABLE))
> > > > > >  		return -EINVAL;
> > > > > > @@ -306,12 +320,13 @@ static int dw_pcie_ep_set_msix(struct
> > > > > > pci_epc *epc, u8 func_no, u16 interrupts)  {
> > > > > >  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u32 val, reg;
> > > > > >
> > > > > >  	if (!ep->msix_cap)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > -	reg = ep->msix_cap + PCI_MSIX_FLAGS;
> > > > > > +	reg = ep->msix_cap + pf_base + PCI_MSIX_FLAGS;
> > > > > >  	val = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	val &= ~PCI_MSIX_FLAGS_QSIZE;
> > > > > >  	val |= interrupts;
> > > > > > @@ -400,6 +415,7 @@ int dw_pcie_ep_raise_msi_irq(struct
> > > dw_pcie_ep
> > > > > *ep, u8 func_no,
> > > > > >  	unsigned int aligned_offset;
> > > > > >  	u16 msg_ctrl, msg_data;
> > > > > >  	u32 msg_addr_lower, msg_addr_upper, reg;
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u64 msg_addr;
> > > > > >  	bool has_upper;
> > > > > >  	int ret;
> > > > > > @@ -408,19 +424,19 @@ int dw_pcie_ep_raise_msi_irq(struct
> > > > > dw_pcie_ep *ep, u8 func_no,
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > >  	/* Raise MSI per the PCI Local Bus Specification Revision 3.0,
> 6.8.1.
> > > */
> > > > > > -	reg = ep->msi_cap + PCI_MSI_FLAGS;
> > > > > > +	reg = ep->msi_cap + pf_base + PCI_MSI_FLAGS;
> > > > > >  	msg_ctrl = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT);
> > > > > > -	reg = ep->msi_cap + PCI_MSI_ADDRESS_LO;
> > > > > > +	reg = ep->msi_cap + pf_base + PCI_MSI_ADDRESS_LO;
> > > > > >  	msg_addr_lower = dw_pcie_readl_dbi(pci, reg);
> > > > > >  	if (has_upper) {
> > > > > > -		reg = ep->msi_cap + PCI_MSI_ADDRESS_HI;
> > > > > > +		reg = ep->msi_cap + pf_base + PCI_MSI_ADDRESS_HI;
> > > > > >  		msg_addr_upper = dw_pcie_readl_dbi(pci, reg);
> > > > > > -		reg = ep->msi_cap + PCI_MSI_DATA_64;
> > > > > > +		reg = ep->msi_cap + pf_base + PCI_MSI_DATA_64;
> > > > > >  		msg_data = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	} else {
> > > > > >  		msg_addr_upper = 0;
> > > > > > -		reg = ep->msi_cap + PCI_MSI_DATA_32;
> > > > > > +		reg = ep->msi_cap + pf_base + PCI_MSI_DATA_32;
> > > > > >  		msg_data = dw_pcie_readw_dbi(pci, reg);
> > > > > >  	}
> > > > > >  	aligned_offset = msg_addr_lower & (epc->mem->page_size -
> 1);
> > > @@
> > > > > > -439,7 +455,7 @@ int dw_pcie_ep_raise_msi_irq(struct
> > > > > > dw_pcie_ep *ep,
> > > > > > u8 func_no,  }
> > > > > >
> > > > > >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8
> func_no,
> > > > > > -			     u16 interrupt_num)
> > > > > > +			      u16 interrupt_num)
> > > > > >  {
> > > > > >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > > >  	struct pci_epc *epc = ep->epc; @@ -447,16 +463,17 @@ int
> > > > > > dw_pcie_ep_raise_msix_irq(struct
> > > > > dw_pcie_ep *ep, u8 func_no,
> > > > > >  	u32 bar_addr_upper, bar_addr_lower;
> > > > > >  	u32 msg_addr_upper, msg_addr_lower;
> > > > > >  	u32 reg, msg_data, vec_ctrl;
> > > > > > +	u32 pf_base = func_no * epc->pf_offset;
> > > > > >  	u64 tbl_addr, msg_addr, reg_u64;
> > > > > >  	void __iomem *msix_tbl;
> > > > > >  	int ret;
> > > > > >
> > > > > > -	reg = ep->msix_cap + PCI_MSIX_TABLE;
> > > > > > +	reg = ep->msix_cap + pf_base + PCI_MSIX_TABLE;
> > > > > >  	tbl_offset = dw_pcie_readl_dbi(pci, reg);
> > > > > >  	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
> > > > > >  	tbl_offset &= PCI_MSIX_TABLE_OFFSET;
> > > > > >
> > > > > > -	reg = PCI_BASE_ADDRESS_0 + (4 * bir);
> > > > > > +	reg = PCI_BASE_ADDRESS_0 + pf_base + (4 * bir);
> > > > > >  	bar_addr_upper = 0;
> > > > > >  	bar_addr_lower = dw_pcie_readl_dbi(pci, reg);
> > > > > >  	reg_u64 = (bar_addr_lower &
> > > PCI_BASE_ADDRESS_MEM_TYPE_MASK);
> > > > > @@
> > > > > > -592,13 +609,17 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > > > >  	ep->epc = epc;
> > > > > >  	epc_set_drvdata(epc, ep);
> > > > > >
> > > > > > -	if (ep->ops->ep_init)
> > > > > > -		ep->ops->ep_init(ep);
> > > > > > -
> > > > > >  	ret = of_property_read_u8(np, "max-functions",
> > > &epc->max_functions);
> > > > > >  	if (ret < 0)
> > > > > >  		epc->max_functions = 1;
> > > > > >
> > > > > > +	ret = of_property_read_u32(np, "pf-offset", &epc->pf_offset);
> > > > > > +	if (ret < 0)
> > > > > > +		epc->pf_offset = 0;
> > > > >
> > > > > Bad things will likely happen if max_functions > 1 and pf-offset
> > > > > isn't set. I think the driver should bail in this situation. It
> > > > > would be very easy for someone to misconfigure this.
> > > > Yes, you are right, but that would require pf-offset to be defined
> > > > in the DTS whenever max-functions is defined. I am not sure of the
> > > > correct value of pf-offset for other platforms, so I think
> > > > max-functions and pf-offset should not depend on each other.
> > >
> > > Yes you're correct. I hadn't really thought about this beyond
> > > layerscape. It's also possible that other hardware could support
> > > multiple PFs without relying on an offset and perhaps employ some
> > > other mechanism to access different functions. So whilst this
> > > property can be optional for the majority of dwc controllers - it
> > > must be set and cannot be zero for layerscape.
> > >
> > > Perhaps inside ls_pcie_ep_init, you can set max_functions to 1 if
> > > pf_offset is 0 and print a WARN to explain why? (Or ls_pcie_ep_init
> > > returns failure and dw_pcie_ep_init checks it and bails).
> > >
> > > The assumption is being made here that future dw controllers may
> > > also use pf_offset (is this likely?) - otherwise why is this in
> > > pcie-designware-ep.c and not pci-layerscape-ep.c and why is this
> > > value not just hard-coded for lp?
> >
> > Thanks a lot for your detailed comments; they help me a lot.
> > Yes, I agree with your point, and I will seriously consider the best
> > way to fix this potential issue.
> > Based on your experience, how do other platforms implement multiple
> > functions?
> > The DWC core distinguishes the different PFs by the signal
> > "client0_tlp_func_num[(PF_WD-1):0]".
> 
> I don't know, though looking at the kernel drivers suggests that the existing EP
> controllers have a large address space which contains multiple PFs. They are
> accessed via macros (ROCKCHIP_PCIE_EP_FUNC_BASE(fn),
> CDNS_PCIE_EP_FUNC_BASE(fn)). It would be possible, but probably not
> desirable, to have a smaller address space (window) and a register that selects
> which function the window refers to. This is why I'm slightly nervous of
> assuming that a pf-offset will cover all future dw drivers - I may be wrong.
OK, thanks; maybe other people have good advice. I will use a macro to implement
multiple-function support in the v2 patch.
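
As a rough illustration of the macro approach discussed here, the
per-function register arithmetic could be gathered in one place instead of
being repeated in every callback. This is only a sketch under assumptions:
the EXAMPLE_* names and the 0x20000 stride are hypothetical and are not taken
from the patch or from any existing driver. Only the BAR0 offset 0x10 is the
standard PCI config space value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stride between PF config spaces; the real value is platform-specific. */
#define EXAMPLE_PF_OFFSET		0x20000u
/* Standard PCI config space offset of BAR0 (PCI_BASE_ADDRESS_0 in the kernel). */
#define EXAMPLE_PCI_BASE_ADDRESS_0	0x10u

/* Base of the config space belonging to physical function fn (hypothetical name). */
#define EXAMPLE_EP_FUNC_BASE(fn)	((uint32_t)(fn) * EXAMPLE_PF_OFFSET)

/* Translate a config space register offset into the DBI offset for one PF. */
static inline uint32_t example_ep_func_reg(uint8_t func_no, uint32_t reg)
{
	return EXAMPLE_EP_FUNC_BASE(func_no) + reg;
}

/* DBI offset of BAR number bar for one PF; a function has at most six BARs. */
static inline uint32_t example_ep_bar_reg(uint8_t func_no, unsigned int bar)
{
	assert(bar < 6);
	return example_ep_func_reg(func_no, EXAMPLE_PCI_BASE_ADDRESS_0 + 4 * bar);
}
```

With helpers like these, a callback would compute its register as
example_ep_bar_reg(func_no, bar) rather than open-coding
pf_base + PCI_BASE_ADDRESS_0 + (4 * bar) at each call site.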
> 
> > >
> > >
> > > > Even if I define max-functions without defining pf-offset,
> > > > pf-offset is 0 and the DWC EP driver can continue the INIT
> > > > process instead of returning; of course, PF1 will then not
> > > > work. I don't know which way is better.
> > Hi Andrew,
> > > > >
> > > > >
> > > > > > +
> > > > > > +	if (ep->ops->ep_init)
> > > > > > +		ep->ops->ep_init(ep);
> > > > > > +
> > > > > >  	ret = __pci_epc_mem_init(epc, ep->phys_base, ep->addr_size,
> > > > > >  				 ep->page_size);
> > > > > >  	if (ret < 0) {
> > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > > b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > > index 7d25102..c99cee4 100644
> > > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > > @@ -158,6 +158,43 @@ static void
> > > > > > dw_pcie_writel_ob_unroll(struct
> > > > > dw_pcie *pci, u32 index, u32 reg,
> > > > > >  	dw_pcie_writel_atu(pci, offset + reg, val);  }
> > > > > >
> > > > > > +static void dw_pcie_prog_ep_outbound_atu_unroll(struct
> > > > > > +dw_pcie *pci, u8
> > > > > func_no,
> > > > > > +						int index, int type,
> > > > > > +						u64 cpu_addr, u64 pci_addr,
> > > > > > +						u32 size)
> > > > > > +{
> > > > > > +	u32 retries, val;
> > > > > > +
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_LOWER_BASE,
> > > > > > +				 lower_32_bits(cpu_addr));
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_UPPER_BASE,
> > > > > > +				 upper_32_bits(cpu_addr));
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_LIMIT,
> > > > > > +				 lower_32_bits(cpu_addr + size - 1));
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_LOWER_TARGET,
> > > > > > +				 lower_32_bits(pci_addr));
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_UPPER_TARGET,
> > > > > > +				 upper_32_bits(pci_addr));
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_REGION_CTRL1,
> > > > > > +				 type | PCIE_ATU_FUNC_NUM(func_no));
> > > > >
> > > > > With the exception of this line, the rest of this function is
> > > > > identical to dw_pcie_prog_outbound_atu_unroll.
> > > > Yes, I can merge the duplicated code, but I think we had better
> > > > use different outbound window setup functions for RC and EP,
> > > > because the RC doesn't need the func_num parameter.
> > >
> > >
> > >
> > > > >
> > > > > > +	dw_pcie_writel_ob_unroll(pci, index,
> > > PCIE_ATU_UNR_REGION_CTRL2,
> > > > > > +				 PCIE_ATU_ENABLE);
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Make sure ATU enable takes effect before any subsequent
> config
> > > > > > +	 * and I/O accesses.
> > > > > > +	 */
> > > > > > +	for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES;
> > > > > > +retries++)
> > > {
> > > > > > +		val = dw_pcie_readl_ob_unroll(pci, index,
> > > > > > +					      PCIE_ATU_UNR_REGION_CTRL2);
> > > > > > +		if (val & PCIE_ATU_ENABLE)
> > > > > > +			return;
> > > > > > +
> > > > > > +		mdelay(LINK_WAIT_IATU);
> > > > > > +	}
> > > > > > +	dev_err(pci->dev, "Outbound iATU is not being enabled\n"); }
> > > > > > +
> > > > > >  static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie
> > > > > > *pci, int
> > > > > index,
> > > > > >  					     int type, u64 cpu_addr,
> > > > > >  					     u64 pci_addr, u32 size) @@ -194,6
> > > +231,51 @@ static
> > > > > > void
> > > > > dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index,
> > > > > >  	dev_err(pci->dev, "Outbound iATU is not being enabled\n");
> > > > > > }
> > > > > >
> > > > > > +void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8
> > > > > > +func_no, int
> > > > > index,
> > > > > > +				  int type, u64 cpu_addr, u64 pci_addr,
> > > > > > +				  u32 size)
> > > > > > +{
> > > > > > +	u32 retries, val;
> > > > > > +
> > > > > > +	if (pci->ops->cpu_addr_fixup)
> > > > > > +		cpu_addr = pci->ops->cpu_addr_fixup(pci, cpu_addr);
> > > > > > +
> > > > > > +	if (pci->iatu_unroll_enabled) {
> > > > > > +		dw_pcie_prog_ep_outbound_atu_unroll(pci, func_no,
> index,
> > > type,
> > > > > > +						    cpu_addr, pci_addr, size);
> > > > > > +		return;
> > > > > > +	}
> > > > > > +
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT,
> > > > > > +			   PCIE_ATU_REGION_OUTBOUND | index);
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_BASE,
> > > > > > +			   lower_32_bits(cpu_addr));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_BASE,
> > > > > > +			   upper_32_bits(cpu_addr));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_LIMIT,
> > > > > > +			   lower_32_bits(cpu_addr + size - 1));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_TARGET,
> > > > > > +			   lower_32_bits(pci_addr));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET,
> > > > > > +			   upper_32_bits(pci_addr));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
> > > > > > +			   PCIE_ATU_FUNC_NUM(func_no));
> > > > >
> > > > > The same here, this is identical to dw_pcie_prog_outbound_atu
> > > > > with the exception of this line.
> > > > >
> > > > > Is there a way you can avoid all of this duplicated code?
> > > > As above, I can merge the duplicated code, but I still think
> > > > different outbound window setup functions should be used for RC
> > > > and EP.
> > >
> > > Or, is it possible to keep and use the existing functions, but use
> > > them differently, e.g:
> > >
> > >
> > > @@ -137,8 +146,8 @@ static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t phys_addr,
> > >                 return -EINVAL;
> > >         }
> > >
> > > -       dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
> > > -                                 phys_addr, pci_addr, size);
> > > +       dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM_FUNC(func_no),
> > > +                                    phys_addr, pci_addr, size);
> > >
> > >         set_bit(free_win, ep->ob_window_map);
> > >         ep->outbound_addr[free_win] = phys_addr;
> > >
> > >
> > > Supported with:
> > >
> > > #define PCIE_ATU_TYPE_MEM               0x0
> > > #define PCIE_ATU_TYPE_MEM_FUNC(func_no) (PCIE_ATU_TYPE_MEM | PCIE_ATU_FUNC_NUM(func_no))
> > >
> > >
> > > This is just a suggestion, but I'm keen to avoid code duplication.
> > Thanks, I have thought of a way as follows:
> >
> > This is a good way, but I think PCIE_ATU_TYPE_MEM_FUNC(func_no) could
> > be confusing, because PCIE_ATU_TYPE_MEM indicates the type of TLP and
> > sits in bit[0:3] of register CR1, while PCIE_ATU_FUNC_NUM is
> > bit[20:24]. I have another way:
> > @@ -137,8 +146,8 @@ static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t phys_addr,
> >                  return -EINVAL;
> >         }
> >
> > 		dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
> >                                 phys_addr, pci_addr, size);
> > +		val = dw_pcie_readl_dbi(pci, PCIE_ATU_CR1);
> > +		dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, val | PCIE_ATU_FUNC_NUM(func_no));
> > or
> > +void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8 func_no, int index,
> > +                                   int type, u64 cpu_addr, u64 pci_addr,
> > +                                   u32 size) {
> > +		dw_pcie_prog_outbound_atu(pci, index, type, cpu_addr, pci_addr, size);
> > +		val = dw_pcie_readl_dbi(pci, PCIE_ATU_CR1);
> > +		dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, val | PCIE_ATU_FUNC_NUM(func_no));
> > +}
> >
> > Which of these three ways do you think is better?
> 
> Building upon your idea, how about:
> 
> 
>  @@ -137,8 +146,8 @@ static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t phys_addr,
>                  return -EINVAL;
>          }
> 
>  -       dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
>  -                                 phys_addr, pci_addr, size);
>  +       dw_pcie_prog_ep_outbound_atu(pci, func_no, free_win, PCIE_ATU_TYPE_MEM,
>  +                                    phys_addr, pci_addr, size);
> 
>          set_bit(free_win, ep->ob_window_map);
>          ep->outbound_addr[free_win] = phys_addr;
> 
> 
>  +void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8 func_no, int index,
>  +                                   int type, u64 cpu_addr, u64 pci_addr,
>  +                                   u32 size)
>  +{
>  +		__dw_pcie_prog_outbound_atu(pci, func_no, index, type, cpu_addr, pci_addr, size);
>  +}
>  +
>  +void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no, int index,
>  +                                   int type, u64 cpu_addr, u64 pci_addr,
>  +                                   u32 size)
>  +{
>  +		__dw_pcie_prog_outbound_atu(pci, 0, index, type, cpu_addr, pci_addr, size);
>  +}
> 
> In other words dw_pcie_prog_outbound_atu is updated (and renamed) to
> always take a func_no and for host controllers this is always set to zero. Or
> you could follow the approach taken in the cadence drivers for their
> implementation of cdns_pcie_set_outbound_region - this always takes a
> func_no and is used by host controller and endpoint drivers (except they don't
> have the helper wrapper functions above thus exposing fn=0 to host
> controllers).
You're correct, I think this way is better, thanks.
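
To make the agreed direction concrete, here is a small user-space sketch of
the wrapper pattern: one common routine always takes a function number, the
host (RC) entry point passes 0, and the EP entry point passes the real
func_no. It is only an illustration under assumptions. All example_* names
are hypothetical, the register is a plain variable standing in for the iATU
CR1 register, and the 5-bit field at bit 20 follows the bit[20:24] position
mentioned earlier in this thread rather than any datasheet.

```c
#include <stdint.h>

/* Function number field of CR1, placed at bits [24:20] as described in the thread. */
#define EXAMPLE_ATU_FUNC_NUM(fn)	(((uint32_t)(fn) & 0x1fu) << 20)
#define EXAMPLE_ATU_TYPE_MEM		0x0u

/* Stand-in for the CR1 register so that the sketch runs without hardware. */
static uint32_t example_cr1;

/* Common routine: a real driver would also program base, limit and target here. */
static void example_prog_outbound_atu_common(uint8_t func_no, uint32_t type)
{
	example_cr1 = type | EXAMPLE_ATU_FUNC_NUM(func_no);
}

/* Host (RC) entry point: the function number is always 0. */
static void example_prog_outbound_atu(uint32_t type)
{
	example_prog_outbound_atu_common(0, type);
}

/* Endpoint entry point: the caller selects the physical function. */
static void example_prog_ep_outbound_atu(uint8_t func_no, uint32_t type)
{
	example_prog_outbound_atu_common(func_no, type);
}

/* Read back the simulated register. */
static uint32_t example_read_cr1(void)
{
	return example_cr1;
}
```

The point of the split is that the register programming sequence exists once,
while host drivers keep a call signature without a function number.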
> 
> > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andrew Murray
> > > > >
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE);
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Make sure ATU enable takes effect before any subsequent
> config
> > > > > > +	 * and I/O accesses.
> > > > > > +	 */
> > > > > > +	for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES;
> > > > > > +retries++)
> > > {
> > > > > > +		val = dw_pcie_readl_dbi(pci, PCIE_ATU_CR2);
> > > > > > +		if (val & PCIE_ATU_ENABLE)
> > > > > > +			return;
> > > > > > +
> > > > > > +		mdelay(LINK_WAIT_IATU);
> > > > > > +	}
> > > > > > +	dev_err(pci->dev, "Outbound iATU is not being enabled\n"); }
> > > > > > +
> > > > > > >  void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
> > > > > > >  			       u64 cpu_addr, u64 pci_addr, u32 size)
> > > > > > >  {
> > > > > > > @@ -252,8 +334,8 @@ static void dw_pcie_writel_ib_unroll(struct dw_pcie *pci, u32 index, u32 reg,
> > > > > > >  	dw_pcie_writel_atu(pci, offset + reg, val);
> > > > > > >  }
> > > > > > >
> > > > > > > -static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
> > > > > > > -					   int bar, u64 cpu_addr,
> > > > > > > +static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
> > > > > > > +					   int index, int bar, u64 cpu_addr,
> > > > > > >  					   enum dw_pcie_as_type as_type)
> > > > > > >  {
> > > > > > >  	int type;
> > > > > > > @@ -275,8 +357,10 @@ static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
> > > > > > >  		return -EINVAL;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, type);
> > > > > > > +	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, type |
> > > > > > > +				 PCIE_ATU_FUNC_NUM(func_no));
> > > > > > >  	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
> > > > > > > +				 PCIE_ATU_FUNC_NUM_MATCH_EN |
> > > > > > >  				 PCIE_ATU_ENABLE |
> > > > > > >  				 PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
> > > > > > >
> > > > > > > @@ -297,14 +381,15 @@ static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
> > > > > > >  	return -EBUSY;
> > > > > > >  }
> > > > > > >
> > > > > > > -int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
> > > > > > > -			     u64 cpu_addr, enum dw_pcie_as_type as_type)
> > > > > > > +int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, u8 func_no, int index,
> > > > > > > +			     int bar, u64 cpu_addr,
> > > > > > > +			     enum dw_pcie_as_type as_type)
> > > > > > >  {
> > > > > > >  	int type;
> > > > > > >  	u32 retries, val;
> > > > > > >
> > > > > > >  	if (pci->iatu_unroll_enabled)
> > > > > > > -		return dw_pcie_prog_inbound_atu_unroll(pci, index, bar,
> > > > > > > +		return dw_pcie_prog_inbound_atu_unroll(pci, func_no, index, bar,
> > > > > > >  						       cpu_addr, as_type);
> > > > > > >
> > > > > > >  	dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT, PCIE_ATU_REGION_INBOUND |
> > > > > > > @@ -323,9 +408,11 @@ int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
> > > > > >  		return -EINVAL;
> > > > > >  	}
> > > > > >
> > > > > > -	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type);
> > > > > > -	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE
> > > > > > -			   | PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
> > > > > > +			   PCIE_ATU_FUNC_NUM(func_no));
> > > > > > +	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE |
> > > > > > +			   PCIE_ATU_FUNC_NUM_MATCH_EN |
> > > > > > +			   PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
> > > > > >
> > > > > >  	/*
> > > > > >  	 * Make sure ATU enable takes effect before any subsequent
> > > > > > config diff --git
> > > > > > a/drivers/pci/controller/dwc/pcie-designware.h
> > > > > > b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > > index ffed084..2b291e8 100644
> > > > > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > > @@ -71,9 +71,11 @@
> > > > > >  #define PCIE_ATU_TYPE_IO		0x2
> > > > > >  #define PCIE_ATU_TYPE_CFG0		0x4
> > > > > >  #define PCIE_ATU_TYPE_CFG1		0x5
> > > > > > +#define PCIE_ATU_FUNC_NUM(pf)           (pf << 20)
> > > > > >  #define PCIE_ATU_CR2			0x908
> > > > > >  #define PCIE_ATU_ENABLE			BIT(31)
> > > > > >  #define PCIE_ATU_BAR_MODE_ENABLE	BIT(30)
> > > > > > +#define PCIE_ATU_FUNC_NUM_MATCH_EN      BIT(19)
> > > > > >  #define PCIE_ATU_LOWER_BASE		0x90C
> > > > > >  #define PCIE_ATU_UPPER_BASE		0x910
> > > > > >  #define PCIE_ATU_LIMIT			0x914
> > > > > > @@ -265,8 +267,12 @@ int dw_pcie_wait_for_link(struct dw_pcie
> > > > > > *pci); void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int
> index,
> > > > > >  			       int type, u64 cpu_addr, u64 pci_addr,
> > > > > >  			       u32 size);
> > > > > > -int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int
> bar,
> > > > > > -			     u64 cpu_addr, enum dw_pcie_as_type
> as_type);
> > > > > > +void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8
> > > > > > +func_no, int
> > > > > index,
> > > > > > +				  int type, u64 cpu_addr, u64 pci_addr,
> > > > > > +				  u32 size);
> > > > > > +int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, u8 func_no,
> > > > > > +int
> > > index,
> > > > > > +			     int bar, u64 cpu_addr,
> > > > > > +			     enum dw_pcie_as_type as_type);
> > > > > >  void dw_pcie_disable_atu(struct dw_pcie *pci, int index,
> > > > > >  			 enum dw_pcie_region_type type);  void
> > > dw_pcie_setup(struct
> > > > > > dw_pcie *pci); diff --git a/include/linux/pci-epc.h
> > > > > > b/include/linux/pci-epc.h index f641bad..fc2feee 100644
> > > > > > --- a/include/linux/pci-epc.h
> > > > > > +++ b/include/linux/pci-epc.h
> > > > > > @@ -96,6 +96,7 @@ struct pci_epc {
> > > > > >  	const struct pci_epc_ops	*ops;
> > > > > >  	struct pci_epc_mem		*mem;
> > > > > >  	u8				max_functions;
> > > > > > +	u32				pf_offset;
> > >
> > > Also pf_offset is an implementation detail needed only by the driver
> > > to calculate where the PF is - it doesn't seem right that we share
> > > this with the EP controller framework (whereas max_functions is used
> > > as a bounds check for func_no in the framework calls).
> > >
> > > > I'd suggest that pf_offset is moved to a dwc structure, perhaps
> > > > dw_pcie_ep?
> > > I added the variable to this struct because all PFs belong to a
> > > PCIe controller, and pci_epc represents a PCIe controller. What do
> > > you think about this? I am not sure whether I should add this
> > > variable to dw_pcie_ep instead.
> 
> The EPC framework won't use the pf_offset and doesn't need it. It abstracts
> the complexity of writing to the config address space (and similar) through the
> pci_epc_ops. I'd suggest that the EPC framework (and pci_epc struct) only
> needs to contain what *it* needs. Especially given that not all ep drivers have
> a pf_offset or similar.
> 
> I understand the logic that pci_epc represents an EP controller, but I think you
> should consider that it actually represents a *generic* EP controller in the
> context of a framework which solely serves the purpose of connecting
> controllers with functions. Whereas the dw_pcie_ep represents a specific
> type of controller (DW) - as the pf_offset is (so far) relating to only DW
> controllers (and as confirmed by the DT mapping) then it makes sense to not
> move pf_offset from the specialised specific controller to the generic
> controller. (Or at least this is how I rationalise it, though the EPC framework is
> something quite unfamiliar to me).
Given your detailed explanation, I agree this is more reasonable. I will move
pf_offset to struct dw_pcie_ep, thanks again!
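
As a rough sketch of the split agreed on above, assuming mock structures rather than the real pci_epc and dw_pcie_ep definitions: the generic side keeps only what the framework checks, and the controller-specific side owns pf_offset. All mock_* names are hypothetical, and treating pf_offset as a per-function byte stride is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Generic side: only what the framework itself uses. */
struct mock_epc {
	uint8_t max_functions;	/* bounds check for func_no */
};

/* Controller-specific side: owns the register layout detail. */
struct mock_dw_ep {
	struct mock_epc *epc;
	uint32_t pf_offset;	/* assumed: byte stride between PF register blocks */
};

/* Framework-level check: needs max_functions, never pf_offset. */
int mock_epc_func_valid(const struct mock_epc *epc, uint8_t func_no)
{
	return func_no < epc->max_functions;
}

/* Driver-level helper: the only place that knows about pf_offset. */
uint32_t mock_dw_ep_func_base(const struct mock_dw_ep *ep, uint8_t func_no)
{
	assert(mock_epc_func_valid(ep->epc, func_no));
	return (uint32_t)func_no * ep->pf_offset;
}
```

Drivers for controllers without a comparable per-function offset then carry no unused field in the generic structure.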
> 
> Thanks,
> 
> Andrew Murray
> 
> > >
> > > Thanks,
> > >
> > > Andrew Murray
> > >
> > > > > >  	struct config_group		*group;
> > > > > >  	/* spinlock to protect against concurrent access of EP
> controller */
> > > > > >  	spinlock_t			lock;
> > > > > > --
> > > > > > 2.9.5
> > > > > >
> > > > > >

* Re: [PATCH] arm64: do_csum: implement accelerated scalar version
From: Robin Murphy @ 2019-08-16 14:55 UTC (permalink / raw)
  To: Shaokun Zhang, Will Deacon
  Cc: Ard Biesheuvel, steve.capper, netdev, ilias.apalodimas,
	huanglingyan (A), linux-arm-kernel
In-Reply-To: <37fbc2a3-069d-9f75-f3d0-3eda2efa5c9b@hisilicon.com>

On 16/08/2019 09:15, Shaokun Zhang wrote:
> Hi Will,
> 
> On 2019/8/16 0:46, Will Deacon wrote:
>> On Thu, May 16, 2019 at 11:14:35AM +0800, Zhangshaokun wrote:
>>> On 2019/5/15 17:47, Will Deacon wrote:
>>>> On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
>>>>> On 12/04/2019 10:52, Will Deacon wrote:
>>>>>> I'm waiting for Robin to come back with numbers for a C implementation.
>>>>>>
>>>>>> Robin -- did you get anywhere with that?
>>>>>
>>>>> Still not what I would call finished, but where I've got so far (besides an
>>>>> increasingly elaborate test rig) is as below - it still wants some unrolling
>>>>> in the middle to really fly (and actual testing on BE), but the worst-case
>>>>> performance already equals or just beats this asm version on Cortex-A53 with
>>>>> GCC 7 (by virtue of being alignment-insensitive and branchless except for
>>>>> the loop). Unfortunately, the advantage of C code being instrumentable does
>>>>> also come around to bite me...
>>>>
>>>> Is there any interest from anybody in spinning a proper patch out of this?
>>>> Shaokun?
>>>
>>> HiSilicon's Kunpeng920(Hi1620) benefits from do_csum optimization, if Ard and
>>> Robin are ok, Lingyan or I can try to do it.
>>> Of course, if any guy posts the patch, we are happy to test it.
>>> Any will be ok.
>>
>> I don't mind who posts it, but Robin is super busy with SMMU stuff at the
>> moment so it probably makes more sense for you or Lingyan to do it.
> 
> Thanks for restarting this topic, I or Lingyan will do it soon.

FWIW, I've rolled up what I had so far and dumped it into a quick
semi-realistic patch here:

http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=859c5566510c32ae72039aa5072e932a771a3596

So far I'd put most of the effort into the aforementioned benchmarking 
harness to compare performance and correctness for all the proposed 
implementations over all reasonable alignment/length combinations - I 
think that got pretty much finished, but as Will says I'm unlikely to 
find time to properly look at this again for several weeks.

Robin.


* Re: [PATCH] ARM64: dts: allwinner: Add devicetree for pine H64 modelA evaluation board
From: Corentin Labbe @ 2019-08-16 14:00 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: mark.rutland, devicetree, linux-sunxi, linux-kernel, wens,
	robh+dt, linux-arm-kernel
In-Reply-To: <20190816135206.pnf3iperzyhcbg4h@flea>

On Fri, Aug 16, 2019 at 03:52:06PM +0200, Maxime Ripard wrote:
> On Fri, Aug 16, 2019 at 01:57:50PM +0200, Corentin Labbe wrote:
> > On Fri, Aug 16, 2019 at 01:36:50PM +0200, Maxime Ripard wrote:
> > > On Fri, Aug 16, 2019 at 11:35:13AM +0200, Corentin Labbe wrote:
> > > > On Wed, Aug 14, 2019 at 03:33:22PM +0200, Maxime Ripard wrote:
> > > > > On Wed, Aug 14, 2019 at 03:17:41PM +0200, Corentin Labbe wrote:
> > > > > > On Mon, Aug 12, 2019 at 11:40:00AM +0200, Maxime Ripard wrote:
> > > > > > > On Thu, Aug 08, 2019 at 10:42:53AM +0200, Corentin Labbe wrote:
> > > > > > > > This patch adds the evaluation variant of the model A of the PineH64.
> > > > > > > > The model A has the same size of the pine64 and has a PCIE slot.
> > > > > > > >
> > > > > > > > The only devicetree difference with current pineH64, is the PHY
> > > > > > > > regulator.
> > > > > > > >
> > > > > > > > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > > > > > > ---
> > > > > > > >  arch/arm64/boot/dts/allwinner/Makefile        |  1 +
> > > > > > > >  .../sun50i-h6-pine-h64-modelA-eval.dts        | 26 +++++++++++++++++++
> > > > > > > >  2 files changed, 27 insertions(+)
> > > > > > > >  create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/boot/dts/allwinner/Makefile b/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > > index f6db0611cb85..9a02166cbf72 100644
> > > > > > > > --- a/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > > +++ b/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > > @@ -25,3 +25,4 @@ dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-3.dtb
> > > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-lite2.dtb
> > > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-one-plus.dtb
> > > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-pine-h64.dtb
> > > > > > > > +dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-pine-h64-modelA-eval.dtb
> > > > > > > > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..d8ff02747efe
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > > > @@ -0,0 +1,26 @@
> > > > > > > > +// SPDX-License-Identifier: (GPL-2.0+ or MIT)
> > > > > > > > +/*
> > > > > > > > + * Copyright (C) 2019 Corentin Labbe <clabbe.montjoie@gmail.com>
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#include "sun50i-h6-pine-h64.dts"
> > > > > > > > +
> > > > > > > > +/ {
> > > > > > > > +	model = "Pine H64 model A evaluation board";
> > > > > > > > +	compatible = "pine64,pine-h64-modelA-eval", "allwinner,sun50i-h6";
> > > > > > > > +
> > > > > > > > +	reg_gmac_3v3: gmac-3v3 {
> > > > > > > > +		compatible = "regulator-fixed";
> > > > > > > > +		regulator-name = "vcc-gmac-3v3";
> > > > > > > > +		regulator-min-microvolt = <3300000>;
> > > > > > > > +		regulator-max-microvolt = <3300000>;
> > > > > > > > +		startup-delay-us = <100000>;
> > > > > > > > +		gpio = <&pio 2 16 GPIO_ACTIVE_HIGH>;
> > > > > > > > +		enable-active-high;
> > > > > > > > +	};
> > > > > > > > +
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +&emac {
> > > > > > > > +	phy-supply = <&reg_gmac_3v3>;
> > > > > > > > +};
> > > > > > >
> > > > > > > I might be missing some context here, but I'm pretty sure that the
> > > > > > > initial intent of the pine h64 DTS was to support the model A all
> > > > > > > along.
> > > > > > >
> > > > > >
> > > > > > The regulator changed between modelA and B.
> > > > > > See this old patchset (supporting modelA) https://patchwork.kernel.org/patch/10539149/ for example.
> > > > >
> > > > > I'm not sure what your point is, but mine is that everything about the
> > > > > model A should be in sun50i-h6-pine-h64.dts.
> > > > >
> > > >
> > > > model A and B are different enough for distinct dtb, (see sub-thread
> > > > on HDMI difference for an other difference than PHY regulator)
> > >
> > > I don't mind having separate DTBs for model A and model B.
> > >
> > > > And clearly, the current dtb is for model B.
> > >
> > > That DTS was added almost a year before the model B was announced, and
> > > no commit to that file mention the model B, so it's definitely not
> > > clear.
> >
> > Normal it was added for model A (without any ethernet/HDMI support,
> > so nothing distinct from model B), and the modelB ethernet/HDMI
> > support cames after.
> 
> Changing the board a DT is meant for halfway through the development is
> definitely not ok.
> 
> > > > So do you mean that we need to create a new dtb for model B ? (and
> > > > hack the current back to model A ?)
> > >
> > > I'd prefer not to hack anything, but yes
> > >
> >
> > Since model A is not public (only evaluations boards exists), the
> > probability of a production model A is low and the current dtb is
> > perfect for model B , could you reconsider this ?
> 
> I mean, you could buy it, so it's definitely public.

Where? The official Pine H64 site only mentions model B.

> 
> Model A also had HDMI, and it doesn't look like there's anything
> particularly specific with that board.

A subthread says just the opposite: model A needs something more for HDMI
https://lkml.org/lkml/2019/8/12/394

> 
> On the Ethernet side, the only thing that changes is the regulator /
> GPIO being used to enable the PHY?
> 

Yes


* [PATCH] arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
From: Will Deacon @ 2019-08-16 13:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, James Morse, stable, Ard Biesheuvel

The initial support for dynamic ftrace trampolines in modules made use
of an indirect branch which loaded its target from the beginning of
a special section (e71a4e1bebaf7 ("arm64: ftrace: add support for far
branches to dynamic ftrace")). Since no instructions were being patched,
no cache maintenance was needed. However, later in be0f272bfc83 ("arm64:
ftrace: emit ftrace-mod.o contents through code") this code was reworked
to output the trampoline instructions directly into the PLT entry but,
unfortunately, the necessary cache maintenance was overlooked.

Add a call to __flush_icache_range() after writing the new trampoline
instructions but before patching in the branch to the trampoline.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/ftrace.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 1285c7b2947f..171773257974 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -73,7 +73,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 
 	if (offset < -SZ_128M || offset >= SZ_128M) {
 #ifdef CONFIG_ARM64_MODULE_PLTS
-		struct plt_entry trampoline;
+		struct plt_entry trampoline, *dst;
 		struct module *mod;
 
 		/*
@@ -106,23 +106,27 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 		 * to check if the actual opcodes are in fact identical,
 		 * regardless of the offset in memory so use memcmp() instead.
 		 */
-		trampoline = get_plt_entry(addr, mod->arch.ftrace_trampoline);
-		if (memcmp(mod->arch.ftrace_trampoline, &trampoline,
-			   sizeof(trampoline))) {
-			if (plt_entry_is_initialized(mod->arch.ftrace_trampoline)) {
+		dst = mod->arch.ftrace_trampoline;
+		trampoline = get_plt_entry(addr, dst);
+		if (memcmp(dst, &trampoline, sizeof(trampoline))) {
+			if (plt_entry_is_initialized(dst)) {
 				pr_err("ftrace: far branches to multiple entry points unsupported inside a single module\n");
 				return -EINVAL;
 			}
 
 			/* point the trampoline to our ftrace entry point */
 			module_disable_ro(mod);
-			*mod->arch.ftrace_trampoline = trampoline;
+			*dst = trampoline;
 			module_enable_ro(mod, true);
 
-			/* update trampoline before patching in the branch */
-			smp_wmb();
+			/*
+			 * Ensure updated trampoline is visible to instruction
+			 * fetch before we patch in the branch.
+			 */
+			__flush_icache_range((unsigned long)&dst[0],
+					     (unsigned long)&dst[1]);
 		}
-		addr = (unsigned long)(void *)mod->arch.ftrace_trampoline;
+		addr = (unsigned long)dst;
 #else /* CONFIG_ARM64_MODULE_PLTS */
 		return -EINVAL;
 #endif /* CONFIG_ARM64_MODULE_PLTS */
-- 
2.11.0
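
For illustration only, a userspace analogue of the ordering this patch enforces: write the new instructions, make them visible to instruction fetch, and only then hand out the address to branch to. This is a sketch, not the kernel code. The mock_* names and the entry size are hypothetical, and the GCC/Clang builtin __builtin___clear_cache() stands in for __flush_icache_range().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct mock_plt_entry {
	uint32_t insn[4];	/* size chosen for illustration only */
};

/*
 * Copy the new instructions into place, make them visible to instruction
 * fetch, and only then return the address callers may branch to.
 */
uintptr_t mock_install_trampoline(struct mock_plt_entry *dst,
				  const struct mock_plt_entry *src)
{
	if (memcmp(dst, src, sizeof(*dst)) != 0) {
		memcpy(dst, src, sizeof(*dst));
		/* Userspace stand-in for __flush_icache_range(&dst[0], &dst[1]). */
		__builtin___clear_cache((char *)&dst[0], (char *)&dst[1]);
	}
	return (uintptr_t)dst;
}
```

A data-side barrier such as smp_wmb() orders the stores but does nothing for stale instruction cache contents, which is why the sketch performs cache maintenance before returning the address.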



* Re: [PATCH] ARM64: dts: allwinner: Add devicetree for pine H64 modelA evaluation board
From: Maxime Ripard @ 2019-08-16 13:52 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, devicetree, linux-sunxi, linux-kernel, wens,
	robh+dt, linux-arm-kernel
In-Reply-To: <20190816115750.GA24545@Red>


On Fri, Aug 16, 2019 at 01:57:50PM +0200, Corentin Labbe wrote:
> On Fri, Aug 16, 2019 at 01:36:50PM +0200, Maxime Ripard wrote:
> > On Fri, Aug 16, 2019 at 11:35:13AM +0200, Corentin Labbe wrote:
> > > On Wed, Aug 14, 2019 at 03:33:22PM +0200, Maxime Ripard wrote:
> > > > On Wed, Aug 14, 2019 at 03:17:41PM +0200, Corentin Labbe wrote:
> > > > > On Mon, Aug 12, 2019 at 11:40:00AM +0200, Maxime Ripard wrote:
> > > > > > On Thu, Aug 08, 2019 at 10:42:53AM +0200, Corentin Labbe wrote:
> > > > > > > This patch adds the evaluation variant of the model A of the PineH64.
> > > > > > > The model A has the same size of the pine64 and has a PCIE slot.
> > > > > > >
> > > > > > > The only devicetree difference with current pineH64, is the PHY
> > > > > > > regulator.
> > > > > > >
> > > > > > > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > > > > > ---
> > > > > > >  arch/arm64/boot/dts/allwinner/Makefile        |  1 +
> > > > > > >  .../sun50i-h6-pine-h64-modelA-eval.dts        | 26 +++++++++++++++++++
> > > > > > >  2 files changed, 27 insertions(+)
> > > > > > >  create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > >
> > > > > > > diff --git a/arch/arm64/boot/dts/allwinner/Makefile b/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > index f6db0611cb85..9a02166cbf72 100644
> > > > > > > --- a/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > +++ b/arch/arm64/boot/dts/allwinner/Makefile
> > > > > > > @@ -25,3 +25,4 @@ dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-3.dtb
> > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-lite2.dtb
> > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-orangepi-one-plus.dtb
> > > > > > >  dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-pine-h64.dtb
> > > > > > > +dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h6-pine-h64-modelA-eval.dtb
> > > > > > > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..d8ff02747efe
> > > > > > > --- /dev/null
> > > > > > > +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64-modelA-eval.dts
> > > > > > > @@ -0,0 +1,26 @@
> > > > > > > +// SPDX-License-Identifier: (GPL-2.0+ or MIT)
> > > > > > > +/*
> > > > > > > + * Copyright (C) 2019 Corentin Labbe <clabbe.montjoie@gmail.com>
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include "sun50i-h6-pine-h64.dts"
> > > > > > > +
> > > > > > > +/ {
> > > > > > > +	model = "Pine H64 model A evaluation board";
> > > > > > > +	compatible = "pine64,pine-h64-modelA-eval", "allwinner,sun50i-h6";
> > > > > > > +
> > > > > > > +	reg_gmac_3v3: gmac-3v3 {
> > > > > > > +		compatible = "regulator-fixed";
> > > > > > > +		regulator-name = "vcc-gmac-3v3";
> > > > > > > +		regulator-min-microvolt = <3300000>;
> > > > > > > +		regulator-max-microvolt = <3300000>;
> > > > > > > +		startup-delay-us = <100000>;
> > > > > > > +		gpio = <&pio 2 16 GPIO_ACTIVE_HIGH>;
> > > > > > > +		enable-active-high;
> > > > > > > +	};
> > > > > > > +
> > > > > > > +};
> > > > > > > +
> > > > > > > +&emac {
> > > > > > > +	phy-supply = <&reg_gmac_3v3>;
> > > > > > > +};
> > > > > >
> > > > > > I might be missing some context here, but I'm pretty sure that the
> > > > > > initial intent of the pine h64 DTS was to support the model A all
> > > > > > along.
> > > > > >
> > > > >
> > > > > The regulator changed between modelA and B.
> > > > > See this old patchset (supporting modelA) https://patchwork.kernel.org/patch/10539149/ for example.
> > > >
> > > > I'm not sure what your point is, but mine is that everything about the
> > > > model A should be in sun50i-h6-pine-h64.dts.
> > > >
> > >
> > > model A and B are different enough for distinct dtb, (see sub-thread
> > > on HDMI difference for an other difference than PHY regulator)
> >
> > I don't mind having separate DTBs for model A and model B.
> >
> > > And clearly, the current dtb is for model B.
> >
> > That DTS was added almost a year before the model B was announced, and
> > no commit to that file mention the model B, so it's definitely not
> > clear.
>
> Normal it was added for model A (without any ethernet/HDMI support,
> so nothing distinct from model B), and the modelB ethernet/HDMI
> support cames after.

Changing the board a DT is meant for halfway through the development is
definitely not ok.

> > > So do you mean that we need to create a new dtb for model B ? (and
> > > hack the current back to model A ?)
> >
> > I'd prefer not to hack anything, but yes
> >
>
> Since model A is not public (only evaluations boards exists), the
> probability of a production model A is low and the current dtb is
> perfect for model B , could you reconsider this ?

I mean, you could buy it, so it's definitely public.

Model A also had HDMI, and it doesn't look like there's anything
particularly specific with that board.

On the Ethernet side, the only thing that changes is the regulator /
GPIO being used to enable the PHY?

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH v2] gpio: pl061: Fix the issue failed to register the ACPI interrtupion
From: Andy Shevchenko @ 2019-08-16 13:40 UTC (permalink / raw)
  To: Wei Xu
  Cc: salil.mehta, jinying, tangkunshan, liguozhu, Linus Walleij,
	John Garry, Rafael J. Wysocki, Linux Kernel Mailing List,
	shameerali.kolothum.thodi, Linuxarm, open list:GPIO SUBSYSTEM,
	huangdaode, Jonathan Cameron, shiju.jose, Mika Westerberg,
	zhangyi.ac, linux-arm Mailing List, Len Brown
In-Reply-To: <1565946336-20080-1-git-send-email-xuwei5@hisilicon.com>

On Fri, Aug 16, 2019 at 12:07 PM Wei Xu <xuwei5@hisilicon.com> wrote:
>
> Invoke acpi_gpiochip_request_interrupts after the acpi data has been
> attached to the pl061 acpi node to register interruption.
>
> Otherwise it will be failed to register interruption for the ACPI case.
> Because in the gpiochip_add_data_with_key, acpi_gpiochip_add is invoked
> after gpiochip_add_irqchip but at that time the acpi data has not been
> attached yet.

> 2. cat /proc/interrupts in the guest console:
>
>         estuary:/$ cat /proc/interrupts
>                    CPU0
>         2:         3228     GICv3  27 Level     arch_timer
>         4:           15     GICv3  33 Level     uart-pl011
>         42:           0     GICv3  23 Level     arm-pmu
>         IPI0:         0       Rescheduling interrupts
>         IPI1:         0       Function call interrupts
>         IPI2:         0       CPU stop interrupts
>         IPI3:         0       CPU stop (for crash dump) interrupts
>         IPI4:         0       Timer broadcast interrupts
>         IPI5:         0       IRQ work interrupts
>         IPI6:         0       CPU wake-up interrupts
>         Err:          0
>
> But on QEMU v3.0.0 and Linux kernel v5.2.0-rc7, pl061 interruption is
> there as below:
>
>         estuary:/$ cat /proc/interrupts
>                    CPU0
>           2:       2648     GICv3  27 Level     arch_timer
>           4:         12     GICv3  33 Level     uart-pl011
>          42:          0     GICv3  23 Level     arm-pmu
>          43:          0  ARMH0061:00   3 Edge      ACPI:Event
>         IPI0:         0       Rescheduling interrupts
>         IPI1:         0       Function call interrupts
>         IPI2:         0       CPU stop interrupts
>         IPI3:         0       CPU stop (for crash dump) interrupts
>         IPI4:         0       Timer broadcast interrupts
>         IPI5:         0       IRQ work interrupts
>         IPI6:         0       CPU wake-up interrupts
>         Err:          0

In the above, show only the affected line.

> And the whole dmesg log on Linux kernel v5.2.0-rc7 is as below:

NO!
Please, remove this huge noise!

-- 
With Best Regards,
Andy Shevchenko


* Re: [PATCH] arm64/kvm: remove VMID rollover I-cache maintenance
From: James Morse @ 2019-08-16 13:39 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Suzuki K Poulose, Marc Zyngier, Christoffer Dall,
	linux-arm-kernel, kvmarm, Julien Thierry
In-Reply-To: <20190806155737.39307-1-mark.rutland@arm.com>

Hi Mark,

On 06/08/2019 16:57, Mark Rutland wrote:
> For VPIPT I-caches, we need I-cache maintenance on VMID rollover to
> avoid an ABA problem. Consider a single vCPU VM, with a pinned stage-2,
> running with an idmap VA->IPA and idmap IPA->PA. If we don't do
> maintenance on rollover:
> 
>         // VMID A
>         Writes insn X to PA 0xF
>         Invalidates PA 0xF (for VMID A)
> 
>         I$ contains [{A,F}->X]
> 
>         [VMID ROLLOVER]
> 
>         // VMID B
>         Writes insn Y to PA 0xF
>         Invalidates PA 0xF (for VMID B)
> 
>         I$ contains [{A,F}->X, {B,F}->Y]
> 
>         [VMID ROLLOVER]
> 
>         // VMID A
>         I$ contains [{A,F}->X, {B,F}->Y]
> 
>         Unexpectedly hits stale I$ line {A,F}->X.
> 
> However, for PIPT and VIPT I-caches, the VMID doesn't affect lookup or
> constrain maintenance. Given the VMID doesn't affect PIPT and VIPT
> I-caches, and given VMID rollover is independent of changes to stage-2
> mappings, I-cache maintenance cannot be necessary on VMID rollover for
> PIPT or VIPT I-caches.
> 
> This patch removes the maintenance on rollover for VIPT and PIPT
> I-caches. At the same time, the unnecessary colons are removed from the
> asm statement to make it more legible.

Makes sense!

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
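
For readers following the ABA argument quoted above, a toy model may help. This is a sketch under simplifying assumptions: a four-line cache whose lines are tagged with {VMID, PA}, and invalidate-by-PA that only reaches lines of the current VMID. All mock_* names are hypothetical and nothing here reflects the actual KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_LINES 4

struct mock_line {
	bool valid;
	int vmid;
	unsigned long pa;
	char insn;
};

struct mock_icache {
	struct mock_line line[MOCK_LINES];
};

/* Invalidate by PA, reaching only lines tagged with the given VMID. */
void mock_invalidate(struct mock_icache *c, int vmid, unsigned long pa)
{
	for (size_t i = 0; i < MOCK_LINES; i++)
		if (c->line[i].vmid == vmid && c->line[i].pa == pa)
			c->line[i].valid = false;
}

/* Full invalidation, as performed on VMID rollover. */
void mock_invalidate_all(struct mock_icache *c)
{
	for (size_t i = 0; i < MOCK_LINES; i++)
		c->line[i].valid = false;
}

/* Fetch: a hit needs both VMID and PA to match; a miss fills a free line. */
char mock_fetch(struct mock_icache *c, int vmid, unsigned long pa, char mem_insn)
{
	for (size_t i = 0; i < MOCK_LINES; i++)
		if (c->line[i].valid && c->line[i].vmid == vmid &&
		    c->line[i].pa == pa)
			return c->line[i].insn;

	for (size_t i = 0; i < MOCK_LINES; i++) {
		if (!c->line[i].valid) {
			c->line[i] = (struct mock_line){ true, vmid, pa, mem_insn };
			break;
		}
	}
	return mem_insn;
}

/* Replays the quoted sequence; returns what the guest finally executes. */
char mock_run_scenario(bool flush_on_rollover)
{
	struct mock_icache c = {0};
	const unsigned long pa = 0xF;

	/* VMID A: write X, invalidate, execute. */
	mock_invalidate(&c, 'A', pa);
	mock_fetch(&c, 'A', pa, 'X');
	if (flush_on_rollover)
		mock_invalidate_all(&c);

	/* VMID B: write Y, invalidate, execute. */
	mock_invalidate(&c, 'B', pa);
	mock_fetch(&c, 'B', pa, 'Y');
	if (flush_on_rollover)
		mock_invalidate_all(&c);

	/* VMID A again: memory holds Y, but a stale {A,F} line may hit. */
	return mock_fetch(&c, 'A', pa, 'Y');
}
```

In this model mock_run_scenario(false) returns the stale X, while mock_run_scenario(true) returns Y. If the lookup ignored the VMID tag, as for PIPT and VIPT caches, the invalidate issued under VMID B would already have removed the old line.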


* Re: [PATCH 03/11] xen/arm: pass one less argument to dma_cache_maint
From: Robin Murphy @ 2019-08-16 13:37 UTC (permalink / raw)
  To: Christoph Hellwig, Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-4-hch@lst.de>

On 16/08/2019 14:00, Christoph Hellwig wrote:
> Instead of taking apart the dma address in both callers do it inside
> dma_cache_maint itself.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   arch/arm/xen/mm.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index 90574d89d0d4..d9da24fda2f7 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -43,13 +43,15 @@ static bool hypercall_cflush = false;
>   
>   /* functions called by SWIOTLB */
>   
> -static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
> -	size_t size, enum dma_data_direction dir, enum dma_cache_op op)
> +static void dma_cache_maint(dma_addr_t handle, size_t size,
> +		enum dma_data_direction dir, enum dma_cache_op op)
>   {
>   	struct gnttab_cache_flush cflush;
>   	unsigned long xen_pfn;
> +	unsigned long offset = handle & ~PAGE_MASK;
>   	size_t left = size;
>   
> +	offset &= PAGE_MASK;

Ahem... presumably that should be handle, not offset.

Robin.

>   	xen_pfn = (handle >> XEN_PAGE_SHIFT) + offset / XEN_PAGE_SIZE;
>   	offset %= XEN_PAGE_SIZE;
>   
> @@ -86,13 +88,13 @@ static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
>   static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
>   		size_t size, enum dma_data_direction dir)
>   {
> -	dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_UNMAP);
> +	dma_cache_maint(handle, size, dir, DMA_UNMAP);
>   }
>   
>   static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
>   		size_t size, enum dma_data_direction dir)
>   {
> -	dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
> +	dma_cache_maint(handle, size, dir, DMA_MAP);
>   }
>   
>   void __xen_dma_map_page(struct device *hwdev, struct page *page,
> 
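
To make the point about handle versus offset concrete, here is a small sketch of the intended split next to what the posted hunk computes. It assumes 4 KiB pages and uses local EX_* macros rather than the kernel's PAGE_MASK; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages, mask defined as in the kernel. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	(UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_PAGE_MASK	(~(EX_PAGE_SIZE - 1))

struct ex_split {
	uint64_t base;		/* page-aligned part of the DMA handle */
	uint64_t offset;	/* offset within that page */
};

/* Intended behaviour: take the offset first, then mask the handle. */
static struct ex_split ex_split_handle(uint64_t handle)
{
	struct ex_split s;

	s.offset = handle & ~EX_PAGE_MASK;
	s.base = handle & EX_PAGE_MASK;
	return s;
}

/* What the posted hunk does: the second mask hits offset, not handle. */
static struct ex_split ex_split_handle_as_posted(uint64_t handle)
{
	struct ex_split s;

	s.offset = handle & ~EX_PAGE_MASK;
	s.offset &= EX_PAGE_MASK;	/* always clears offset to 0 */
	s.base = handle;		/* handle is never page-aligned */
	return s;
}
```

With a handle of 0x12345678 the intended split gives base 0x12345000 and offset 0x678, while the posted version gives offset 0 and leaves the handle unmasked, so the in-page offset is lost.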


^ permalink raw reply

* [PATCH 1/1] arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name
From: Simon Horman @ 2019-08-16 13:33 UTC (permalink / raw)
  To: linux-renesas-soc
  Cc: Simon Horman, Magnus Damm, Geert Uytterhoeven, linux-arm-kernel
In-Reply-To: <cover.1565962268.git.horms+renesas@verge.net.au>

From: Geert Uytterhoeven <geert+renesas@glider.be>

Currently there are two nodes named "regulator1" in the Draak DTS: a
3.3V regulator for the eMMC and the LVDS decoder, and a 12V regulator
for the backlight.  This causes the former to be overwritten by the
latter.

Fix this by renaming all regulators with numerical suffixes to use named
suffixes, which are less likely to conflict.

Fixes: 4fbd4158fe8967e9 ("arm64: dts: renesas: r8a77995: draak: Add backlight")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 arch/arm64/boot/dts/renesas/r8a77995-draak.dts | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/renesas/r8a77995-draak.dts b/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
index 0711170b26b1..3aa2564dfdc2 100644
--- a/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
+++ b/arch/arm64/boot/dts/renesas/r8a77995-draak.dts
@@ -97,7 +97,7 @@
 		reg = <0x0 0x48000000 0x0 0x18000000>;
 	};
 
-	reg_1p8v: regulator0 {
+	reg_1p8v: regulator-1p8v {
 		compatible = "regulator-fixed";
 		regulator-name = "fixed-1.8V";
 		regulator-min-microvolt = <1800000>;
@@ -106,7 +106,7 @@
 		regulator-always-on;
 	};
 
-	reg_3p3v: regulator1 {
+	reg_3p3v: regulator-3p3v {
 		compatible = "regulator-fixed";
 		regulator-name = "fixed-3.3V";
 		regulator-min-microvolt = <3300000>;
@@ -115,7 +115,7 @@
 		regulator-always-on;
 	};
 
-	reg_12p0v: regulator1 {
+	reg_12p0v: regulator-12p0v {
 		compatible = "regulator-fixed";
 		regulator-name = "D12.0V";
 		regulator-min-microvolt = <12000000>;
-- 
2.11.0
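
The name clash described in the commit message can be illustrated with a toy model. This is a sketch only, assuming that a node added under a name already present at the same level lands on the existing node and overwrites its properties; it is not how dtc is implemented, and all ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a device tree node with one property. */
struct ex_node {
	const char *name;
	const char *regulator_name;
};

/* Add a node; a duplicate name overwrites the existing node's property. */
static size_t ex_add_node(struct ex_node *tbl, size_t n, const char *name,
			  const char *regulator_name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) == 0) {
			tbl[i].regulator_name = regulator_name;
			return n;
		}
	}
	tbl[n].name = name;
	tbl[n].regulator_name = regulator_name;
	return n + 1;
}

/* Build the three Draak regulators; returns how many distinct nodes survive. */
static size_t ex_draak_regulators(bool named_suffixes)
{
	struct ex_node tbl[3];
	size_t n = 0;

	n = ex_add_node(tbl, n, named_suffixes ? "regulator-1p8v" : "regulator0",
			"fixed-1.8V");
	n = ex_add_node(tbl, n, named_suffixes ? "regulator-3p3v" : "regulator1",
			"fixed-3.3V");
	n = ex_add_node(tbl, n, named_suffixes ? "regulator-12p0v" : "regulator1",
			"D12.0V");
	return n;
}
```

With the numerical suffixes only two nodes survive, because the 12V entry lands on the 3.3V one; with the named suffixes all three are kept.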



^ permalink raw reply related

* [GIT PULL] Renesas ARM Based SoC Fixes for v5.3
From: Simon Horman @ 2019-08-16 13:33 UTC (permalink / raw)
  To: arm
  Cc: Arnd Bergmann, Kevin Hilman, Magnus Damm, linux-renesas-soc,
	Olof Johansson, Simon Horman, linux-arm-kernel

Hi Olof, Hi Kevin, Hi Arnd,

Please consider these Renesas ARM based SoC fixes for v5.3.


The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b:

  Linus 5.3-rc1 (2019-07-21 14:05:38 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git tags/renesas-fixes-for-v5.3

for you to fetch changes up to 45f5d5a9e34d3fe4140a9a3b5f7ebe86c252440a:

  arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name (2019-08-09 11:58:17 -0700)

----------------------------------------------------------------
Renesas ARM Based SoC Fixes for v5.3

* R-Car D3 (r8a77995) based Draak Board
  - Correct backlight regulator name in device tree

----------------------------------------------------------------
Geert Uytterhoeven (1):
      arm64: dts: renesas: r8a77995: draak: Fix backlight regulator name

 arch/arm64/boot/dts/renesas/r8a77995-draak.dts | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


^ permalink raw reply

* Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space
From: Adrian Hunter @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Leo Yan
  Cc: Song Liu, Mathieu Poirier, Daniel Borkmann, Suzuki Poulouse,
	Alexander Shishkin, netdev, coresight, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, linux-kernel, clang-built-linux,
	Peter Zijlstra, Yonghong Song, Namhyung Kim, bpf, Jiri Olsa,
	Martin KaFai Lau, linux-arm-kernel
In-Reply-To: <20190816014541.GA17960@leoy-ThinkPad-X240s>

On 16/08/19 4:45 AM, Leo Yan wrote:
> Hi Adrian,
> 
> On Thu, Aug 15, 2019 at 02:45:57PM +0300, Adrian Hunter wrote:
> 
> [...]
> 
>>>> How come you cannot use kallsyms to get the information?
>>>
>>> Thanks for pointing this out.  Sorry I skipped your comment "I don't
>>> know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
>>> patch v3; I should have used that chance to elaborate on the idea
>>> so I could get more feedback/guidance before proceeding.
>>>
>>> Actually, I considered using kallsyms when working on the previous
>>> patch set.
>>>
>>> As mentioned in patch set v4's cover letter, I tried to implement
>>> machine__create_extra_kernel_maps() for arm/arm64; the purpose was to
>>> parse kallsyms to find more kernel maps and thus also fix up
>>> the kernel start address.  But I found that the 'perf script' tool
>>> calls machine__get_kernel_start() directly instead of going through
>>> machine__create_extra_kernel_maps();
>>
>> Doesn't it just need to loop through each kernel map to find the lowest
>> start address?
> 
> Based on your suggestion, I worked out the change below and verified
> that it fixes up the start address on arm64; please let me know
> if the change works for you.

How does that work if you take a perf.data file to a machine with a
different architecture?

> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index f6ee7fbad3e4..51d78313dca1 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2671,9 +2671,26 @@ int machine__nr_cpus_avail(struct machine *machine)
>  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
>  }
>  
> +static int machine__fixup_kernel_start(void *arg,
> +				       const char *name __maybe_unused,
> +				       char type,
> +				       u64 start)
> +{
> +	struct machine *machine = arg;
> +
> +	type = toupper(type);
> +
> +	/* Fixup for text, weak, data and bss sections. */
> +	if (type == 'T' || type == 'W' || type == 'D' || type == 'B')
> +		machine->kernel_start = min(machine->kernel_start, start);
> +
> +	return 0;
> +}
> +
>  int machine__get_kernel_start(struct machine *machine)
>  {
>  	struct map *map = machine__kernel_map(machine);
> +	char filename[PATH_MAX];
>  	int err = 0;
>  
>  	/*
> @@ -2687,6 +2704,7 @@ int machine__get_kernel_start(struct machine *machine)
>  	machine->kernel_start = 1ULL << 63;
>  	if (map) {
>  		err = map__load(map);
>  		/*
>  		 * On x86_64, PTI entry trampolines are less than the
>  		 * start of kernel text, but still above 2^63. So leave
> @@ -2695,6 +2713,16 @@ int machine__get_kernel_start(struct machine *machine)
>  		if (!err && !machine__is(machine, "x86_64"))
>  			machine->kernel_start = map->start;
>  	}
> +
> +	machine__get_kallsyms_filename(machine, filename, PATH_MAX);
> +
> +	if (symbol__restricted_filename(filename, "/proc/kallsyms"))
> +		goto out;
> +
> +	if (kallsyms__parse(filename, machine, machine__fixup_kernel_start))
> +		pr_warning("Fail to fixup kernel start address. skipping...\n");
> +
> +out:
>  	return err;
>  }
> 
> Thanks,
> Leo Yan
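
The fixup in the quoted diff boils down to taking the minimum address over selected symbol types. The following standalone sketch shows that idea outside perf, assuming kallsyms-style "address type name" lines held in a string; lowest_kernel_symbol() is a hypothetical helper and does not use the perf kallsyms__parse() API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan kallsyms-style text and return the lowest address among text,
 * weak, data and bss symbols, starting from an initial value "start".
 */
static uint64_t lowest_kernel_symbol(const char *text, uint64_t start)
{
	const char *p = text;

	while (*p) {
		unsigned long long addr;
		char type;

		if (sscanf(p, "%llx %c", &addr, &type) == 2) {
			/* Fold local (lower case) and global symbol types. */
			type = (char)toupper((unsigned char)type);
			if ((type == 'T' || type == 'W' ||
			     type == 'D' || type == 'B') && addr < start)
				start = addr;
		}

		/* Advance to the next line, if there is one. */
		p = strchr(p, '\n');
		if (!p)
			break;
		p++;
	}
	return start;
}
```

Seeded with the kernel map start, a lower module text symbol pulls the result down, while absolute ('A') symbols are ignored. Note that this reads whatever text it is given, so it says nothing about whether the symbols match the machine that recorded the perf.data file.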
> 



^ permalink raw reply

* [PATCH 11/11] arm64: use asm-generic/dma-mapping.h
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Now that the Xen special cases are gone nothing worth mentioning is
left in the arm64 <asm/dma-mapping.h> file, so switch to use the
asm-generic version instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/Kbuild        |  1 +
 arch/arm64/include/asm/dma-mapping.h | 22 ----------------------
 arch/arm64/mm/dma-mapping.c          |  1 +
 3 files changed, 2 insertions(+), 22 deletions(-)
 delete mode 100644 arch/arm64/include/asm/dma-mapping.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index c52e151afab0..98a5405c8558 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += delay.h
 generic-y += div64.h
 generic-y += dma.h
 generic-y += dma-contiguous.h
+generic-y += dma-mapping.h
 generic-y += early_ioremap.h
 generic-y += emergency-restart.h
 generic-y += hw_irq.h
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
deleted file mode 100644
index 67243255a858..000000000000
--- a/arch/arm64/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2012 ARM Ltd.
- */
-#ifndef __ASM_DMA_MAPPING_H
-#define __ASM_DMA_MAPPING_H
-
-#ifdef __KERNEL__
-
-#include <linux/types.h>
-#include <linux/vmalloc.h>
-
-#include <xen/xen.h>
-#include <asm/xen/hypervisor.h>
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-	return NULL;
-}
-
-#endif	/* __KERNEL__ */
-#endif	/* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4b244a037349..6578abcfbbc7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include <linux/cache.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/dma-iommu.h>
+#include <xen/xen.h>
 #include <xen/swiotlb-xen.h>
 
 #include <asm/cacheflush.h>
-- 
2.20.1



^ permalink raw reply related

* [PATCH 10/11] swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

No need for a no-op wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/xen/swiotlb-xen.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c3c383033ae4..b6b9c4c1b397 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -414,9 +414,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
  * After this call, reads by the cpu to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
-			     size_t size, enum dma_data_direction dir,
-			     unsigned long attrs)
+static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t paddr = xen_bus_to_phys(dev_addr);
 
@@ -430,13 +429,6 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
 }
 
-static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-			    size_t size, enum dma_data_direction dir,
-			    unsigned long attrs)
-{
-	xen_unmap_single(hwdev, dev_addr, size, dir, attrs);
-}
-
 static void
 xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir)
@@ -477,7 +469,8 @@ xen_swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sgl, int nelems,
 	BUG_ON(dir == DMA_NONE);
 
 	for_each_sg(sgl, sg, nelems, i)
-		xen_unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir, attrs);
+		xen_swiotlb_unmap_page(hwdev, sg->dma_address, sg_dma_len(sg),
+				dir, attrs);
 
 }
 
-- 
2.20.1



^ permalink raw reply related

* [PATCH 09/11] swiotlb-xen: simplify cache maintainance
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Now that we know we always have the dma-noncoherent.h helpers available
if we are on an architecture with support for non-coherent devices,
we can just call them directly and remove the calls to the dma-direct
routines.  This also removes the oddity that we call
dma_direct_map_page but ignore the value it returns.  Instead we now have
Xen wrappers for the arch_sync_dma_for_{device,cpu} helpers that call
the special Xen versions of those routines for foreign pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/xen/mm.c           |  47 ++---------------
 drivers/xen/swiotlb-xen.c   |  19 ++++---
 include/xen/page-coherent.h | 100 +++++++++++-------------------------
 3 files changed, 42 insertions(+), 124 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 85482cdda1e5..0eb88f1355c2 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -86,59 +86,18 @@ static void dma_cache_maint(dma_addr_t handle, size_t size,
 	} while (left);
 }
 
-static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
+void __xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir)
 {
 	dma_cache_maint(handle, size, dir, DMA_UNMAP);
 }
 
-static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
+void __xen_dma_sync_for_device(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
 	dma_cache_maint(handle, size, dir, DMA_MAP);
 }
 
-void __xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs)
-{
-	if (dev_is_dma_coherent(hwdev))
-		return;
-	if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-		return;
-
-	__xen_dma_page_cpu_to_dev(hwdev, dev_addr, size, dir);
-}
-
-void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
-
-{
-	if (dev_is_dma_coherent(hwdev))
-		return;
-	if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-		return;
-
-	__xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	if (dev_is_dma_coherent(hwdev))
-		return;
-	__xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	if (dev_is_dma_coherent(hwdev))
-		return;
-	__xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
-}
-
 bool xen_arch_need_swiotlb(struct device *dev,
 			   phys_addr_t phys,
 			   dma_addr_t dev_addr)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 7b23929854e7..c3c383033ae4 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -388,6 +388,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
+	phys = map;
 	dev_addr = xen_phys_to_bus(map);
 
 	/*
@@ -399,14 +400,9 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 	}
 
-	page = pfn_to_page(map >> PAGE_SHIFT);
-	offset = map & ~PAGE_MASK;
 done:
-	/*
-	 * we are not interested in the dma_addr returned by xen_dma_map_page,
-	 * only in the potential cache flushes executed by the function.
-	 */
-	xen_dma_map_page(dev, page, dev_addr, offset, size, dir, attrs);
+	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		xen_dma_sync_for_device(dev, dev_addr, phys, size, dir);
 	return dev_addr;
 }
 
@@ -426,7 +422,8 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 
 	BUG_ON(dir == DMA_NONE);
 
-	xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
+	if (!dev_is_dma_coherent(hwdev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		xen_dma_sync_for_cpu(hwdev, dev_addr, paddr, size, dir);
 
 	/* NOTE: We use dev_addr here, not paddr! */
 	if (is_xen_swiotlb_buffer(dev_addr))
@@ -446,7 +443,8 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 {
 	phys_addr_t paddr = xen_bus_to_phys(dma_addr);
 
-	xen_dma_sync_single_for_cpu(dev, dma_addr, size, dir);
+	if (!dev_is_dma_coherent(dev))
+		xen_dma_sync_for_cpu(dev, dma_addr, paddr, size, dir);
 
 	if (is_xen_swiotlb_buffer(dma_addr))
 		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
@@ -461,7 +459,8 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
 	if (is_xen_swiotlb_buffer(dma_addr))
 		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
 
-	xen_dma_sync_single_for_device(dev, dma_addr, size, dir);
+	if (!dev_is_dma_coherent(dev))
+		xen_dma_sync_for_device(dev, dma_addr, paddr, size, dir);
 }
 
 /*
diff --git a/include/xen/page-coherent.h b/include/xen/page-coherent.h
index 0f4d468e7a89..38b572ed0879 100644
--- a/include/xen/page-coherent.h
+++ b/include/xen/page-coherent.h
@@ -2,88 +2,48 @@
 #ifndef _XEN_PAGE_COHERENT_H
 #define _XEN_PAGE_COHERENT_H
 
-#include <linux/dma-mapping.h>
-#include <asm/page.h>
+#include <linux/dma-noncoherent.h>
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU)
-void __xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs);
-void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		unsigned long attrs);
-void __xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir);
-void __xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir);
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-
-	if (pfn_valid(pfn))
-		dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-	else
-		__xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	if (pfn_valid(pfn))
-		dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-	else
-		__xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs)
-{
-	unsigned long pfn = PFN_DOWN(dev_addr);
-
-	if (pfn_valid(pfn))
-		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
+/*
+ * Dom0 is mapped 1:1, while the Linux page can be spanned accross multiple Xen
+ * pages, it's not possible to have a mix of local and foreign Xen page.  Dom0
+ * is mapped 1:1, so calling pfn_valid on a foreign mfn will always return
+ * false.  If the page is local we can safely use the native routines for cache
+ * maintainance, otherwise we call the Xen specific function.
+ */
+void __xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir);
+void __xen_dma_sync_for_device(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir);
+
+static inline void xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle,
+		phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	if (pfn_valid(PFN_DOWN(handle)))
+		arch_sync_dma_for_cpu(dev, paddr, size, dir);
 	else
-		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
+		__xen_dma_sync_for_cpu(dev, handle, size, dir);
 }
 
-static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
+static inline void xen_dma_sync_for_device(struct device *dev,
+		dma_addr_t handle, phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
 {
-	unsigned long pfn = PFN_DOWN(handle);
-	/*
-	 * Dom0 is mapped 1:1, while the Linux page can be spanned accross
-	 * multiple Xen page, it's not possible to have a mix of local and
-	 * foreign Xen page. Dom0 is mapped 1:1, so calling pfn_valid on a
-	 * foreign mfn will always return false. If the page is local we can
-	 * safely call the native dma_ops function, otherwise we call the xen
-	 * specific function.
-	 */
-	if (pfn_valid(pfn))
-		dma_direct_unmap_page(hwdev, handle, size, dir, attrs);
+	if (pfn_valid(PFN_DOWN(handle)))
+		arch_sync_dma_for_device(dev, paddr, size, dir);
 	else
-		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
+		__xen_dma_sync_for_device(dev, handle, size, dir);
 }
 #else
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs)
-{
-}
-static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-}
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+static inline void xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle,
+		phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
 }
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+static inline void xen_dma_sync_for_device(struct device *dev,
+		dma_addr_t handle, phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir)
 {
 }
 #endif
-- 
2.20.1
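
The resulting decision logic can be summarised in a few lines. This sketch folds the caller's coherency check and the helper's pfn_valid() check into one function for illustration; ex_pfn_valid() is a hypothetical stand-in that treats PFNs below a limit as local, and the enum only names which path would be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Which cache maintenance path a sync request ends up on. */
enum ex_sync_path {
	EX_SYNC_NONE,	/* coherent device: nothing to do */
	EX_SYNC_NATIVE,	/* local page: arch_sync_dma_for_* style helper */
	EX_SYNC_XEN,	/* foreign page: Xen specific routine */
};

struct ex_dev {
	bool dma_coherent;
};

/* Hypothetical stand-in for pfn_valid(): PFNs below the limit are local. */
static bool ex_pfn_valid(uint64_t pfn, uint64_t nr_local_pfns)
{
	return pfn < nr_local_pfns;
}

/* Pick the sync path for a DMA handle, mirroring the checks in the patch. */
static enum ex_sync_path ex_sync_path_for(const struct ex_dev *dev,
					  uint64_t handle,
					  uint64_t nr_local_pfns)
{
	if (dev->dma_coherent)
		return EX_SYNC_NONE;
	if (ex_pfn_valid(handle >> EX_PAGE_SHIFT, nr_local_pfns))
		return EX_SYNC_NATIVE;
	return EX_SYNC_XEN;
}
```

The same three-way split applies to both the for-cpu and the for-device direction; only the helper that is finally called differs.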



^ permalink raw reply related

* [PATCH 08/11] swiotlb-xen: use the same foreign page check everywhere
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

xen_dma_map_page uses a different and more complicated check for
foreign pages than the other three cache maintenance helpers.
Switch it to the simpler pfn_valid method as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/xen/page-coherent.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/xen/page-coherent.h b/include/xen/page-coherent.h
index 7c32944de051..0f4d468e7a89 100644
--- a/include/xen/page-coherent.h
+++ b/include/xen/page-coherent.h
@@ -43,14 +43,9 @@ static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
 	     enum dma_data_direction dir, unsigned long attrs)
 {
-	unsigned long page_pfn = page_to_xen_pfn(page);
-	unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-	unsigned long compound_pages =
-		(1<<compound_order(page)) * XEN_PFN_PER_PAGE;
-	bool local = (page_pfn <= dev_pfn) &&
-		(dev_pfn - page_pfn < compound_pages);
+	unsigned long pfn = PFN_DOWN(dev_addr);
 
-	if (local)
+	if (pfn_valid(pfn))
 		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
 	else
 		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
-- 
2.20.1
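
For comparison, the check being removed can be written as a pure function. This is a sketch with the page and device PFNs passed in as plain numbers; ex_is_local_old() is a hypothetical name, and xen_pfn_per_page stands in for XEN_PFN_PER_PAGE.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old check: the device PFN is local if it falls inside the (possibly
 * compound) page that starts at page_pfn.
 */
static bool ex_is_local_old(unsigned long page_pfn, unsigned long dev_pfn,
			    unsigned int compound_order,
			    unsigned long xen_pfn_per_page)
{
	unsigned long compound_pages =
		(1UL << compound_order) * xen_pfn_per_page;

	return page_pfn <= dev_pfn && dev_pfn - page_pfn < compound_pages;
}
```

The old test needs the struct page and its compound order, whereas the pfn_valid() test only needs the device address, which is what lets the later patches in the series drop the page argument.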



^ permalink raw reply related

* [PATCH 07/11] swiotlb-xen: provide a single page-coherent.h header
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Merge the various page-coherent.h files into a single one that either
provides prototypes or stubs depending on the need for cache
maintenance.

For extra benefit also include <xen/page-coherent.h> in the file
actually implementing the interfaces provided.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/include/asm/xen/page-coherent.h   |  2 --
 arch/arm/xen/mm.c                          |  1 +
 arch/arm64/include/asm/xen/page-coherent.h |  2 --
 arch/x86/include/asm/xen/page-coherent.h   | 22 ------------------
 drivers/xen/swiotlb-xen.c                  |  4 +---
 include/Kbuild                             |  2 +-
 include/xen/{arm => }/page-coherent.h      | 27 +++++++++++++++++++---
 7 files changed, 27 insertions(+), 33 deletions(-)
 delete mode 100644 arch/arm/include/asm/xen/page-coherent.h
 delete mode 100644 arch/arm64/include/asm/xen/page-coherent.h
 delete mode 100644 arch/x86/include/asm/xen/page-coherent.h
 rename include/xen/{arm => }/page-coherent.h (76%)

diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h
deleted file mode 100644
index 27e984977402..000000000000
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,2 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <xen/arm/page-coherent.h>
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index a59980f1aa54..85482cdda1e5 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -15,6 +15,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/interface/memory.h>
 #include <xen/page.h>
+#include <xen/page-coherent.h>
 #include <xen/swiotlb-xen.h>
 
 #include <asm/cacheflush.h>
diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
deleted file mode 100644
index 27e984977402..000000000000
--- a/arch/arm64/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,2 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <xen/arm/page-coherent.h>
diff --git a/arch/x86/include/asm/xen/page-coherent.h b/arch/x86/include/asm/xen/page-coherent.h
deleted file mode 100644
index 8ee33c5edded..000000000000
--- a/arch/x86/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_XEN_PAGE_COHERENT_H
-#define _ASM_X86_XEN_PAGE_COHERENT_H
-
-#include <asm/page.h>
-#include <linux/dma-mapping.h>
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs) { }
-
-static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		unsigned long attrs) { }
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir) { }
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir) { }
-
-#endif /* _ASM_X86_XEN_PAGE_COHERENT_H */
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index f9dd4cb6e4b3..7b23929854e7 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -31,12 +31,10 @@
 #include <linux/export.h>
 #include <xen/swiotlb-xen.h>
 #include <xen/page.h>
+#include <xen/page-coherent.h>
 #include <xen/xen-ops.h>
 #include <xen/hvc-console.h>
 
-#include <asm/dma-mapping.h>
-#include <asm/xen/page-coherent.h>
-
 #include <trace/events/swiotlb.h>
 /*
  * Used to do a quick range check in swiotlb_tbl_unmap_single and
diff --git a/include/Kbuild b/include/Kbuild
index c38f0d46b267..e2ae52ef9e1e 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -1189,7 +1189,6 @@ header-test-			+= video/vga.h
 header-test-			+= video/w100fb.h
 header-test-			+= xen/acpi.h
 header-test-			+= xen/arm/hypercall.h
-header-test-			+= xen/arm/page-coherent.h
 header-test-			+= xen/arm/page.h
 header-test-			+= xen/balloon.h
 header-test-			+= xen/events.h
@@ -1231,6 +1230,7 @@ header-test-			+= xen/interface/xen.h
 header-test-			+= xen/interface/xenpmu.h
 header-test-			+= xen/mem-reservation.h
 header-test-			+= xen/page.h
+header-test-			+= xen/page-coherent.h
 header-test-			+= xen/platform_pci.h
 header-test-			+= xen/swiotlb-xen.h
 header-test-			+= xen/xen-front-pgdir-shbuf.h
diff --git a/include/xen/arm/page-coherent.h b/include/xen/page-coherent.h
similarity index 76%
rename from include/xen/arm/page-coherent.h
rename to include/xen/page-coherent.h
index 4294a31305ca..7c32944de051 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/page-coherent.h
@@ -1,10 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _XEN_ARM_PAGE_COHERENT_H
-#define _XEN_ARM_PAGE_COHERENT_H
+#ifndef _XEN_PAGE_COHERENT_H
+#define _XEN_PAGE_COHERENT_H
 
 #include <linux/dma-mapping.h>
 #include <asm/page.h>
 
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU)
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
 	     enum dma_data_direction dir, unsigned long attrs);
@@ -71,5 +73,24 @@ static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
 	else
 		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
 }
+#else
+static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
+	     dma_addr_t dev_addr, unsigned long offset, size_t size,
+	     enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void xen_dma_sync_single_for_device(struct device *hwdev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+}
+#endif
 
-#endif /* _XEN_ARM_PAGE_COHERENT_H */
+#endif /* _XEN_PAGE_COHERENT_H */
-- 
2.20.1
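
For readers skimming the header change: a minimal standalone sketch of the
"real helper or empty stub" pattern it uses. SKETCH_HAS_SYNC and every
sketch_* name are hypothetical stand-ins for the two ARCH_HAS_SYNC_DMA_*
Kconfig symbols and the xen_dma_* helpers; this is not kernel code.

```c
/* Hypothetical counter so the effect of the stub is observable. */
static int sketch_sync_calls;

#if defined(SKETCH_HAS_SYNC)
/* Variant built when the (hypothetical) option is set. */
static inline void sketch_sync_for_cpu(unsigned long handle, unsigned long size)
{
	(void)handle;
	(void)size;
	sketch_sync_calls++;	/* real cache maintenance would go here */
}
#else
/* Empty stub: callers compile unchanged and the call optimizes away. */
static inline void sketch_sync_for_cpu(unsigned long handle, unsigned long size)
{
	(void)handle;
	(void)size;
}
#endif
```

With SKETCH_HAS_SYNC undefined, calling sketch_sync_for_cpu() leaves the
counter untouched, which is the point of the stubs: no #ifdef at call sites.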



* [PATCH 06/11] swiotlb-xen: always use dma-direct helpers to alloc coherent pages
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

x86 currently calls alloc_pages, but using dma-direct works as well
there, with the added benefit of using the CMA pool if available.
The biggest advantage is of course to remove a pointless bit of
architecture-specific code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/include/asm/xen/page-coherent.h | 16 ----------------
 drivers/xen/swiotlb-xen.c                |  7 +++----
 include/xen/arm/page-coherent.h          | 12 ------------
 3 files changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/xen/page-coherent.h b/arch/x86/include/asm/xen/page-coherent.h
index 116777e7f387..8ee33c5edded 100644
--- a/arch/x86/include/asm/xen/page-coherent.h
+++ b/arch/x86/include/asm/xen/page-coherent.h
@@ -5,22 +5,6 @@
 #include <asm/page.h>
 #include <linux/dma-mapping.h>
 
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-		dma_addr_t *dma_handle, gfp_t flags,
-		unsigned long attrs)
-{
-	void *vstart = (void*)__get_free_pages(flags, get_order(size));
-	*dma_handle = virt_to_phys(vstart);
-	return vstart;
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-		void *cpu_addr, dma_addr_t dma_handle,
-		unsigned long attrs)
-{
-	free_pages((unsigned long) cpu_addr, get_order(size));
-}
-
 static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
 	     enum dma_data_direction dir, unsigned long attrs) { }
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b8808677ae1d..f9dd4cb6e4b3 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -299,8 +299,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	 * address. In fact on ARM virt_to_phys only works for kernel direct
 	 * mapped RAM memory. Also see comment below.
 	 */
-	ret = xen_alloc_coherent_pages(hwdev, size, dma_handle, flags, attrs);
-
+	ret = dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
 	if (!ret)
 		return ret;
 
@@ -319,7 +318,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	else {
 		if (xen_create_contiguous_region(phys, order,
 						 fls64(dma_mask), dma_handle) != 0) {
-			xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
+			dma_direct_free(hwdev, size, ret, (dma_addr_t)phys, attrs);
 			return NULL;
 		}
 		SetPageXenRemapped(virt_to_page(ret));
@@ -351,7 +350,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	    TestClearPageXenRemapped(virt_to_page(vaddr)))
 		xen_destroy_contiguous_region(phys, order);
 
-	xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
+	dma_direct_free(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
 }
 
 /*
diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index da2cc09c8eda..4294a31305ca 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -16,18 +16,6 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir);
 
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-		dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-	return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-		void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-	dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
 static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-- 
2.20.1



* [PATCH 05/11] xen: remove the exports for xen_{create, destroy}_contiguous_region
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

These routines are only used by swiotlb-xen, which cannot be modular.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/xen/mm.c     | 2 --
 arch/x86/xen/mmu_pv.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 388a45002bad..a59980f1aa54 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -175,13 +175,11 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
 	*dma_handle = pstart;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
 	return;
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 int __init xen_mm_init(void)
 {
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 26e8b326966d..c8dbee62ec2a 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2625,7 +2625,6 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
 	*dma_handle = virt_to_machine(vstart).maddr;
 	return success ? 0 : -ENOMEM;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
@@ -2660,7 +2659,6 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 
 	spin_unlock_irqrestore(&xen_reservation_lock, flags);
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 static noinline void xen_flush_tlb_all(void)
 {
-- 
2.20.1



* [PATCH 04/11] xen/arm: remove xen_dma_ops
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

arm and arm64 can just use xen_swiotlb_dma_ops directly like x86, no
need for a pointer indirection.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/mm/dma-mapping.c    | 3 ++-
 arch/arm/xen/mm.c            | 4 ----
 arch/arm64/mm/dma-mapping.c  | 3 ++-
 include/xen/arm/hypervisor.h | 2 --
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 738097396445..2661cad36359 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -35,6 +35,7 @@
 #include <asm/mach/map.h>
 #include <asm/system_info.h>
 #include <asm/dma-contiguous.h>
+#include <xen/swiotlb-xen.h>
 
 #include "dma.h"
 #include "mm.h"
@@ -2360,7 +2361,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 #ifdef CONFIG_XEN
 	if (xen_initial_domain())
-		dev->dma_ops = xen_dma_ops;
+		dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 	dev->archdata.dma_ops_setup = true;
 }
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index d9da24fda2f7..388a45002bad 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -183,16 +183,12 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
-const struct dma_map_ops *xen_dma_ops;
-EXPORT_SYMBOL(xen_dma_ops);
-
 int __init xen_mm_init(void)
 {
 	struct gnttab_cache_flush cflush;
 	if (!xen_initial_domain())
 		return 0;
 	xen_swiotlb_init(1, false);
-	xen_dma_ops = &xen_swiotlb_dma_ops;
 
 	cflush.op = 0;
 	cflush.a.dev_bus_addr = 0;
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bd2b039f43a6..4b244a037349 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include <linux/cache.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/dma-iommu.h>
+#include <xen/swiotlb-xen.h>
 
 #include <asm/cacheflush.h>
 
@@ -64,6 +65,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 #ifdef CONFIG_XEN
 	if (xen_initial_domain())
-		dev->dma_ops = xen_dma_ops;
+		dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
diff --git a/include/xen/arm/hypervisor.h b/include/xen/arm/hypervisor.h
index 2982571f7cc1..43ef24dd030e 100644
--- a/include/xen/arm/hypervisor.h
+++ b/include/xen/arm/hypervisor.h
@@ -19,8 +19,6 @@ static inline enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return PARAVIRT_LAZY_NONE;
 }
 
-extern const struct dma_map_ops *xen_dma_ops;
-
 #ifdef CONFIG_XEN
 void __init xen_early_init(void);
 #else
-- 
2.20.1



* [PATCH 03/11] xen/arm: pass one less argument to dma_cache_maint
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Instead of taking apart the DMA address in both callers, do it inside
dma_cache_maint itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/xen/mm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 90574d89d0d4..d9da24fda2f7 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -43,13 +43,15 @@ static bool hypercall_cflush = false;
 
 /* functions called by SWIOTLB */
 
-static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
-	size_t size, enum dma_data_direction dir, enum dma_cache_op op)
+static void dma_cache_maint(dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, enum dma_cache_op op)
 {
 	struct gnttab_cache_flush cflush;
 	unsigned long xen_pfn;
+	unsigned long offset = handle & ~PAGE_MASK;
 	size_t left = size;
 
+	handle &= PAGE_MASK;
 	xen_pfn = (handle >> XEN_PAGE_SHIFT) + offset / XEN_PAGE_SIZE;
 	offset %= XEN_PAGE_SIZE;
 
@@ -86,13 +88,13 @@ static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
 static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_UNMAP);
+	dma_cache_maint(handle, size, dir, DMA_UNMAP);
 }
 
 static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
+	dma_cache_maint(handle, size, dir, DMA_MAP);
 }
 
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
-- 
2.20.1
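
For reference, the callers used to pass "handle & PAGE_MASK" and
"handle & ~PAGE_MASK" separately. A minimal userspace sketch of that split
follows. It assumes 4 KiB pages, and all SKETCH_* / sketch_* names are
hypothetical (the kernel takes PAGE_MASK from <asm/page.h>).

```c
#include <stdint.h>

/* Assumed 4 KiB pages for illustration only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: split a DMA handle into page base and in-page offset. */
static inline void sketch_split_handle(uint64_t handle, uint64_t *base,
				       uint64_t *offset)
{
	*offset = handle & ~SKETCH_PAGE_MASK;	/* low bits: offset in page */
	*base = handle & SKETCH_PAGE_MASK;	/* high bits: page-aligned base */
}
```

The two halves always add back up to the original handle, so doing the
split inside the callee loses nothing compared to doing it in each caller.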



* [PATCH 02/11] xen/arm: use dev_is_dma_coherent
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Use the dma-noncoherent dev_is_dma_coherent helper instead of the
home-grown variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/include/asm/dma-mapping.h   |  6 ------
 arch/arm/xen/mm.c                    | 12 ++++++------
 arch/arm64/include/asm/dma-mapping.h |  9 ---------
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index dba9355e2484..bdd80ddbca34 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -91,12 +91,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
 }
 #endif
 
-/* do not use this function in a driver */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-	return dev->archdata.dma_coherent;
-}
-
 /**
  * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index d33b77e9add3..90574d89d0d4 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include <linux/cpu.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
 #include <linux/gfp.h>
 #include <linux/highmem.h>
 #include <linux/export.h>
@@ -99,7 +99,7 @@ void __xen_dma_map_page(struct device *hwdev, struct page *page,
 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
 	     enum dma_data_direction dir, unsigned long attrs)
 {
-	if (is_device_dma_coherent(hwdev))
+	if (dev_is_dma_coherent(hwdev))
 		return;
 	if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
 		return;
@@ -112,7 +112,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
 		unsigned long attrs)
 
 {
-	if (is_device_dma_coherent(hwdev))
+	if (dev_is_dma_coherent(hwdev))
 		return;
 	if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
 		return;
@@ -123,7 +123,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
 void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (is_device_dma_coherent(hwdev))
+	if (dev_is_dma_coherent(hwdev))
 		return;
 	__xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
 }
@@ -131,7 +131,7 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (is_device_dma_coherent(hwdev))
+	if (dev_is_dma_coherent(hwdev))
 		return;
 	__xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
 }
@@ -159,7 +159,7 @@ bool xen_arch_need_swiotlb(struct device *dev,
 	 * memory and we are not able to flush the cache.
 	 */
 	return (!hypercall_cflush && (xen_pfn != bfn) &&
-		!is_device_dma_coherent(dev));
+		!dev_is_dma_coherent(dev));
 }
 
 int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index bdcb0922a40c..67243255a858 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -18,14 +18,5 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 	return NULL;
 }
 
-/*
- * Do not use this function in a driver, it is only provided for
- * arch/arm/mm/xen.c, which is used by arm64 as well.
- */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-	return dev->dma_coherent;
-}
-
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_DMA_MAPPING_H */
-- 
2.20.1



* [PATCH 01/11] xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintenance
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel
In-Reply-To: <20190816130013.31154-1-hch@lst.de>

Reuse the arm64 code that uses the dma-direct/swiotlb helpers for DMA
non-coherent devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/Kconfig                           |  4 +
 arch/arm/include/asm/device.h              |  3 -
 arch/arm/include/asm/xen/page-coherent.h   | 93 ----------------------
 arch/arm/mm/Kconfig                        |  4 -
 arch/arm/mm/dma-mapping.c                  |  8 +-
 arch/arm64/include/asm/xen/page-coherent.h | 75 -----------------
 drivers/xen/swiotlb-xen.c                  | 49 +-----------
 include/xen/arm/page-coherent.h            | 71 +++++++++++++++++
 8 files changed, 78 insertions(+), 229 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 33b00579beff..24360211534a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,8 @@ config ARM
 	select ARCH_HAS_BINFMT_FLAT
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
+	select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB
+	select ARCH_HAS_DMA_MMAP_PGPROT if SWIOTLB
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KEEPINITRD
@@ -18,6 +20,8 @@ config ARM
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB
+	select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB
 	select ARCH_HAS_TEARDOWN_DMA_OPS if MMU
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_CUSTOM_GPIO_H
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index f6955b55c544..c675bc0d5aa8 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #endif
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping	*mapping;
-#endif
-#ifdef CONFIG_XEN
-	const struct dma_map_ops *dev_dma_ops;
 #endif
 	unsigned int dma_coherent:1;
 	unsigned int dma_ops_setup:1;
diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h
index 2c403e7c782d..27e984977402 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -1,95 +1,2 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
-#define _ASM_ARM_XEN_PAGE_COHERENT_H
-
-#include <linux/dma-mapping.h>
-#include <asm/page.h>
 #include <xen/arm/page-coherent.h>
-
-static inline const struct dma_map_ops *xen_get_dma_ops(struct device *dev)
-{
-	if (dev && dev->archdata.dev_dma_ops)
-		return dev->archdata.dev_dma_ops;
-	return get_arch_dma_ops(NULL);
-}
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-		dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-	return xen_get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-		void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-	xen_get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs)
-{
-	unsigned long page_pfn = page_to_xen_pfn(page);
-	unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-	unsigned long compound_pages =
-		(1<<compound_order(page)) * XEN_PFN_PER_PAGE;
-	bool local = (page_pfn <= dev_pfn) &&
-		(dev_pfn - page_pfn < compound_pages);
-
-	/*
-	 * Dom0 is mapped 1:1, while the Linux page can span across
-	 * multiple Xen pages, it's not possible for it to contain a
-	 * mix of local and foreign Xen pages. So if the first xen_pfn
-	 * == mfn the page is local otherwise it's a foreign page
-	 * grant-mapped in dom0. If the page is local we can safely
-	 * call the native dma_ops function, otherwise we call the xen
-	 * specific function.
-	 */
-	if (local)
-		xen_get_dma_ops(hwdev)->map_page(hwdev, page, offset, size, dir, attrs);
-	else
-		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
-}
-
-static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	/*
-	 * Dom0 is mapped 1:1, while the Linux page can be spanned accross
-	 * multiple Xen page, it's not possible to have a mix of local and
-	 * foreign Xen page. Dom0 is mapped 1:1, so calling pfn_valid on a
-	 * foreign mfn will always return false. If the page is local we can
-	 * safely call the native dma_ops function, otherwise we call the xen
-	 * specific function.
-	 */
-	if (pfn_valid(pfn)) {
-		if (xen_get_dma_ops(hwdev)->unmap_page)
-			xen_get_dma_ops(hwdev)->unmap_page(hwdev, handle, size, dir, attrs);
-	} else
-		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	if (pfn_valid(pfn)) {
-		if (xen_get_dma_ops(hwdev)->sync_single_for_cpu)
-			xen_get_dma_ops(hwdev)->sync_single_for_cpu(hwdev, handle, size, dir);
-	} else
-		__xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	if (pfn_valid(pfn)) {
-		if (xen_get_dma_ops(hwdev)->sync_single_for_device)
-			xen_get_dma_ops(hwdev)->sync_single_for_device(hwdev, handle, size, dir);
-	} else
-		__xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-#endif /* _ASM_ARM_XEN_PAGE_COHERENT_H */
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c54cd7ed90ba..c1222c0e9fd3 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -664,10 +664,6 @@ config ARM_LPAE
 		!CPU_32v4 && !CPU_32v3
 	select PHYS_ADDR_T_64BIT
 	select SWIOTLB
-	select ARCH_HAS_DMA_COHERENT_TO_PFN
-	select ARCH_HAS_DMA_MMAP_PGPROT
-	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	help
 	  Say Y if you have an ARMv7 processor supporting the LPAE page
 	  table format and you would like to access memory beyond the
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index d42557ee69c2..738097396445 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1132,10 +1132,6 @@ static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 	 * 32-bit DMA.
 	 * Use the generic dma-direct / swiotlb ops code in that case, as that
 	 * handles bounce buffering for us.
-	 *
-	 * Note: this checks CONFIG_ARM_LPAE instead of CONFIG_SWIOTLB as the
-	 * latter is also selected by the Xen code, but that code for now relies
-	 * on non-NULL dev_dma_ops.  To be cleaned up later.
 	 */
 	if (IS_ENABLED(CONFIG_ARM_LPAE))
 		return NULL;
@@ -2363,10 +2359,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	set_dma_ops(dev, dma_ops);
 
 #ifdef CONFIG_XEN
-	if (xen_initial_domain()) {
-		dev->archdata.dev_dma_ops = dev->dma_ops;
+	if (xen_initial_domain())
 		dev->dma_ops = xen_dma_ops;
-	}
 #endif
 	dev->archdata.dma_ops_setup = true;
 }
diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
index d88e56b90b93..27e984977402 100644
--- a/arch/arm64/include/asm/xen/page-coherent.h
+++ b/arch/arm64/include/asm/xen/page-coherent.h
@@ -1,77 +1,2 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ARM64_XEN_PAGE_COHERENT_H
-#define _ASM_ARM64_XEN_PAGE_COHERENT_H
-
-#include <linux/dma-mapping.h>
-#include <asm/page.h>
 #include <xen/arm/page-coherent.h>
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-		dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-	return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-		void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-	dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-
-	if (pfn_valid(pfn))
-		dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-	else
-		__xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	if (pfn_valid(pfn))
-		dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-	else
-		__xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-	     dma_addr_t dev_addr, unsigned long offset, size_t size,
-	     enum dma_data_direction dir, unsigned long attrs)
-{
-	unsigned long page_pfn = page_to_xen_pfn(page);
-	unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-	unsigned long compound_pages =
-		(1<<compound_order(page)) * XEN_PFN_PER_PAGE;
-	bool local = (page_pfn <= dev_pfn) &&
-		(dev_pfn - page_pfn < compound_pages);
-
-	if (local)
-		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
-	else
-		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
-}
-
-static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	unsigned long pfn = PFN_DOWN(handle);
-	/*
-	 * Dom0 is mapped 1:1, while the Linux page can be spanned accross
-	 * multiple Xen page, it's not possible to have a mix of local and
-	 * foreign Xen page. Dom0 is mapped 1:1, so calling pfn_valid on a
-	 * foreign mfn will always return false. If the page is local we can
-	 * safely call the native dma_ops function, otherwise we call the xen
-	 * specific function.
-	 */
-	if (pfn_valid(pfn))
-		dma_direct_unmap_page(hwdev, handle, size, dir, attrs);
-	else
-		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
-}
-
-#endif /* _ASM_ARM64_XEN_PAGE_COHERENT_H */
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index ae1df496bf38..b8808677ae1d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -547,51 +547,6 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
 }
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-		     void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		     unsigned long attrs)
-{
-#ifdef CONFIG_ARM
-	if (xen_get_dma_ops(dev)->mmap)
-		return xen_get_dma_ops(dev)->mmap(dev, vma, cpu_addr,
-						    dma_addr, size, attrs);
-#endif
-	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-/*
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
-			void *cpu_addr, dma_addr_t handle, size_t size,
-			unsigned long attrs)
-{
-#ifdef CONFIG_ARM
-	if (xen_get_dma_ops(dev)->get_sgtable) {
-#if 0
-	/*
-	 * This check verifies that the page belongs to the current domain and
-	 * is not one mapped from another domain.
-	 * This check is for debug only, and should not go to production build
-	 */
-		unsigned long bfn = PHYS_PFN(dma_to_phys(dev, handle));
-		BUG_ON (!page_is_ram(bfn));
-#endif
-		return xen_get_dma_ops(dev)->get_sgtable(dev, sgt, cpu_addr,
-							   handle, size, attrs);
-	}
-#endif
-	return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs);
-}
-
 const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.alloc = xen_swiotlb_alloc_coherent,
 	.free = xen_swiotlb_free_coherent,
@@ -604,6 +559,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.map_page = xen_swiotlb_map_page,
 	.unmap_page = xen_swiotlb_unmap_page,
 	.dma_supported = xen_swiotlb_dma_supported,
-	.mmap = xen_swiotlb_dma_mmap,
-	.get_sgtable = xen_swiotlb_get_sgtable,
+	.mmap = dma_common_mmap,
+	.get_sgtable = dma_common_get_sgtable,
 };
diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index 2ca9164a79bf..da2cc09c8eda 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -2,6 +2,9 @@
 #ifndef _XEN_ARM_PAGE_COHERENT_H
 #define _XEN_ARM_PAGE_COHERENT_H
 
+#include <linux/dma-mapping.h>
+#include <asm/page.h>
+
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
 	     enum dma_data_direction dir, unsigned long attrs);
@@ -13,4 +16,72 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir);
 
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+		dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
+{
+	return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+		void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
+{
+	dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	unsigned long pfn = PFN_DOWN(handle);
+
+	if (pfn_valid(pfn))
+		dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
+	else
+		__xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
+}
+
+static inline void xen_dma_sync_single_for_device(struct device *hwdev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	unsigned long pfn = PFN_DOWN(handle);
+	if (pfn_valid(pfn))
+		dma_direct_sync_single_for_device(hwdev, handle, size, dir);
+	else
+		__xen_dma_sync_single_for_device(hwdev, handle, size, dir);
+}
+
+static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
+	     dma_addr_t dev_addr, unsigned long offset, size_t size,
+	     enum dma_data_direction dir, unsigned long attrs)
+{
+	unsigned long page_pfn = page_to_xen_pfn(page);
+	unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
+	unsigned long compound_pages =
+		(1<<compound_order(page)) * XEN_PFN_PER_PAGE;
+	bool local = (page_pfn <= dev_pfn) &&
+		(dev_pfn - page_pfn < compound_pages);
+
+	if (local)
+		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
+	else
+		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
+}
+
+static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	unsigned long pfn = PFN_DOWN(handle);
+	/*
+	 * Dom0 is mapped 1:1, while the Linux page can be spanned accross
+	 * multiple Xen page, it's not possible to have a mix of local and
+	 * foreign Xen page. Dom0 is mapped 1:1, so calling pfn_valid on a
+	 * foreign mfn will always return false. If the page is local we can
+	 * safely call the native dma_ops function, otherwise we call the xen
+	 * specific function.
+	 */
+	if (pfn_valid(pfn))
+		dma_direct_unmap_page(hwdev, handle, size, dir, attrs);
+	else
+		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
+}
+
 #endif /* _XEN_ARM_PAGE_COHERENT_H */
-- 
2.20.1
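
The locality test in xen_dma_map_page() above can be isolated as a plain
range check. What follows is a minimal userspace sketch, not part of the
patch: the name pfn_is_local() and its parameter names are hypothetical, and
nr_pfns stands in for the compound_pages value computed in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the "local" test in xen_dma_map_page():
 * the device frame is local when it falls inside the nr_pfns frames that
 * start at page_pfn.
 */
static bool pfn_is_local(unsigned long page_pfn, unsigned long dev_pfn,
			 unsigned long nr_pfns)
{
	/*
	 * The ordering test comes first so that the unsigned subtraction
	 * below cannot wrap around when dev_pfn is smaller than page_pfn.
	 */
	return page_pfn <= dev_pfn && dev_pfn - page_pfn < nr_pfns;
}
```

With a four-frame region starting at frame 100, frames 100 through 103 count
as local, while frames 99 and 104 do not.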



* swiotlb-xen cleanups
From: Christoph Hellwig @ 2019-08-16 13:00 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: xen-devel, iommu, x86, linux-kernel, linux-arm-kernel

Hi Xen maintainers and friends,

please take a look at this series that cleans up the parts of swiotlb-xen
that deal with non-coherent caches.


* [PATCH v3 5/5] Documentation: arm64: Document PMU counters access from userspace
From: Raphael Gault @ 2019-08-16 12:59 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: mark.rutland, raph.gault+kdev, peterz, catalin.marinas,
	will.deacon, acme, Raphael Gault, mingo
In-Reply-To: <20190816125934.18509-1-raphael.gault@arm.com>

Add a documentation file describing access to the PMU hardware
counters from userspace.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 .../arm64/pmu_counter_user_access.txt         | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Documentation/arm64/pmu_counter_user_access.txt

diff --git a/Documentation/arm64/pmu_counter_user_access.txt b/Documentation/arm64/pmu_counter_user_access.txt
new file mode 100644
index 000000000000..6788b1107381
--- /dev/null
+++ b/Documentation/arm64/pmu_counter_user_access.txt
@@ -0,0 +1,42 @@
+Access to PMU hardware counter from userspace
+=============================================
+
+Overview
+--------
+The perf user-space tool relies on the PMU to monitor events. It offers an
+abstraction layer over the hardware counters since the underlying
+implementation is cpu-dependent.
+Arm64 allows userspace tools to have access to the registers storing the
+hardware counters' values directly.
+
+This targets specifically self-monitoring tasks in order to reduce the overhead
+by directly accessing the registers without having to go through the kernel.
+
+How-to
+------
+The focus is set on the ARMv8 PMUv3, which makes sure that access to the PMU
+registers is enabled and that userspace has access to the relevant
+information in order to use them.
+
+In order to have access to the hardware counter it is necessary to open the event
+using the perf tool interface: the sys_perf_event_open syscall returns a fd which
+can subsequently be used with the mmap syscall in order to retrieve a page of memory
+containing information about the event.
+The PMU driver uses this page to expose to the user the hardware counter's
+index. Using this index enables the user to access the PMU registers using the
+`mrs` instruction.
+
+Have a look at `tools/perf/arch/arm64/tests/user-events.c` for an example. It can be
+run using the perf tool to check that the access to the registers works
+correctly from userspace:
+
+./perf test -v
+
+About chained events
+--------------------
+When the user requests that an event be counted on 64 bits, two hardware
+counters are used and need to be combined to retrieve the correct value:
+
+val = read_counter(idx);
+if (event.attr.config1 & 0x1)
+	val = (val << 32) | read_counter(idx - 1);
-- 
2.17.1
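
The chained-event snippet in the documentation above leaves the type of val
implicit. The following is a minimal sketch of the same combination, assuming
each hardware event counter read yields 32 bits and that val is widened to 64
bits before the shift. The names read_event_value(), read_counter_fn,
fake_read_counter() and fake_counters are hypothetical. The fake reader only
stands in for an `mrs`-based read of the counter at the index exposed through
the mmap'ed page, and its values are made up for illustration.

```c
#include <stdint.h>

/* Made-up counter contents; real values would come from the PMU. */
static const uint32_t fake_counters[] = { 0x00000001u, 0x00000002u };

/* Stand-in for an mrs-based read of the event counter selected by idx. */
static uint32_t fake_read_counter(unsigned int idx)
{
	return fake_counters[idx];
}

typedef uint32_t (*read_counter_fn)(unsigned int idx);

/*
 * Follows the documented sequence: the counter at idx holds the upper half
 * and the counter at idx - 1 the lower half when bit 0 of config1 is set.
 */
static uint64_t read_event_value(read_counter_fn read_counter,
				 unsigned int idx, uint64_t config1)
{
	/* Widen to 64 bits first so the shift by 32 is well defined. */
	uint64_t val = read_counter(idx);

	if (config1 & 0x1)
		val = (val << 32) | read_counter(idx - 1);

	return val;
}
```

Passing the reader as a function pointer keeps the combining logic testable
on any machine; on arm64 the pointer would refer to a function that issues
the actual register read.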



