* PCI Passthrough Design - Draft 3
@ 2015-08-04 12:27 Manish Jaggi
  2015-08-11 20:34 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 8+ messages in thread
From: Manish Jaggi @ 2015-08-04 12:27 UTC (permalink / raw)
  To: Xen Devel
  Cc: Prasun.kapoor@cavium.com, Kumar, Vijaya, Julien Grall,
	Ian Campbell, Stefano Stabellini

              -----------------------------
             | PCI Pass-through in Xen ARM |
              -----------------------------
             manish.jaggi@caviumnetworks.com
             -------------------------------

                      Draft-3


-------------------------------------------------------------------------------
Introduction
-------------------------------------------------------------------------------
This document describes the design for PCI passthrough support in Xen ARM.
The target system is an ARM 64-bit SoC with a GICv3, an SMMU v2 and PCIe
devices.

-------------------------------------------------------------------------------
Revision History
-------------------------------------------------------------------------------
Changes from Draft-1:
---------------------
a) map_mmio hypercall removed from earlier draft
b) device BAR mapping into guest not 1:1
c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
d) device BAR info added to xenstore.

Changes from Draft-2:
---------------------
a) DomU boot information updated with boot-time device assignment and 
hotplug.
b) SMMU description added
c) Mapping between streamID - bdf - deviceID.
d) assign_device hypercall to include the virtual (guest) sbdf.
The toolstack generates the guest sbdf rather than pciback.

-------------------------------------------------------------------------------
Index
-------------------------------------------------------------------------------
   (1) Background

   (2) Basic PCI Support in Xen ARM
   (2.1)    pci_hostbridge and pci_hostbridge_ops
   (2.2)    PHYSDEVOP_pci_host_bridge_add hypercall
   (2.3)    Helper Functions

   (3) SMMU programming
   (3.1) Additions for PCI Passthrough
   (3.2)    Mapping between streamID - deviceID - PCI sbdf - requesterID

   (4) Assignment of PCI device

   (4.1) Dom0
   (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (64k)
   (4.1.1.1) For Dom0
   (4.1.1.2) For DomU
   (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space

   (4.2) DomU
   (4.2.1) Reserved Areas in guest memory space
   (4.2.2) New entries in xenstore for device BARs
   (4.2.4) Hypercall Modification for bdf mapping notification to xen

   (5) DomU FrontEnd Bus Changes
   (5.1)    Change in Linux Xen PCI frontend - backend driver for MSI/X programming
   (5.2)    Frontend bus and interrupt parent vITS

   (6) NUMA domU and vITS
-------------------------------------------------------------------------------

1.    Background of PCI passthrough
--------------------------------------
Passthrough refers to assigning a PCI device to a guest domain (domU) such
that the guest has full control over the device. The MMIO space and
interrupts are managed by the guest itself, close to how a bare-metal kernel
manages a device.

The device's access to the guest address space needs to be isolated and
protected. The SMMU (System MMU - the IOMMU on ARM) is programmed by the Xen
hypervisor to allow the device to access guest memory for data transfer and
for sending MSI/X interrupts. The message signalled interrupt writes
generated by PCI devices target guest addresses, which are also translated
by the SMMU.
For this reason the GITS (ITS address space) Interrupt Translation Register
space is mapped into the guest address space.

2.    Basic PCI Support for ARM
----------------------------------
The APIs to read/write from PCI configuration space are based on
segment:bdf. How the sbdf is mapped to a physical configuration space
address is the responsibility of the PCI host controller.

ARM PCI support in Xen introduces PCI host controller drivers similar to
what exists in Linux. Each driver registers callbacks, which are invoked on
matching the compatible property in the PCI device tree node.

2.1    pci_hostbridge and pci_hostbridge_ops
----------------------------------------------
The init function in the PCI host driver registers the host bridge
callbacks:
int pci_hostbridge_register(pci_hostbridge_t *pcihb);

struct pci_hostbridge_ops {
     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes);
     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes, u32 val);
};

struct pci_hostbridge{
     u32 segno;
     paddr_t cfg_base;
     paddr_t cfg_size;
     struct dt_device_node *dt_node;
     struct pci_hostbridge_ops ops;
     struct list_head list;
};

A PCI config read function would internally look as follows:
u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
{
     pci_hostbridge_t *pcihb;
     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         if ( pcihb->segno == seg )
             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
     }
     return -1;
}

2.2    PHYSDEVOP_pci_host_bridge_add hypercall
----------------------------------------------
Xen code accesses PCI configuration space based on the sbdf received from
the guest. The order in which the PCI device tree nodes appear may not be
the same as the order of device enumeration in dom0. Thus there needs to be
a mechanism to bind the segment number assigned by dom0 to the PCI host
controller. The following hypercall is introduced:

#define PHYSDEVOP_pci_host_bridge_add    44
struct physdev_pci_host_bridge_add {
     /* IN */
     uint16_t seg;
     uint64_t cfg_base;
     uint64_t cfg_size;
};

This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
hypercall. The handler calls the following function to update the segment
number in the pci_hostbridge:

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size);

Subsequent calls to pci_conf_read/write are completed by the 
pci_hostbridge_ops
of the respective pci_hostbridge.
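
A minimal sketch of pci_hostbridge_setup(), assuming the host bridge is
looked up by the config space window it registered (the lookup key and the
error value are illustrative, not final):

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size)
{
     pci_hostbridge_t *pcihb;

     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         /* Match on the config space window registered by the host driver. */
         if ( pcihb->cfg_base == cfg_base && pcihb->cfg_size == cfg_size )
         {
             pcihb->segno = segno;
             return 0;
         }
     }
     return -ENODEV; /* no host bridge registered for this window */
}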

2.3    Helper Functions
------------------------
a) pci_hostbridge_dt_node(pdev->seg);
Returns the device tree node pointer of the PCI node from which the pdev was
enumerated.

3.    SMMU programming
-------------------

3.1.    Additions for PCI Passthrough
-----------------------------------
3.1.1 - add_device in iommu_ops is implemented.

This is called when PHYSDEVOP_pci_device_add is invoked from dom0.

.add_device = arm_smmu_add_dom0_dev,
static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
{
         if (dev_is_pci(dev)) {
             struct pci_dev *pdev = to_pci_dev(dev);
             return arm_smmu_assign_dev(pdev->domain, devfn, dev);
         }
         return -1;
}

3.1.2 dev_get_dev_node is modified for PCI devices.
-------------------------------------------------------------------------
The function is modified to return the dt_node of the PCI host bridge from
the device tree. This is required as non-DT devices (such as PCI devices)
need a way to find out to which SMMU they are attached.

static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
{
         struct device_node *dev_node = dev_get_dev_node(dev);
....

static struct device_node *dev_get_dev_node(struct device *dev)
{
         if (dev_is_pci(dev)) {
                 struct pci_dev *pdev = to_pci_dev(dev);
                 return pci_hostbridge_dt_node(pdev->seg);
         }
...


3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
---------------------------------------------------------------------
In the simplest case all of these are equal to the BDF. But there are some
devices that use the wrong requester ID for DMA transactions; the Linux
kernel has PCI quirks for these. Whether the same quirks are implemented in
Xen, or a different approach is taken, is a TODO here.
Until then, for the basic implementation it is assumed that all of these are
equal to the BDF.
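
A minimal sketch of the interim identity mapping assumed above (the helper
name is illustrative, not an existing Xen API):

/* streamID == deviceID == requesterID == BDF for now. */
static inline u32 sbdf_to_streamid(u16 seg, u8 bus, u8 devfn)
{
    (void)seg;                      /* a single segment is assumed for now */
    return ((u32)bus << 8) | devfn; /* standard 16-bit BDF encoding */
}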


4.    Assignment of PCI device
---------------------------------

4.1    Dom0
------------
All PCI devices are assigned to dom0 unless hidden using pciback's hide
option (pciback.hide) in dom0.
Dom0 enumerates the PCI devices. For each device the MMIO space has to be
mapped in the stage 2 translation for dom0. For dom0, Xen maps the ranges
from the device tree PCI nodes in the stage 2 translation during boot.

4.1.1    Stage 2 Mapping of GITS_ITRANSLATER space (64k)
------------------------------------------------------

The GITS_ITRANSLATER space (64k) must be mapped in the stage 2 translation
so that the SMMU can translate MSI(X) writes from the device using the page
table of the domain.

4.1.1.1 For Dom0
-----------------
The GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0
this mapping is done in the vGIC driver. For domU the mapping is done by the
toolstack.

4.1.1.2    For DomU
-----------------
For a domU, while creating the domain, the toolstack reads the IPA from the
macro GITS_ITRANSLATER_SPACE defined in xen/include/public/arch-arm.h. The
PA is read from a new hypercall which returns the PA of the
GITS_ITRANSLATER_SPACE.
Subsequently the toolstack issues a hypercall to create the stage 2 mapping.

Hypercall Details: XEN_DOMCTL_get_itranslater_space

/* XEN_DOMCTL_get_itranslater_space */
struct xen_domctl_get_itranslater_space {
     /* OUT variables. */
     uint64_aligned_t start_addr;
     uint64_aligned_t size;
};
typedef struct xen_domctl_get_itranslater_space
xen_domctl_get_itranslater_space;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space);
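
A toolstack-side sketch of how the two pieces fit together, assuming a
hypothetical libxc wrapper xc_get_itranslater_space() around the new domctl
(xc_domain_memory_mapping() is the existing libxc call; error handling
trimmed):

uint64_t its_pa, its_size;

xc_get_itranslater_space(xch, domid, &its_pa, &its_size);

xc_domain_memory_mapping(xch, domid,
                         GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT, /* fixed IPA */
                         its_pa >> XC_PAGE_SHIFT,                 /* PA from Xen */
                         its_size >> XC_PAGE_SHIFT,
                         DPCI_ADD_MAPPING);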

4.2    DomU
------------
There are two ways a device is assigned: at domain creation time (PCI
devices specified in the cfg file) or later via hotplug (xl pci-attach).
In the pci-attach flow, the toolstack reads the PCI configuration space BAR
registers. The toolstack has the guest memory map and the information about
the MMIO holes.

When a PCI device is assigned to a domU, the toolstack allocates a virtual
BAR region from the MMIO hole area. The toolstack then invokes the
xc_domain_memory_mapping domctl to map it in the stage 2 translation, as
sketched below.
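
An illustrative mapping of a single virtual BAR (bar_ipa is assumed to have
been allocated from the reserved area described in 4.2.1, bar_pa is the
value read from the physical BAR register; error handling omitted):

rc = xc_domain_memory_mapping(xch, domid,
                              bar_ipa >> XC_PAGE_SHIFT,  /* guest IPA frame */
                              bar_pa  >> XC_PAGE_SHIFT,  /* machine frame */
                              (bar_size + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT,
                              DPCI_ADD_MAPPING);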

4.2.1    Reserved Areas in guest memory space
--------------------------------------------
Part of the guest address space is reserved for mapping the assigned PCI
devices' BAR regions. The toolstack is responsible for allocating ranges
from this area and creating the stage 2 mapping for the domain. The ranges
are defined in xen/include/public/arch-arm.h:

/* For 32bit BARs */
GUEST_MMIO_BAR_BASE_32, GUEST_MMIO_BAR_SIZE_32

/* For 64bit BARs */
GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64

Note: For 64bit systems, PCI BAR regions should be mapped from
GUEST_MMIO_BAR_BASE_64. If a BAR region address is 32-bit, the BASE_32 area
is used, otherwise the BASE_64 area.

The IPA is allocated from the
(GUEST_MMIO_BAR_BASE_64 ... GUEST_MMIO_BAR_BASE_64 + GUEST_MMIO_BAR_SIZE_64)
region and the PA is the value read from the BAR register.
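
A minimal sketch of the toolstack-side allocation from the 64-bit area (a
simple bump allocator; the function and variable names are illustrative):

static uint64_t next_bar_ipa = GUEST_MMIO_BAR_BASE_64;

/* BAR sizes are powers of two, so align the IPA to the BAR size. */
static uint64_t alloc_virtual_bar_ipa(uint64_t bar_size)
{
    uint64_t ipa = (next_bar_ipa + bar_size - 1) & ~(bar_size - 1);

    if ( ipa + bar_size > GUEST_MMIO_BAR_BASE_64 + GUEST_MMIO_BAR_SIZE_64 )
        return 0;          /* area exhausted */

    next_bar_ipa = ipa + bar_size;
    return ipa;
}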

4.2.2    New entries in xenstore for device BARs
-----------------------------------------------
The toolstack also updates the xenstore information for the device
(virtual BAR : physical BAR). This information is read by xen-pciback and
returned to the pcifront driver for configuration space reads of the BARs.

Entries created are as follows:
/local/domain/0/backend/pci/1/0
vdev-N
     BDF = ""
     BAR-0-IPA = ""
     BAR-0-PA = ""
     BAR-0-SIZE = ""
     ...
     BAR-M-IPA = ""
     BAR-M-PA = ""
     BAR-M-SIZE = ""

Note: If BAR-M-SIZE is 0, it is not a valid entry.
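
An illustrative write of one of these keys using the plain xenstore client
API (the toolstack would normally go through its own libxl helpers; domid,
vdev, i and bar_ipa are placeholders and the path follows the example
above):

char path[128], val[32];

snprintf(path, sizeof(path),
         "/local/domain/0/backend/pci/%u/0/vdev-%u/BAR-%u-IPA",
         domid, vdev, i);
snprintf(val, sizeof(val), "0x%"PRIx64, bar_ipa);
xs_write(xsh, XBT_NULL, path, val, strlen(val));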

4.2.4    Hypercall Modification for bdf mapping notification to xen
-------------------------------------------------------------------
Guest devfn generation, currently done by xen-pciback, is to be done by the
toolstack only. The guest devfn is generated at the time of domain creation
(if PCI devices are specified in the cfg file) or by the xl pci-attach call.
The xc_assign_device call will include the guest devfn (see the sketch
below).
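
A sketch of one possible way to carry the virtual sbdf in the assign-device
interface (field names are illustrative; this is not the current Xen
layout):

struct xen_domctl_assign_device {
    uint32_t machine_sbdf;   /* IN: physical segment:bus:devfn */
    uint32_t guest_sbdf;     /* IN: virtual sbdf generated by the toolstack */
};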

5. DomU FrontEnd Bus Changes
------------------------------------------------------------------------------- 

5.1    Change in Linux Xen PCI frontend - backend driver for MSI/X programming
---------------------------------------------------------------------------
Frontend-backend communication for MSI is disabled on Xen ARM. MSI
programming is instead handled by the gic-its driver in the guest kernel and
trapped in Xen.

5.2    Frontend bus and interrupt parent vITS
-----------------------------------------------
On the PCI frontend bus, the gicv3-its node is set as msi-parent. There is
a single virtual ITS for a domU, as there is only a single virtual PCI bus
in the domU. This ensures that the MSI configuration calls are handled by
the GICv3 ITS driver in the domU kernel and do not use frontend-backend
communication between dom0 and domU.

It is therefore required to have a gicv3-its node in the guest device tree,
as sketched below.
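
An illustrative emission of such a node into the guest FDT using libfdt's
sequential-write API (the node name, unit address, vits_base/vits_size and
the fdt_property_regs() helper are assumptions modelled on libxl's existing
device tree builder):

int res;

res = fdt_begin_node(fdt, "gic-its@8100000");
res = fdt_property_string(fdt, "compatible", "arm,gic-v3-its");
res = fdt_property(fdt, "msi-controller", NULL, 0);
/* reg = <base size> of the vITS translation frame exposed to the guest */
res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS,
                        GUEST_ROOT_SIZE_CELLS, 1, vits_base, vits_size);
res = fdt_end_node(fdt);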

6.    NUMA domU and vITS
--------------------------
a) On NUMA systems a domU still has a single ITS node.
b) How can Xen identify the ITS to which a device is connected?
- Using the segment number, query the API (internal to the hypervisor) which
returns the PCI host controller's device tree node:

struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)

c) Query the interrupt parent of the PCI host controller node to find out
the ITS (see the sketch below).
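
A minimal sketch of resolving the physical ITS for an assigned device
(dt_parse_phandle() is Xen's device tree helper; the wrapper name and the
use of the msi-parent property are assumptions):

static struct dt_device_node *its_node_for_device(const struct pci_dev *pdev)
{
    struct dt_device_node *hb = pci_hostbridge_dt_node(pdev->seg);

    /* The msi-parent of the host bridge node points at the ITS serving it. */
    return hb ? dt_parse_phandle(hb, "msi-parent", 0) : NULL;
}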


* Re: PCI Passthrough Design - Draft 3
  2015-08-04 12:27 PCI Passthrough Design - Draft 3 Manish Jaggi
@ 2015-08-11 20:34 ` Konrad Rzeszutek Wilk
  2015-08-12  7:33   ` Manish Jaggi
  2015-08-12  8:56   ` Ian Campbell
  0 siblings, 2 replies; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-11 20:34 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor@cavium.com, Ian Campbell, Stefano Stabellini,
	Kumar, Vijaya, Julien Grall, Xen Devel

On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
>              -----------------------------
>             | PCI Pass-through in Xen ARM |
>              -----------------------------
>             manish.jaggi@caviumnetworks.com
>             -------------------------------
> 
>                      Draft-3
> 
> 
> -------------------------------------------------------------------------------
> Introduction
> -------------------------------------------------------------------------------
> This document describes the design for the PCI passthrough support in Xen
> ARM.
> The target system is an ARM 64bit Soc with GICv3 and SMMU v2 and PCIe
> devices.
> 
> -------------------------------------------------------------------------------
> Revision History
> -------------------------------------------------------------------------------
> Changes from Draft-1:
> ---------------------
> a) map_mmio hypercall removed from earlier draft
> b) device bar mapping into guest not 1:1
> c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
> d) xenstore device's BAR info addition.
> 
> Changes from Draft-2:
> ---------------------
> a) DomU boot information updated with boot-time device assignment and
> hotplug.
> b) SMMU description added
> c) Mapping between streamID - bdf - deviceID.
> d) assign_device hypercall to include virtual(guest) sbdf.
> Toolstack to generate guest sbdf rather than pciback.
> 
> -------------------------------------------------------------------------------
> Index
> -------------------------------------------------------------------------------
>   (1) Background
> 
>   (2) Basic PCI Support in Xen ARM
>   (2.1)    pci_hostbridge and pci_hostbridge_ops
>   (2.2)    PHYSDEVOP_HOSTBRIDGE_ADD hypercall
> 
>   (3) SMMU programming
>   (3.1) Additions for PCI Passthrough
>   (3.2)    Mapping between streamID - deviceID - pci sbdf
> 
>   (4) Assignment of PCI device
> 
>   (4.1) Dom0
>   (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (4k)
>   (4.1.1.1) For Dom0
>   (4.1.1.2) For DomU
>   (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>   (4.2) DomU
>   (4.2.1) Reserved Areas in guest memory space
>   (4.2.2) New entries in xenstore for device BARs
>   (4.2.4) Hypercall Modification for bdf mapping notification to xen
> 
>   (5) DomU FrontEnd Bus Changes
>   (5.1)    Change in Linux PCI FrontEnd - backend driver for MSI/X
> programming
>   (5.2)    Frontend bus and interrupt parent vITS
> 
>   (6) NUMA and PCI passthrough
> -------------------------------------------------------------------------------
> 
> 1.    Background of PCI passthrough
> --------------------------------------
> Passthrough refers to assigning a pci device to a guest domain (domU) such
> that
> the guest has full control over the device. The MMIO space and interrupts
> are
> managed by the guest itself, close to how a bare kernel manages a device.

s/pci/PCI/
> 
> Device's access to guest address space needs to be isolated and protected.
> SMMU
> (System MMU - IOMMU in ARM) is programmed by xen hypervisor to allow device
> access guest memory for data transfer and sending MSI/X interrupts. PCI
> devices
> generated message signalled interrupt write are within guest address spaces
> which
> are also translated using SMMU.
> For this reason the GITS (ITS address space) Interrupt Translation Register
> space is mapped in the guest address space.
> 
> 2.    Basic PCI Support for ARM
> ----------------------------------
> The apis to read write from pci configuration space are based on

s/apis/APIs/
s/pci/PCI/
> segment:bdf.
> How the sbdf is mapped to a physical address is under the realm of the pci
s/pci/PCI/
> host controller.
> 
> ARM PCI support in Xen, introduces pci host controller similar to what

s/pci/PCI/
> exists
> in Linux. Each drivers registers callbacks, which are invoked on matching
> the
> compatible property in pci device tree node.
> 
> 2.1    pci_hostbridge and pci_hostbridge_ops
> ----------------------------------------------
> The init function in the pci host driver calls to register hostbridge
> callbacks:
> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> 
> struct pci_hostbridge_ops {
>     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                 u32 reg, u32 bytes);
>     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                 u32 reg, u32 bytes, u32 val);
> };
> 
> struct pci_hostbridge{
>     u32 segno;
>     paddr_t cfg_base;
>     paddr_t cfg_size;
>     struct dt_device_node *dt_node;
>     struct pci_hostbridge_ops ops;
>     struct list_head list;
> };
> 
> A pci conf read function would internally be as follows:
> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
> {
>     pci_hostbridge_t *pcihb;
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>     {
>         if(pcihb->segno == seg)
>             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>     }
>     return -1;
> }
> 
> 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> ----------------------------------------------
> Xen code accesses PCI configuration space based on the sbdf received from
> the
> guest. The order in which the pci device tree node appear may not be the
> same
> order of device enumeration in dom0. Thus there needs to be a mechanism to
> bind
> the segment number assigned by dom0 to the pci host controller. The
> hypercall
> is introduced:

Why can't we extend the existing hypercall to have the segment value?

Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!

And have the hypercall (and Xen) be able to deal with introduction of PCI
devices that are out of sync?

Maybe I am confused but aren't PCI host controllers also 'uploaded' to
Xen?
> 
> #define PHYSDEVOP_pci_host_bridge_add    44
> struct physdev_pci_host_bridge_add {
>     /* IN */
>     uint16_t seg;
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> };
> 
> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
> hypercall. The handler code invokes to update segment number in
> pci_hostbridge:
> 
> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
> cfg_size);
> 
> Subsequent calls to pci_conf_read/write are completed by the
> pci_hostbridge_ops
> of the respective pci_hostbridge.

This design sounds like it is added to deal with having to pre-allocate the
amount host controllers structure before the PCI devices are streaming in?

Instead of having the PCI devices and PCI host controllers be updated
as they are coming in?

Why can't the second option be done?
> 
> 2.3    Helper Functions
> ------------------------
> a) pci_hostbridge_dt_node(pdev->seg);
> Returns the device tree node pointer of the pci node from which the pdev got
> enumerated.
> 
> 3.    SMMU programming
> -------------------
> 
> 3.1.    Additions for PCI Passthrough
> -----------------------------------
> 3.1.1 - add_device in iommu_ops is implemented.
> 
> This is called when PHYSDEVOP_pci_add_device is called from dom0.

Or for PHYSDEVOP_manage_pci_add_ext ?

> 
> .add_device = arm_smmu_add_dom0_dev,
> static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> {
>         if (dev_is_pci(dev)) {
>             struct pci_dev *pdev = to_pci_dev(dev);
>             return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>         }
>         return -1;
> }
> 

What about removal?

What if the device is removed (hot-unplugged??

> 3.1.2 dev_get_dev_node is modified for pci devices.
> -------------------------------------------------------------------------
> The function is modified to return the dt_node of the pci hostbridge from
> the device tree. This is required as non-dt devices need a way to find on
> which smmu they are attached.
> 
> static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> {
>         struct device_node *dev_node = dev_get_dev_node(dev);
> ....
> 
> static struct device_node *dev_get_dev_node(struct device *dev)
> {
>         if (dev_is_pci(dev)) {
>                 struct pci_dev *pdev = to_pci_dev(dev);
>                 return pci_hostbridge_dt_node(pdev->seg);
>         }
> ...
> 
> 
> 3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
> ---------------------------------------------------------------------
> For a simpler case all should be equal to BDF. But there are some devices
> that
> use the wrong requester ID for DMA transactions. Linux kernel has pci quirks
> for these. How the same be implemented in Xen or a diffrent approach has to

s/pci/PCI/
> be
> taken is TODO here.
> Till that time, for basic implementation it is assumed that all are equal to
> BDF.
> 
> 
> 4.    Assignment of PCI device
> ---------------------------------
> 
> 4.1    Dom0
> ------------
> All PCI devices are assigned to dom0 unless hidden by pci-hide bootargs in
> dom0.

'pci-hide' in dom0? Grepping in Documentation/kernel-parameters.txt I don't
see anything.

> Dom0 enumerates the PCI devices. For each device the MMIO space has to be
> mapped
> in the Stage2 translation for dom0. For dom0 xen maps the ranges from dt pci

s/xen/Xen/
s/pci/PCI/
> nodes in stage 2 translation during boot.

> 
> 4.1.1    Stage 2 Mapping of GITS_ITRANSLATER space (64k)
> ------------------------------------------------------
> 
> GITS_ITRANSLATER space (64k) must be programmed in Stage2 translation so
> that SMMU
> can translate MSI(x) from the device using the page table of the domain.
> 
> 4.1.1.1 For Dom0
> -----------------
> GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0 this
> mapping is done in the vgic driver. For domU the mapping is done by
> toolstack.
> 
> 4.1.1.2    For DomU
> -----------------
> For domU, while creating the domain, the toolstack reads the IPA from the
> macro GITS_ITRANSLATER_SPACE from xen/include/public/arch-arm.h. The PA is
> read from a new hypercall which returns the PA of the
> GITS_ITRANSLATER_SPACE.
> Subsequently the toolstack sends a hypercall to create a stage 2 mapping.
> 
> Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
> /* XEN_DOMCTL_get_itranslater_space */
> struct xen_domctl_get_itranslater_space {
>     /* OUT variables. */
>     uint64_aligned_t start_addr;
>     uint64_aligned_t size;
> };
> typedef struct xen_domctl_get_itranslater_space
> xen_domctl_get_itranslater_space;
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space;
> 
> 4.2    DomU
> ------------
> There are two ways a device is assigned
> In the flow of pci-attach device, the toolstack will read the pci
> configuration
> space BAR registers. The toolstack has the guest memory map and the
> information
> of the MMIO holes.
> 
> When the first pci device is assigned to domU, toolstack allocates a virtual

s/pci/PCI/

first? What about the other ones?

> BAR region from the MMIO hole area. toolstack then sends domctl

s/sends/invokes/
> xc_domain_memory_mapping to map in stage2 translation.

What if there is more than one device? How will the MMIO and BAR regions
be picked? Based on first-come first-serve?

> 
> 4.2.1    Reserved Areas in guest memory space
> --------------------------------------------
> Parts of the guest address space is reserved for mapping assigned pci
> device's

s/pci/PCI/
> BAR regions. Toolstack is responsible for allocating ranges from this area
> and
> creating stage 2 mapping for the domain.
> 
> /* For 32bit */
> GUEST_MMIO_BAR_BASE_32, GUEST_MMIO_BAR_SIZE_32
> 
> /* For 64bit */
> 
> GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64

Not sure what this means.

> 
> Note: For 64bit systems, PCI BAR regions should be mapped from
> GUEST_MMIO_BAR_BASE_64.
> 
> IPA is allocated from the {GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64}
> range and PA is the values read from the BAR registers.

Is the BAR size dynamic?

>

What happens when the device is unplugged? And then plugged back in?
How do you choose where in the GUEST_MMIO_.. it is going to be in?
What is the hypercall you are going to use for unplugging it?

 
> 4.2.2    New entries in xenstore for device BARs

s/xenstore/XenStore/

> -----------------------------------------------
> toolstack also updates the xenstore information for the device
s/toolstack/Toolstack

> (virtualbar:physical bar).This information is read by xenpciback and

s/xenpciback/xen-pciback/

No segment value?

> returned
> to the pcifront driver configuration space reads for BAR.
> 
> Entries created are as follows:
> /local/domain/0/backend/pci/1/0
> vdev-N
>     BDF = ""
>     BAR-0-IPA = ""
>     BAR-0-PA = ""
>     BAR-0-SIZE = ""
>     ...
>     BAR-M-IPA = ""
>     BAR-M-PA = ""
>     BAR-M-SIZE = ""
> 
> Note: Is BAR M SIZE is 0, it is not a valied entry.

s/valied/valid/

s/Is/If/ ?

> 
> 4.2.4    Hypercall Modification for bdf mapping notification to xen

s/xen/Xen/
> -------------------------------------------------------------------
> Guest devfn generation currently done by xen-pciback to be done by toolstack
> only. Guest devfn is generated at the time of domain creation (if pci
> devices
> are specified in cfg file) or using xl pci-attach call.

What is 'devfn generation'? It sounds to me that you are saying that
xen-pciback should follow the XenStore keys and use those.

But the title talks about 'hypercall modifications' - while this
talks about bdf mapping?

> 
> 5. DomU FrontEnd Bus Changes
> -------------------------------------------------------------------------------
> 
> 5.1    Change in Linux PCI ForntEnd - backend driver for MSI/X programming

s/ForntEnd/Frontend/

And I would say 'Linux Xen PCI frontend'.

> ---------------------------------------------------------------------------
> FrontEnd backend communication for MSI is removed in XEN ARM. It would be
> handled by the gic-its driver in guest kernel and trapped in xen.

s/xen/Xen/

s/removed/disabled/

> 
> 5.2    Frontend bus and interrupt parent vITS
> -----------------------------------------------
> On the Pci frontend bus msi-parent gicv3-its is added. As there is a single

s/Pci/PCI/

> virtual its for a domU, as there is only a single virtual pci bus in domU.

its?
ITS perhaps?

We could have multiple segments too in Xen pci-frontend..

> This
> ensures that the config_msi calls are handled by the gicv3 its driver in

s/its/ITS/
s/gicv3/GICV3/

> domU
> kernel and not utilising frontend-backend communication between dom0-domU.

utilising? Utilizing.

> 
> It is required to have a gicv3-its node in guest device tree.

OK, you totally lost me. You said earlier that we do not want to use
Xen pcifrontend for MSI. But here you talk about 'PCI frontend'? So
what is it?

And how do you keep the vITS segment:bus:devfn mapping in sync
with Xen PCI backend? I presume you need to update the vITS in
the hypervisor with the proper segment:bus:devfn values?
Is there an hypercall for that?

> 
> 6.    NUMA domU and vITS
> --------------------------
> a) On NUMA systems domU still have a single its node.

s/its/ITS/

> b) How can xen identify the ITS on which a device is connected.
s/xen/Xen/

> - Using segment number query using api which gives pci host controllers
> device node
s/api/API/
s/pci/PCI/

Which is ? I only see one hypercall mentioned here.

> 
> struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)

Oh, this is INTERNAL to the hypervisor. Sorry, you lost me a bit
with the domU part so I thought it meant the domU should be able
to query it.
> 
> c) Query the interrupt parent of the pci device node to find out the its.
> 
s/its/ITS/

?
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: PCI Passthrough Design - Draft 3
  2015-08-11 20:34 ` Konrad Rzeszutek Wilk
@ 2015-08-12  7:33   ` Manish Jaggi
  2015-08-12 14:24     ` Konrad Rzeszutek Wilk
  2015-08-12  8:56   ` Ian Campbell
  1 sibling, 1 reply; 8+ messages in thread
From: Manish Jaggi @ 2015-08-12  7:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Prasun.kapoor@cavium.com, Ian Campbell, Stefano Stabellini,
	Kumar, Vijaya, Julien Grall, Xen Devel



Below are the comments. I will also send a Draft 4 taking account of the comments.


On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
>>               -----------------------------
>>              | PCI Pass-through in Xen ARM |
>>               -----------------------------
>>              manish.jaggi@caviumnetworks.com
>>              -------------------------------
>>
>>                       Draft-3
>> ...
>> [snip]
>> 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
>> ----------------------------------------------
>> Xen code accesses PCI configuration space based on the sbdf received from
>> the
>> guest. The order in which the pci device tree node appear may not be the
>> same
>> order of device enumeration in dom0. Thus there needs to be a mechanism to
>> bind
>> the segment number assigned by dom0 to the pci host controller. The
>> hypercall
>> is introduced:
> Why can't we extend the existing hypercall to have the segment value?
>
> Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
It doesn’t pass the cfg_base and size to xen
>
> And have the hypercall (and Xen) be able to deal with introduction of PCI
> devices that are out of sync?
>
> Maybe I am confused but aren't PCI host controllers also 'uploaded' to
> Xen?
I need to add one more line here to be more descriptive. The binding is
between the segment number (domain number in Linux) used by dom0 and the
PCI config space address in the pci node of the device tree (reg property).
The hypercall was introduced to cater for the fact that dom0 may process
pci nodes in the device tree in any order.
By this binding it is a clear ABI.
>> #define PHYSDEVOP_pci_host_bridge_add    44
>> struct physdev_pci_host_bridge_add {
>>      /* IN */
>>      uint16_t seg;
>>      uint64_t cfg_base;
>>      uint64_t cfg_size;
>> };
>>
>> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
>> hypercall. The handler code invokes to update segment number in
>> pci_hostbridge:
>>
>> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
>> cfg_size);
>>
>> Subsequent calls to pci_conf_read/write are completed by the
>> pci_hostbridge_ops
>> of the respective pci_hostbridge.
> This design sounds like it is added to deal with having to pre-allocate the
> amount host controllers structure before the PCI devices are streaming in?
>
> Instead of having the PCI devices and PCI host controllers be updated
> as they are coming in?
>
> Why can't the second option be done?
If you are referring to ACPI, we have to add the support.
PCI Host controllers are pci nodes in device tree.
>> 2.3    Helper Functions
>> ------------------------
>> a) pci_hostbridge_dt_node(pdev->seg);
>> Returns the device tree node pointer of the pci node from which the pdev got
>> enumerated.
>>
>> 3.    SMMU programming
>> -------------------
>>
>> 3.1.    Additions for PCI Passthrough
>> -----------------------------------
>> 3.1.1 - add_device in iommu_ops is implemented.
>>
>> This is called when PHYSDEVOP_pci_add_device is called from dom0.
> Or for PHYSDEVOP_manage_pci_add_ext ?
Not sure but it seems logical for this also.
>> .add_device = arm_smmu_add_dom0_dev,
>> static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
>> {
>>          if (dev_is_pci(dev)) {
>>              struct pci_dev *pdev = to_pci_dev(dev);
>>              return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>>          }
>>          return -1;
>> }
>>
> What about removal?
>
> What if the device is removed (hot-unplugged??
.remove_device  = arm_smmu_remove_device(). would be called.
Will update in Draft4

>> 3.1.2 dev_get_dev_node is modified for pci devices.
>> -------------------------------------------------------------------------
>> The function is modified to return the dt_node of the pci hostbridge from
>> the device tree. This is required as non-dt devices need a way to find on
>> which smmu they are attached.
>>
>> static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
>> {
>>          struct device_node *dev_node = dev_get_dev_node(dev);
>> ....
>>
>> static struct device_node *dev_get_dev_node(struct device *dev)
>> {
>>          if (dev_is_pci(dev)) {
>>                  struct pci_dev *pdev = to_pci_dev(dev);
>>                  return pci_hostbridge_dt_node(pdev->seg);
>>          }
>> ...
>>
>>
>> 3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>> ---------------------------------------------------------------------
>> For a simpler case all should be equal to BDF. But there are some devices
>> that
>> use the wrong requester ID for DMA transactions. Linux kernel has pci quirks
>> for these. How the same be implemented in Xen or a diffrent approach has to
> s/pci/PCI/
>> be
>> taken is TODO here.
>> Till that time, for basic implementation it is assumed that all are equal to
>> BDF.
>>
>>
>> 4.    Assignment of PCI device
>> ---------------------------------
>>
>> 4.1    Dom0
>> ------------
>> All PCI devices are assigned to dom0 unless hidden by pci-hide bootargs in
>> dom0.
> 'pci-hide' in dom0? Greeping in Documentation/kernel-parameters.txt I don't
> see anything.
%s/pci-hide//pciback/./hide//
>> Dom0 enumerates the PCI devices. For each device the MMIO space has to be
>> mapped
>> in the Stage2 translation for dom0. For dom0 xen maps the ranges from dt pci
> s/xen/Xen/
> s/pci/PCI/
>> nodes in stage 2 translation during boot.
>> 4.1.1    Stage 2 Mapping of GITS_ITRANSLATER space (64k)
>> ------------------------------------------------------
>>
>> GITS_ITRANSLATER space (64k) must be programmed in Stage2 translation so
>> that SMMU
>> can translate MSI(x) from the device using the page table of the domain.
>>
>> 4.1.1.1 For Dom0
>> -----------------
>> GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0 this
>> mapping is done in the vgic driver. For domU the mapping is done by
>> toolstack.
>>
>> 4.1.1.2    For DomU
>> -----------------
>> For domU, while creating the domain, the toolstack reads the IPA from the
>> macro GITS_ITRANSLATER_SPACE from xen/include/public/arch-arm.h. The PA is
>> read from a new hypercall which returns the PA of the
>> GITS_ITRANSLATER_SPACE.
>> Subsequently the toolstack sends a hypercall to create a stage 2 mapping.
>>
>> Hypercall Details: XEN_DOMCTL_get_itranslater_space
>>
>> /* XEN_DOMCTL_get_itranslater_space */
>> struct xen_domctl_get_itranslater_space {
>>      /* OUT variables. */
>>      uint64_aligned_t start_addr;
>>      uint64_aligned_t size;
>> };
>> typedef struct xen_domctl_get_itranslater_space
>> xen_domctl_get_itranslater_space;
>> DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space;
>>
>> 4.2    DomU
>> ------------
>> There are two ways a device is assigned
>> In the flow of pci-attach device, the toolstack will read the pci
>> configuration
>> space BAR registers. The toolstack has the guest memory map and the
>> information
>> of the MMIO holes.
>>
>> When the first pci device is assigned to domU, toolstack allocates a virtual
> s/pci/PCI/
>
> first? What about the other ones?
%s/the first/a/
Typo
>
>> BAR region from the MMIO hole area. toolstack then sends domctl
> s/sends/invokes/
>> xc_domain_memory_mapping to map in stage2 translation.
> What if there are more than one device? How will the MMIO and BAR regions
> picked? Based on first-come first-serve?
>> 4.2.1    Reserved Areas in guest memory space
>> --------------------------------------------
>> Parts of the guest address space is reserved for mapping assigned pci
>> device's
> s/pci/PCI/
>> BAR regions. Toolstack is responsible for allocating ranges from this area
>> and
>> creating stage 2 mapping for the domain.
>>
>> /* For 32bit */
>> GUEST_MMIO_BAR_BASE_32, GUEST_MMIO_BAR_SIZE_32
>>
>> /* For 64bit */
>>
>> GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64
in public/arch-arm.h

/* For 32bit */
#define GUEST_MMIO_BAR_BASE_32 <<>>
#define GUEST_MMIO_BAR_SIZE_32 <<>>

/* For 64bit */

#define GUEST_MMIO_BAR_BASE_64 <<>>
#define GUEST_MMIO_BAR_SIZE_64 <<>>


> Not sure what this means.
Will add more description.
The idea is to map the PCI BAR regions into the guest stage 2 translation,
so a pre-defined area in the guest address space is reserved for this.
If a BAR region address is 32-bit the BASE_32 area would be used, otherwise
the BASE_64 area.
>> Note: For 64bit systems, PCI BAR regions should be mapped from
>> GUEST_MMIO_BAR_BASE_64.
>>
>> IPA is allocated from the {GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64}
%s/{GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64}/

(GUEST_MMIO_BAR_BASE_64 ... GUEST_MMIO_BAR_BASE_64+GUEST_MMIO_BAR_SIZE_64) region

>> range and PA is the values read from the BAR registers.
> Is the BAR size dynamic?
see above
> What happens when the device is unplugged? And then plugged back in?
> How do you choose where in the GUEST_MMIO_.. it is going to be in?
> What is the hypercall you are goign to use for unplugging it?
>
>
>> 4.2.2    New entries in xenstore for device BARs
> s/xenstore/XenStore/
>
>> -----------------------------------------------
>> toolstack also updates the xenstore information for the device
> s/toolstack/Toolstack
>
>> (virtualbar:physical bar).This information is read by xenpciback and
> s/xenpciback/xen-pciback/
>
> No segment value?
Where? Didn't get you.
>> returned
>> to the pcifront driver configuration space reads for BAR.
>>
>> Entries created are as follows:
>> /local/domain/0/backend/pci/1/0
>> vdev-N
>>      BDF = ""
>>      BAR-0-IPA = ""
>>      BAR-0-PA = ""
>>      BAR-0-SIZE = ""
>>      ...
>>      BAR-M-IPA = ""
>>      BAR-M-PA = ""
>>      BAR-M-SIZE = ""
>>
>> Note: Is BAR M SIZE is 0, it is not a valied entry.
> s/valied/valid/
>
> s/Is/If/ ?
>
>> 4.2.4    Hypercall Modification for bdf mapping notification to xen
> s/xen/Xen/
>> -------------------------------------------------------------------
>> Guest devfn generation currently done by xen-pciback to be done by toolstack
>> only. Guest devfn is generated at the time of domain creation (if pci
>> devices
>> are specified in cfg file) or using xl pci-attach call.
> What is 'devfn generation'? It sounds to me that you are saying that
> xen-pciback should follow the XenStore keys and use those.
Yes, that is what Ian / Julien suggested. x86 is to follow the same, as
guest devfn generation should be done in the toolstack and not in pciback.
>
> But the title talks about 'hypercall modifications' - while this
> talks about bdf mapping?
the xc_assign_device will include the guest devfn
>> 5. DomU FrontEnd Bus Changes
>> -------------------------------------------------------------------------------
>>
>> 5.1    Change in Linux PCI ForntEnd - backend driver for MSI/X programming
> s/ForntEnd/Frontend/
>
> And I would say 'Linux Xen PCI frontend'.
>
>> ---------------------------------------------------------------------------
>> FrontEnd backend communication for MSI is removed in XEN ARM. It would be
>> handled by the gic-its driver in guest kernel and trapped in xen.
> s/xen/Xen/
>
> s/removed/disabled/
>
>> 5.2    Frontend bus and interrupt parent vITS
>> -----------------------------------------------
>> On the Pci frontend bus msi-parent gicv3-its is added. As there is a single
> s/Pci/PCI/
>
>> virtual its for a domU, as there is only a single virtual pci bus in domU.
> its?
> ITS perhaps?
>
> We could have multiple segments too in Xen pci-frontend..
>
>> This
>> ensures that the config_msi calls are handled by the gicv3 its driver in
> s/its/ITS/
> s/gicv3/GICV3/
>
>> domU
>> kernel and not utilising frontend-backend communication between dom0-domU.
> utilising? Utilizing.
>
>> It is required to have a gicv3-its node in guest device tree.
> OK, you totally lost me. You said earlier that we do not want to use
> Xen pcifrontend for MSI. But here you talk about 'PCI frontend'? So
> what is it?
The PCI frontend bus is a virtual bus in domU on which assigned devices are
enumerated, while the PCI frontend-backend communication is limited to
config space access.
>
> And how do you keep the vITS segment:bus:devfn mapping in sync
> with Xen PCI backend? I presume you need to update the vITS in
> the hypervisor with the proper segment:bus:devfn values?
I will add a reference to the vITS design.
see above. assign_device will have a guest devfn.
> Is there an hypercall for that?
we had earlier a hypercall map_sbdf but removed it due to addition of
guest devfn in assign_device call.
>> 6.    NUMA domU and vITS
>> --------------------------
>> a) On NUMA systems domU still have a single its node.
> s/its/ITS/
>
>> b) How can xen identify the ITS on which a device is connected.
> s/xen/Xen/
>
>> - Using segment number query using api which gives pci host controllers
>> device node
> s/api/API/
> s/pci/PCI/
>
> Which is ? I only see one hypercall mentioned here.
>
>> struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
> Oh, this is INTERNAL to the hypervisor. Sorry, you lost me a bit
> with the domU part so I thought it meant the domU should be able
> to query it.
I will add a bit more of description in Draft 4 .
>> c) Query the interrupt parent of the pci device node to find out the its.
>>
> s/its/ITS/
>
> ?
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel




* Re: PCI Passthrough Design - Draft 3
  2015-08-11 20:34 ` Konrad Rzeszutek Wilk
  2015-08-12  7:33   ` Manish Jaggi
@ 2015-08-12  8:56   ` Ian Campbell
  2015-08-12 14:25     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 8+ messages in thread
From: Ian Campbell @ 2015-08-12  8:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Manish Jaggi
  Cc: Prasun.kapoor@cavium.com, Kumar, Vijaya, Julien Grall,
	Stefano Stabellini, Xen Devel

On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> 
> > 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> > ----------------------------------------------
> > Xen code accesses PCI configuration space based on the sbdf received from
> > the
> > guest. The order in which the pci device tree node appear may not be the
> > same
> > order of device enumeration in dom0. Thus there needs to be a mechanism to
> > bind
> > the segment number assigned by dom0 to the pci host controller. The
> > hypercall
> > is introduced:
> 
> Why can't we extend the existing hypercall to have the segment value?
> 
> Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> 
> And have the hypercall (and Xen) be able to deal with introduction of PCI
> devices that are out of sync?
> 
> Maybe I am confused but aren't PCI host controllers also 'uploaded' to
> Xen?

The issue is that Dom0 and Xen need to agree on a common numbering space
for the "PCI domain" AKA "segment", which is really just a software concept
i.e. on ARM Linux just makes them up (on x86 I believe they come from some
firmware table so Xen and Dom0 "agree" to both use that).

Ian.


* Re: PCI Passthrough Design - Draft 3
  2015-08-12  7:33   ` Manish Jaggi
@ 2015-08-12 14:24     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 14:24 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor@cavium.com, Ian Campbell, Stefano Stabellini,
	Kumar, Vijaya, Julien Grall, Xen Devel

On Wed, Aug 12, 2015 at 01:03:07PM +0530, Manish Jaggi wrote:
> Below are the comments. I will also send a Draft 4 taking account of the comments.
> 
> 
> On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
> >>              -----------------------------
> >>             | PCI Pass-through in Xen ARM |
> >>              -----------------------------
> >>             manish.jaggi@caviumnetworks.com
> >>             -------------------------------
> >>
> >>                      Draft-3
> >>...
> >>[snip]
> >>2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> >>----------------------------------------------
> >>Xen code accesses PCI configuration space based on the sbdf received from
> >>the
> >>guest. The order in which the pci device tree node appear may not be the
> >>same
> >>order of device enumeration in dom0. Thus there needs to be a mechanism to
> >>bind
> >>the segment number assigned by dom0 to the pci host controller. The
> >>hypercall
> >>is introduced:
> >Why can't we extend the existing hypercall to have the segment value?
> >
> >Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> It doesn’t pass the cfg_base and size to xen

cfg_base is the BAR? Or the MMIO ?

> >
> >And have the hypercall (and Xen) be able to deal with introduction of PCI
> >devices that are out of sync?
> >
> >Maybe I am confused but aren't PCI host controllers also 'uploaded' to
> >Xen?
> I need to add one more line here to be more descriptive. The binding is
> between the segment number (domain number in linux)
> used by dom0 and the pci config space address in the pci node of device tree
> (reg property).
> The hypercall was introduced to cater the fact that the dom0 may process pci
> nodes in the device tree in any order.

I still don't follow - sorry.

Why would it matter that the PCI nodes are processed in any order?

> By this binding it is a clear ABI.
> >>#define PHYSDEVOP_pci_host_bridge_add    44
> >>struct physdev_pci_host_bridge_add {
> >>     /* IN */
> >>     uint16_t seg;
> >>     uint64_t cfg_base;
> >>     uint64_t cfg_size;
> >>};
> >>
> >>This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
> >>hypercall. The handler code invokes to update segment number in
> >>pci_hostbridge:
> >>
> >>int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
> >>cfg_size);
> >>
> >>Subsequent calls to pci_conf_read/write are completed by the
> >>pci_hostbridge_ops
> >>of the respective pci_hostbridge.
> >This design sounds like it is added to deal with having to pre-allocate the
> >amount host controllers structure before the PCI devices are streaming in?
> >
> >Instead of having the PCI devices and PCI host controllers be updated
> >as they are coming in?
> >
> >Why can't the second option be done?
> If you are referring to ACPI, we have to add the support.
> PCI Host controllers are pci nodes in device tree.

I think what you are saying is that the PCI devices are being uploaded
during ACPI parsing. The PCI host controllers are done via
device tree.

But what difference does that make? Why can't Xen deal with these
being in any order? Can't it re-organize its internal representation
of PCI host controllers and PCI devices based on new data?



> >>2.3    Helper Functions
> >>------------------------
> >>a) pci_hostbridge_dt_node(pdev->seg);
> >>Returns the device tree node pointer of the pci node from which the pdev got
> >>enumerated.
> >>
> >>3.    SMMU programming
> >>-------------------
> >>
> >>3.1.    Additions for PCI Passthrough
> >>-----------------------------------
> >>3.1.1 - add_device in iommu_ops is implemented.
> >>
> >>This is called when PHYSDEVOP_pci_add_device is called from dom0.
> >Or for PHYSDEVOP_manage_pci_add_ext ?
> Not sure but it seems logical for this also.
> >>.add_device = arm_smmu_add_dom0_dev,
> >>static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> >>{
> >>         if (dev_is_pci(dev)) {
> >>             struct pci_dev *pdev = to_pci_dev(dev);
> >>             return arm_smmu_assign_dev(pdev->domain, devfn, dev);
> >>         }
> >>         return -1;
> >>}
> >>
> >What about removal?
> >
> >What if the device is removed (hot-unplugged??
> .remove_device  = arm_smmu_remove_device(). would be called.
> Will update in Draft4

Also please mention what hypercall you would use.

> 
> >>3.1.2 dev_get_dev_node is modified for pci devices.
> >>-------------------------------------------------------------------------
> >>The function is modified to return the dt_node of the pci hostbridge from
> >>the device tree. This is required as non-dt devices need a way to find on
> >>which smmu they are attached.
> >>
> >>static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> >>{
> >>         struct device_node *dev_node = dev_get_dev_node(dev);
> >>....
> >>
> >>static struct device_node *dev_get_dev_node(struct device *dev)
> >>{
> >>         if (dev_is_pci(dev)) {
> >>                 struct pci_dev *pdev = to_pci_dev(dev);
> >>                 return pci_hostbridge_dt_node(pdev->seg);
> >>         }
> >>...
> >>
> >>
> >>3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
> >>---------------------------------------------------------------------
> >>For a simpler case all should be equal to BDF. But there are some devices
> >>that
> >>use the wrong requester ID for DMA transactions. Linux kernel has pci quirks
> >>for these. How the same be implemented in Xen or a diffrent approach has to
> >s/pci/PCI/
> >>be
> >>taken is TODO here.
> >>Till that time, for basic implementation it is assumed that all are equal to
> >>BDF.
> >>
> >>
> >>4.    Assignment of PCI device
> >>---------------------------------
> >>
> >>4.1    Dom0
> >>------------
> >>All PCI devices are assigned to dom0 unless hidden by pci-hide bootargs in
> >>dom0.
> >'pci-hide' in dom0? Greeping in Documentation/kernel-parameters.txt I don't
> >see anything.
> %s/pci-hide//pciback/./hide//
> >>Dom0 enumerates the PCI devices. For each device the MMIO space has to be
> >>mapped
> >>in the Stage2 translation for dom0. For dom0 xen maps the ranges from dt pci
> >s/xen/Xen/
> >s/pci/PCI/
> >>nodes in stage 2 translation during boot.
> >>4.1.1    Stage 2 Mapping of GITS_ITRANSLATER space (64k)
> >>------------------------------------------------------
> >>
> >>GITS_ITRANSLATER space (64k) must be programmed in Stage2 translation so
> >>that SMMU
> >>can translate MSI(x) from the device using the page table of the domain.
> >>
> >>4.1.1.1 For Dom0
> >>-----------------
> >>GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0 this
> >>mapping is done in the vgic driver. For domU the mapping is done by
> >>toolstack.
> >>
> >>4.1.1.2    For DomU
> >>-----------------
> >>For domU, while creating the domain, the toolstack reads the IPA from the
> >>macro GITS_ITRANSLATER_SPACE from xen/include/public/arch-arm.h. The PA is
> >>read from a new hypercall which returns the PA of the
> >>GITS_ITRANSLATER_SPACE.
> >>Subsequently the toolstack sends a hypercall to create a stage 2 mapping.
> >>
> >>Hypercall Details: XEN_DOMCTL_get_itranslater_space
> >>
> >>/* XEN_DOMCTL_get_itranslater_space */
> >>struct xen_domctl_get_itranslater_space {
> >>     /* OUT variables. */
> >>     uint64_aligned_t start_addr;
> >>     uint64_aligned_t size;
> >>};
> >>typedef struct xen_domctl_get_itranslater_space
> >>xen_domctl_get_itranslater_space;
> >>DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space;
> >>
> >>4.2    DomU
> >>------------
> >>There are two ways a device is assigned
> >>In the flow of pci-attach device, the toolstack will read the pci
> >>configuration
> >>space BAR registers. The toolstack has the guest memory map and the
> >>information
> >>of the MMIO holes.
> >>
> >>When the first pci device is assigned to domU, toolstack allocates a virtual
> >s/pci/PCI/
> >
> >first? What about the other ones?
> %s/the first/a/
> Typo
> >
> >>BAR region from the MMIO hole area. toolstack then sends domctl
> >s/sends/invokes/
> >>xc_domain_memory_mapping to map in stage2 translation.
> >What if there are more than one device? How will the MMIO and BAR regions
> >picked? Based on first-come first-serve?
> >>4.2.1    Reserved Areas in guest memory space
> >>--------------------------------------------
> >>Parts of the guest address space is reserved for mapping assigned pci
> >>device's
> >s/pci/PCI/
> >>BAR regions. Toolstack is responsible for allocating ranges from this area
> >>and
> >>creating stage 2 mapping for the domain.
> >>
> >>/* For 32bit */
> >>GUEST_MMIO_BAR_BASE_32, GUEST_MMIO_BAR_SIZE_32
> >>
> >>/* For 64bit */
> >>
> >>GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64
> in public/arch-arm.h
> 
> /* For 32bit */
> #define GUEST_MMIO_BAR_BASE_32 <<>>
> #define GUEST_MMIO_BAR_SIZE_32 <<>>
> 
> /* For 64bit */
> 
> #define GUEST_MMIO_BAR_BASE_64 <<>>
> #define GUEST_MMIO_BAR_SIZE_64 <<>>
> 
> 
> >Not sure what this means.
> Will add more description.
> The idea is to map the PCI BAR regions into guest Stage2 translation, so a
> pre defined area in guest address
> space is reserved for this.
> If a BAR region address is 32b BASE_32 area would be used, otherwise 64b.

What if you have both? 32-bit and 64-bit?

> >>Note: For 64bit systems, PCI BAR regions should be mapped from
> >>GUEST_MMIO_BAR_BASE_64.
> >>
> >>IPA is allocated from the {GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64}
> %s/{GUEST_MMIO_BAR_BASE_64, GUEST_MMIO_BAR_SIZE_64}/
> 
> (GUEST_MMIO_BAR_BASE_64 ... GUEST_MMIO_BAR_BASE_64+GUEST_MMIO_BAR_SIZE_64) region
> 
> >>range and PA is the values read from the BAR registers.
> >Is the BAR size dynamic?
> see above
> >What happens when the device is unplugged? And then plugged back in?
> >How do you choose where in the GUEST_MMIO_.. it is going to be in?
> >What is the hypercall you are goign to use for unplugging it?
> >
> >>4.2.2    New entries in xenstore for device BARs
> >s/xenstore/XenStore/
> >
> >>-----------------------------------------------
> >>toolstack also updates the xenstore information for the device
> >s/toolstack/Toolstack
> >
> >>(virtualbar:physical bar).This information is read by xenpciback and
> >s/xenpciback/xen-pciback/
> >
> >No segment value?
> Where. Didnt get you

The Xen PCI backend can also deal with segment values (domain).

> >>returned
> >>to the pcifront driver configuration space reads for BAR.
> >>
> >>Entries created are as follows:
> >>/local/domain/0/backend/pci/1/0
> >>vdev-N
> >>     BDF = ""
> >>     BAR-0-IPA = ""
> >>     BAR-0-PA = ""
> >>     BAR-0-SIZE = ""
> >>     ...
> >>     BAR-M-IPA = ""
> >>     BAR-M-PA = ""
> >>     BAR-M-SIZE = ""
> >>
> >>Note: Is BAR M SIZE is 0, it is not a valied entry.
> >s/valied/valid/
> >
> >s/Is/If/ ?
> >
> >>4.2.4    Hypercall Modification for bdf mapping notification to xen
> >s/xen/Xen/
> >>-------------------------------------------------------------------
> >>Guest devfn generation currently done by xen-pciback to be done by toolstack
> >>only. Guest devfn is generated at the time of domain creation (if pci
> >>devices
> >>are specified in cfg file) or using xl pci-attach call.
> >What is 'devfn generation'? It sounds to me that you are saying that
> >xen-pciback should follow the XenStore keys and use those.
> Yes, that is what Ian / Julien suggested. x86 to follow the same as guest
> devfn generation should be
> in toolstack on not in pciback.
> >
> >But the title talks about 'hypercall modifications' - while this
> >talks about bdf mapping?
> The xc_assign_device call will include the guest devfn.
> >>5. DomU FrontEnd Bus Changes
> >>-------------------------------------------------------------------------------
> >>
> >>5.1    Change in Linux PCI ForntEnd - backend driver for MSI/X programming
> >s/ForntEnd/Frontend/
> >
> >And I would say 'Linux Xen PCI frontend'.
> >
> >>---------------------------------------------------------------------------
> >>FrontEnd backend communication for MSI is removed in XEN ARM. It would be
> >>handled by the gic-its driver in guest kernel and trapped in xen.
> >s/xen/Xen/
> >
> >s/removed/disabled/
> >
> >>5.2    Frontend bus and interrupt parent vITS
> >>-----------------------------------------------
> >>On the Pci frontend bus msi-parent gicv3-its is added. As there is a single
> >s/Pci/PCI/
> >
> >>virtual its for a domU, as there is only a single virtual pci bus in domU.
> >its?
> >ITS perhaps?
> >
> >We could have multiple segments too in Xen pci-frontend..
> >
> >>This
> >>ensures that the config_msi calls are handled by the gicv3 its driver in
> >s/its/ITS/
> >s/gicv3/GICV3/
> >
> >>domU
> >>kernel and not utilising frontend-backend communication between dom0-domU.
> >utilising? Utilizing.
> >
> >>It is required to have a gicv3-its node in guest device tree.
> >OK, you totally lost me. You said earlier that we do not want to use
> >Xen pcifrontend for MSI. But here you talk about 'PCI frontend'? So
> >what is it?
> The PCI frontend bus is a virtual bus in domU on which assigned devices are
> enumerated, while the PCI frontend-backend communication is limited to config
> space access.

It can also do MSI and MSI-X.
> >
> >And how do you keep the vITS segment:bus:devfn mapping in sync
> >with Xen PCI backend? I presume you need to update the vITS in
> >the hypervisor with the proper segment:bus:devfn values?
> I will add a reference to the vITS design.
> See above: assign_device will have a guest devfn.
> >Is there an hypercall for that?
> We earlier had a map_sbdf hypercall, but removed it due to the addition of the
> guest devfn in the assign_device call.


However, I don't see anything in xen_domctl_assign_device mentioning
the guest sbdf. What if you want the sbdfs in the guest to start at
a different segment or bus than they do on the physical machine?
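
To make the gap concrete, the kind of interface the draft seems to be after
might look like the sketch below. Purely illustrative: the guest_sbdf field,
the trimmed-down struct and the 4-argument xc_assign_device() are not what is
in the tree today, and SBDF() is just the usual (seg<<16 | bus<<8 | devfn)
packing.

#include <stdint.h>

/* Usual sbdf packing: segment in bits 31:16, bus in 15:8, devfn in 7:0. */
#define SBDF(s, b, d, f) \
    ((uint32_t)((((s) & 0xffff) << 16) | (((b) & 0xff) << 8) | \
                (((d) & 0x1f) << 3) | ((f) & 0x7)))

/* Hypothetical domctl payload, showing only the fields relevant here. */
struct xen_domctl_assign_device {
    uint32_t machine_sbdf;   /* physical segment:bus:dev.fn */
    uint32_t guest_sbdf;     /* proposed: virtual segment:bus:dev.fn
                              * chosen by the toolstack */
};

/* Toolstack usage, e.g. physical 0000:01:00.0 -> guest 0000:00:03.0:
 *
 *   rc = xc_assign_device(xch, domid,
 *                         SBDF(0, 1, 0, 0),     machine sbdf
 *                         SBDF(0, 0, 3, 0));    guest sbdf (new argument)
 */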

> >>6.    NUMA domU and vITS
> >>--------------------------
> >>a) On NUMA systems domU still have a single its node.
> >s/its/ITS/
> >
> >>b) How can xen identify the ITS on which a device is connected.
> >s/xen/Xen/
> >
> >>- Using segment number query using api which gives pci host controllers
> >>device node
> >s/api/API/
> >s/pci/PCI/
> >
> >Which is? I only see one hypercall mentioned here.
> >
> >>struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
> >Oh, this is INTERNAL to the hypervisor. Sorry, you lost me a bit
> >with the domU part so I thought it meant the domU should be able
> >to query it.
> I will add a bit more description in Draft 4.
> >>c) Query the interrupt parent of the pci device node to find out the its.
> >>
> >s/its/ITS/
> >
> >?
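
For what it's worth, the lookup in (b)/(c) presumably boils down to something
like the sketch below inside Xen. Illustrative only: pci_hostbridge_dt_node()
is the helper proposed in this draft, and the dt_*/be32 helpers are assumed to
behave like Xen's device-tree support of the time.

/* Sketch: resolve the ITS node for a device on segment 'segno'. */
static const struct dt_device_node *its_for_segment(uint32_t segno)
{
    const struct dt_device_node *bridge = pci_hostbridge_dt_node(segno);
    const __be32 *prop;

    if ( !bridge )
        return NULL;

    /* The host bridge's msi-parent phandle points at the ITS node. */
    prop = dt_get_property(bridge, "msi-parent", NULL);
    if ( !prop )
        return NULL;

    return dt_find_node_by_phandle(be32_to_cpup(prop));
}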

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: PCI Passthrough Design - Draft 3
  2015-08-12  8:56   ` Ian Campbell
@ 2015-08-12 14:25     ` Konrad Rzeszutek Wilk
  2015-08-12 14:42       ` Ian Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 14:25 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor@cavium.com, Stefano Stabellini, Manish Jaggi,
	Kumar, Vijaya, Julien Grall, Xen Devel

On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > > 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> > > ----------------------------------------------
> > > Xen code accesses PCI configuration space based on the sbdf received from
> > > the
> > > guest. The order in which the pci device tree node appear may not be the
> > > same
> > > order of device enumeration in dom0. Thus there needs to be a mechanism to
> > > bind
> > > the segment number assigned by dom0 to the pci host controller. The
> > > hypercall
> > > is introduced:
> > 
> > Why can't we extend the existing hypercall to have the segment value?
> > 
> > Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> > 
> > And have the hypercall (and Xen) be able to deal with introduction of PCI
> > devices that are out of sync?
> > 
> > Maybe I am confused but aren't PCI host controllers also 'uploaded' to
> > Xen?
> 
> The issue is that Dom0 and Xen need to agree on a common numbering space
> for the "PCI domain" AKA "segment", which is really just a software concept
> i.e. on ARM Linux just makes them up (on x86 I believe they come from some
> firmware table so Xen and Dom0 "agree" to both use that).

Doesn't the PCI domain or segment have a notion of which PCI devices are
underneath it? Or vice versa - do PCI devices know what their segment (or domain) is?

> 
> Ian.
> 


* Re: PCI Passthrough Design - Draft 3
  2015-08-12 14:25     ` Konrad Rzeszutek Wilk
@ 2015-08-12 14:42       ` Ian Campbell
  2015-08-12 14:55         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 8+ messages in thread
From: Ian Campbell @ 2015-08-12 14:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Prasun.kapoor@cavium.com, Manish Jaggi, Kumar, Vijaya,
	Julien Grall, Xen Devel, Stefano Stabellini

On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> > On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > > 
> > > > 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> > > > ----------------------------------------------
> > > > Xen code accesses PCI configuration space based on the sbdf 
> > > > received from
> > > > the
> > > > guest. The order in which the pci device tree node appear may not 
> > > > be the
> > > > same
> > > > order of device enumeration in dom0. Thus there needs to be a 
> > > > mechanism to
> > > > bind
> > > > the segment number assigned by dom0 to the pci host controller. The
> > > > hypercall
> > > > is introduced:
> > > 
> > > Why can't we extend the existing hypercall to have the segment value?
> > > 
> > > Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> > > 
> > > And have the hypercall (and Xen) be able to deal with introduction of 
> > > PCI
> > > devices that are out of sync?
> > > 
> > > Maybe I am confused but aren't PCI host controllers also 'uploaded' 
> > > to
> > > Xen?
> > 
> > The issue is that Dom0 and Xen need to agree on a common numbering 
> > space
> > for the "PCI domain" AKA "segment", which is really just a software 
> > concept
> > i.e. on ARM Linux just makes them up (on x86 I believe they come from 
> > some
> > firmware table so Xen and Dom0 "agree" to both use that).
> 
> > Doesn't the PCI domain or segment have a notion of which PCI devices are
> > underneath it? Or vice versa - do PCI devices know what their segment (or domain) is?

The PCI domain or segment does contain a device, but it is a purely OS
level concept, it has no real meaning in the hardware. So both Xen and
Linux are free to fabricate whatever segment naming space they want, but
obviously they need to agree, hence this hypercall lets Linux tell Xen what
segment it has associated with a given PCI controller.

Perhaps an example will help.

Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
a second with CFG space at 0xB0000000.

Xen discovers these and assigns segment 0=0xA0000000 and segment
1=0xB0000000.

Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
0=0xB0000000 (i.e. the other way).

Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
the PHYSDEVOP_manage_pci_add_ext call).

But Xen thinks it is talking about the device with BDF behind the root
bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
mean. Now Xen will use the wrong device ID in the IOMMU (since that is
associated with the host bridge), or poke the wrong configuration space, or
whatever.

Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
talking about segment=0 and =1 it has no idea what is going on.

PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
0xA0000000". With this there is no confusion between Xen and Dom0 because
Xen isn't picking a segment ID, it is being told what it is by Dom0 which
has done the picking.
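
For illustration, the Dom0 side of that registration could look roughly like
the sketch below. The exact argument layout is whatever Draft-3's section 2.2
defines; the one here (segment plus CFG window), the op number and the field
names are only assumed from this discussion, and a kernel implementation would
use u16/u64 rather than the stdint types.

#include <asm/xen/hypercall.h>   /* HYPERVISOR_physdev_op() */

#ifndef PHYSDEVOP_pci_host_bridge_add
#define PHYSDEVOP_pci_host_bridge_add 44   /* value made up for the sketch */
#endif

/* Assumed layout; not a committed public header. */
struct physdev_pci_host_bridge_add {
    uint16_t seg;        /* segment number Dom0 has picked for this bridge */
    uint64_t cfg_base;   /* CFG/ECAM space base address                    */
    uint64_t cfg_size;
};

static int register_host_bridge(uint16_t seg, uint64_t base, uint64_t size)
{
    struct physdev_pci_host_bridge_add add = {
        .seg      = seg,
        .cfg_base = base,
        .cfg_size = size,
    };

    /* e.g. "segment 1 is the host bridge at 0xA0000000" */
    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
}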

Does that help?

Ian.


* Re: PCI Passthrough Design - Draft 3
  2015-08-12 14:42       ` Ian Campbell
@ 2015-08-12 14:55         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 14:55 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor@cavium.com, Manish Jaggi, Kumar, Vijaya,
	Julien Grall, Xen Devel, Stefano Stabellini

On Wed, Aug 12, 2015 at 03:42:10PM +0100, Ian Campbell wrote:
> On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> > > On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > > > 
> > > > > 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
> > > > > ----------------------------------------------
> > > > > Xen code accesses PCI configuration space based on the sbdf 
> > > > > received from
> > > > > the
> > > > > guest. The order in which the pci device tree node appear may not 
> > > > > be the
> > > > > same
> > > > > order of device enumeration in dom0. Thus there needs to be a 
> > > > > mechanism to
> > > > > bind
> > > > > the segment number assigned by dom0 to the pci host controller. The
> > > > > hypercall
> > > > > is introduced:
> > > > 
> > > > Why can't we extend the existing hypercall to have the segment value?
> > > > 
> > > > Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> > > > 
> > > > And have the hypercall (and Xen) be able to deal with introduction of 
> > > > PCI
> > > > devices that are out of sync?
> > > > 
> > > > Maybe I am confused but aren't PCI host controllers also 'uploaded' 
> > > > to
> > > > Xen?
> > > 
> > > The issue is that Dom0 and Xen need to agree on a common numbering 
> > > space
> > > for the "PCI domain" AKA "segment", which is really just a software 
> > > concept
> > > i.e. on ARM Linux just makes them up (on x86 I believe they come from 
> > > some
> > > firmware table so Xen and Dom0 "agree" to both use that).
> > 
> > Doesn't the PCI domain or segment have a notion of which PCI devices are
> > underneath it? Or vice versa - do PCI devices know what their segment (or domain) is?
> 
> The PCI domain or segment does contain a device, but it is a purely OS
> level concept, it has no real meaning in the hardware. So both Xen and
> Linux are free to fabricate whatever segment naming space they want, but
> obviously they need to agree, hence this hypercall lets Linux tell Xen what
> segment it has associated with a given PCI controller.
> 
> Perhaps an example will help.
> 
> Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
> a second with CFG space at 0xB0000000.
> 
> Xen discovers these and assigns segment 0=0xA0000000 and segment
> 1=0xB0000000.
> 
> Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
> 0=0xB0000000 (i.e. the other way).
> 
> Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
> the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
> the PHYSDEVOP_manage_pci_add_ext call).
> 
> But Xen thinks it is talking about the device with BDF behind the root
> bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
> mean. Now Xen will use the wrong device ID in the IOMMU (since that is
> associated with the host bridge), or poke the wrong configuration space, or
> whatever.
> 
> Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
> talking about segment=0 and =1 it has no idea what is going on.
> 
> PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
> is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
> 0xA0000000". With this there is no confusion between Xen and Dom0 because
> Xen isn't picking a segment ID, it is being told what it is by Dom0 which
> has done the picking.
> 
> Does that help?

Yes thank you!

Manish, please include this explanation in the design as it will surely
help other folks in understanding it.
> 
> Ian.
> 


end of thread

Thread overview: 8+ messages
2015-08-04 12:27 PCI Passthrough Design - Draft 3 Manish Jaggi
2015-08-11 20:34 ` Konrad Rzeszutek Wilk
2015-08-12  7:33   ` Manish Jaggi
2015-08-12 14:24     ` Konrad Rzeszutek Wilk
2015-08-12  8:56   ` Ian Campbell
2015-08-12 14:25     ` Konrad Rzeszutek Wilk
2015-08-12 14:42       ` Ian Campbell
2015-08-12 14:55         ` Konrad Rzeszutek Wilk
