* pci-mvebu driver on km_kirkwood
@ 2013-07-10 16:15 Gerlando Falauto
2013-07-10 16:57 ` Thomas Petazzoni
2013-07-31 8:03 ` Thomas Petazzoni
0 siblings, 2 replies; 55+ messages in thread
From: Gerlando Falauto @ 2013-07-10 16:15 UTC (permalink / raw)
To: linux-arm-kernel
Hi Thomas,
I am trying to use the pci-mvebu driver on one of our km_kirkwood
boards. The board is based on Marvell's 98dx4122, which should
essentially be 6281 compatible.
So I copied the following block from kirkwood-6281.dtsi into
kirkwood-98dx4122.dtsi:
	pcie-controller {
		compatible = "marvell,kirkwood-pcie";
		status = "disabled";
		device_type = "pci";

		#address-cells = <3>;
		#size-cells = <2>;

		bus-range = <0x00 0xff>;

		ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000 /* Port 0.0 registers */
			  0x82000000 0 0xe0000000 0xe0000000 0 0x08000000 /* non-prefetchable memory */
			  0x81000000 0 0          0xe8000000 0 0x00100000>; /* downstream I/O */

		pcie@1,0 {
			device_type = "pci";
			assigned-addresses = <0x82000800 0 0x00040000 0 0x2000>;
			reg = <0x0800 0 0 0 0>;
			#address-cells = <3>;
			#size-cells = <2>;
			#interrupt-cells = <1>;
			ranges;
			interrupt-map-mask = <0 0 0 0>;
			interrupt-map = <0 0 0 0 &intc 9>;
			marvell,pcie-port = <0>;
			marvell,pcie-lane = <0>;
			clocks = <&gate_clk 2>;
			status = "disabled";
		};
	};
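[Editorial aside, not part of the original mail: the first cell of each `ranges` entry above encodes the PCI address space per the IEEE 1275 PCI bus binding, which is why `0x82000000` marks the memory windows and `0x81000000` the I/O window. A minimal sketch decoding that cell:]

```python
# Decode the phys.hi cell of a PCI "ranges" entry (IEEE 1275 PCI binding).
# Bits 24-25 select the address space; bit 30 is the prefetchable flag.
def decode_phys_hi(cell):
    space = (cell >> 24) & 0x3  # 0=config, 1=I/O, 2=32-bit mem, 3=64-bit mem
    prefetchable = bool(cell & (1 << 30))
    names = {0: "config", 1: "I/O", 2: "32-bit memory", 3: "64-bit memory"}
    return names[space], prefetchable

# The two encodings used in the kirkwood pcie-controller node:
print(decode_phys_hi(0x82000000))  # ('32-bit memory', False): registers / non-prefetchable mem
print(decode_phys_hi(0x81000000))  # ('I/O', False): downstream I/O
```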
And added the following block to kirkwood-km_kirkwood.dts:

	pcie-controller {
		status = "okay";

		pcie@1,0 {
			status = "okay";
		};
	};
I took the code from jcooper's repo:

http://git.infradead.org/users/jcooper/linux.git
I took the tag
dt-3.11-6
on top of which I merged:
mvebu/pcie
mvebu/pcie_bridge
mvebu/pcie_kirkwood
Only with the latest merge did I get some conflict on
kirkwood.dtsi:
<<<<<<< HEAD
ranges = <0x00000000 0xf1000000 0x0100000
0xf4000000 0xf4000000 0x0000400
=======
ranges = <0x00000000 0xf1000000 0x4000000
0xe0000000 0xe0000000 0x8100000
>>>>>>> jcooper/mvebu/pcie_kirkwood
I tried both variants, with (almost) the same result:
<<<<<<< HEAD
Kirkwood: MV88F6281-A0, TCLK=200000000.
Feroceon L2: Cache support initialised, in WT override mode.
mvebu-pcie pcie-controller.1: PCIe0.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
pci 0000:00:01.0: PCI bridge to [bus 01]
=======
Kirkwood: MV88F6281-A0, TCLK=200000000.
Feroceon L2: Cache support initialised, in WT override mode.
mvebu-pcie pcie-controller.2: PCIe0.0: link up
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
pci 0000:00:01.0: PCI bridge to [bus 01]
>>>>>>> jcooper/mvebu/pcie_kirkwood
Compared to a working configuration, here I see a spurious

pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)

which I don't understand, plus all the other assignments failing.
It's weird how with the second configuration:
mvebu-pcie pcie-controller.2: PCIe0.0: link up
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
I get a second mvebu-pcie pcie-controller.2, although with a more
reasonable memory range.
Needless to say, I did try several other combinations of your recent and
not-so-recent patches (from May 23rd onwards), with essentially the same
results.
It *must* be something trivial. Any hints?
Thanks a lot!
Gerlando
^ permalink raw reply	[flat|nested] 55+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto
@ 2013-07-10 16:57 ` Thomas Petazzoni
  2013-07-10 17:31   ` Gerlando Falauto
  2013-07-31  8:03 ` Thomas Petazzoni
  1 sibling, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2013-07-10 16:57 UTC (permalink / raw)
To: linux-arm-kernel

Gerlando,

On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:

> I am trying to use the pci-mvebu driver on one of our km_kirkwood
> boards. The board is based on Marvell's 98dx4122, which should
> essentially be 6281 compatible.

Was this platform working with the old PCIe driver in mach-kirkwood/ ?

> The code I took from jcooper's repo:
>
> http://git.infradead.org/users/jcooper/linux.git
>
> I took the tag
>
> dt-3.11-6
>
> on top of which I merged:
>
> mvebu/pcie
> mvebu/pcie_bridge
> mvebu/pcie_kirkwood

Could you instead use the latest master from Linus's tree? That would
avoid merge conflicts, and ensure you have all the necessary pieces.

> Only with the latest merge did I get some conflict on
> kirkwood.dtsi:
>
> <<<<<<< HEAD
> ranges = <0x00000000 0xf1000000 0x0100000
>           0xf4000000 0xf4000000 0x0000400
> =======
> ranges = <0x00000000 0xf1000000 0x4000000
>           0xe0000000 0xe0000000 0x8100000

The first cannot work, because it lacks the range for the PCIe. The
second should work. The correct merge should be:

	ranges = <0x00000000 0xf1000000 0x0100000
	          0xf4000000 0xf4000000 0x0000400
	          0xe0000000 0xe0000000 0x8100000>;

i.e., we've added the PCIe range (last line) and split the SRAM into
its own range (or something like that, I don't remember the details,
but Ezequiel can confirm).

> <<<<<<< HEAD
> Kirkwood: MV88F6281-A0, TCLK=200000000.
> Feroceon L2: Cache support initialised, in WT override mode.
> mvebu-pcie pcie-controller.1: PCIe0.0: link up
> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
> PCI: bus0: Fast back to back transfers disabled
> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
> PCI: bus1: Fast back to back transfers disabled
> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
> pci 0000:00:01.0: PCI bridge to [bus 01]

The first test you did cannot work at all, due to the incorrect ranges.

If you have the PCIe working with the old driver, can you pastebin
somewhere the complete boot log, as well as the output of "lspci
-vvv" ?

> Compared to a working configuration, here I see a spurious
>
> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>
> which I don't understand, plus all others which are failing.
>
> It's weird how with the second configuration:
>
> mvebu-pcie pcie-controller.2: PCIe0.0: link up
> mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>
> I get a second mvebu-pcie pcie-controller.2, although with a more
> reasonable memory range.

A second mvebu-pcie controller? Is your Device Tree correct?

I'm not really sure I understand what's going on here. Can you post
the complete boot log, and test with the latest Linus git tree, where
all the PCIe support got merged?

Thanks!

Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2013-07-10 16:57 ` Thomas Petazzoni
@ 2013-07-10 17:31   ` Gerlando Falauto
  2013-07-10 19:56     ` Gerlando Falauto
  ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Gerlando Falauto @ 2013-07-10 17:31 UTC (permalink / raw)
To: linux-arm-kernel

Hi Thomas,

first of all thanks for your quick feedback.

On 07/10/2013 06:57 PM, Thomas Petazzoni wrote:
> Gerlando,
>
> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>
>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>> boards. The board is based on Marvell's 98dx4122, which should
>> essentially be 6281 compatible.
>
> Was this platform working with the old PCIe driver in mach-kirkwood/ ?

Yes, though we had to trick it a little bit to get both the internal
switch and this PCIe device working:

- this PCIe device requires to map 256M of memory as opposed to just 128
- we need a virtual PCIe device to connect to the internal switch, which
  must be mapped at 0xf4000000 (normally used for the NAND which must
  then move to 0xff000000)

But apart from the huge BAR (0x07ffffff aka 128M) for the PCIe device
not being mappable, the rest was normally working just fine even
without the above changes (i.e. the other BARs were mapped fine).

>> The code I took from jcooper's repo:
>>
>> http://git.infradead.org/users/jcooper/linux.git
>>
>> I took the tag
>>
>> dt-3.11-6
>>
>> on top of which I merged:
>>
>> mvebu/pcie
>> mvebu/pcie_bridge
>> mvebu/pcie_kirkwood
>
> Could you instead use the latest master from Linus tree? That would
> avoid merge conflicts, and ensure you have all the necessary pieces.

Oops, I had no idea all this had gotten merged already.
Quite honestly, I have no idea how to track this kind of stuff (i.e.
did a given patch ever get merged, and where?) but that's a different
topic.

>> Only with the latest merge did I get some conflict on
>> kirkwood.dtsi:
>>
>> <<<<<<< HEAD
>> ranges = <0x00000000 0xf1000000 0x0100000
>>           0xf4000000 0xf4000000 0x0000400
>> =======
>> ranges = <0x00000000 0xf1000000 0x4000000
>>           0xe0000000 0xe0000000 0x8100000
>
> The first cannot work, because it lacks the range for the PCIe. The
> second should work. The correct merge should be:
>
> ranges = <0x00000000 0xf1000000 0x0100000
>           0xf4000000 0xf4000000 0x0000400
>           0xe0000000 0xe0000000 0x8100000>;
>
> i.e, we've added the PCIe range (last line) and splitted the SRAM into
> its own range (or something like that, don't remember the details, but
> Ezequiel can confirm).

OK that's a good starting point.

>> <<<<<<< HEAD
>> Kirkwood: MV88F6281-A0, TCLK=200000000.
>> Feroceon L2: Cache support initialised, in WT override mode.
>> mvebu-pcie pcie-controller.1: PCIe0.0: link up
>> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
>> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
>> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
>> pci_bus 0000:00: root bus resource [bus 00-ff]
>> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
>> PCI: bus0: Fast back to back transfers disabled
>> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
>> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
>> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
>> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
>> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: supports D1 D2
>> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
>> PCI: bus1: Fast back to back transfers disabled
>> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
>> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
>> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
>> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
>> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
>> pci 0000:00:01.0: PCI bridge to [bus 01]
>
> The first test you did cannot work at all, due to the incorrect ranges.
>
> If you have the PCIe working with the old driver, can you pastebin
> somewhere the complete boot log, as well as the output of "lspci
> -vvv" ?

OK, I will.
In the meantime, what I got to establish is that by manually disabling
the two biggest resources

>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)

i.e. something like:

@@ -281,6 +282,10 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	list_for_each_entry(dev_res, head, list) {
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
+
+		if (resource_size(res) < 0x8000000)
+		{
+

at least I can get the following ones to be assigned correctly:

mvebu-pcie pcie-controller.2: PCIe0.0: link up
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:01:00.0: BAR 3: assigned [mem 0x04000000-0x047fffff]
pci 0000:01:00.0: BAR 3: set to [mem 0x04000000-0x047fffff] (PCI address [0x4000000-0x47fffff])
pci 0000:01:00.0: BAR 4: assigned [mem 0x04800000-0x04801fff]
pci 0000:01:00.0: BAR 4: set to [mem 0x04800000-0x04801fff] (PCI address [0x4800000-0x4801fff])
pci 0000:01:00.0: BAR 0: assigned [mem 0x04802000-0x04802fff]
pci 0000:01:00.0: BAR 0: set to [mem 0x04802000-0x04802fff] (PCI address [0x4802000-0x4802fff])
pci 0000:01:00.0: BAR 2: assigned [mem 0x04803000-0x04803fff]
pci 0000:01:00.0: BAR 2: set to [mem 0x04803000-0x04803fff] (PCI address [0x4803000-0x4803fff])
pci 0000:01:00.0: BAR 5: assigned [mem 0x04804000-0x04804fff]
pci 0000:01:00.0: BAR 5: set to [mem 0x04804000-0x04804fff] (PCI address [0x4804000-0x4804fff])
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0: bridge window [mem 0x04000000-0x0fffffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)

Which is a bit weird, because in the past these huge assignments would
just fail but the following ones would work just fine.

>> Compared to a working configuration, here I see a spurious

I assume the

>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)

comes from the switch but I have no idea how to find it out.
I'm quite sure this is the first time I'm seeing BAR 8.

>> which I don't understand, plus all others which are failing.
>>
>> It's weird how with the second configuration:
>>
>> mvebu-pcie pcie-controller.2: PCIe0.0: link up
>> mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
>> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
>> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>>
>> I get a second mvebu-pcie pcie-controller.2, although with a more
>> reasonable memory range.
>
> A second mvebu-pcie controller? Is your Device Tree correct?

Whoops, my fault. There's just one pcie-controller.2, it's just that
with the correct ranges the nand.1 node gets created as well, and these
(platform?) devices are numbered sequentially, regardless of their type.

> I'm not really sure to understand what's going on here. Can you post
> the complete boot log, and test with the latest Linus git tree, where
> all the PCIe support got merged?

I sure will.
Thanks for the heads-up.

Thanks a lot!
Gerlando

> Thanks!
>
> Thomas

^ permalink raw reply	[flat|nested] 55+ messages in thread
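[Editorial aside, not part of the original mail: the "BAR 8" that puzzles the poster is not a device BAR at all. In Linux, a PCI device's resources 0-5 are its standard BARs and 6 is the expansion ROM; on a PCI-to-PCI bridge, the indices starting at `PCI_BRIDGE_RESOURCES` (7) are the forwarding windows, so index 8 is the bridge's non-prefetchable memory window. A minimal sketch of that numbering convention:]

```python
# Map the resource index Linux prints as "BAR <n>" to what it actually is.
# Indices 7-9 follow the PCI_BRIDGE_RESOURCES convention for P2P bridges.
def resource_kind(idx):
    if 0 <= idx <= 5:
        return "standard BAR"
    if idx == 6:
        return "expansion ROM"
    windows = {7: "bridge I/O window",
               8: "bridge non-prefetchable memory window",
               9: "bridge prefetchable memory window"}
    if idx in windows:
        return windows[idx]
    raise ValueError("unknown resource index: %d" % idx)

# "pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)" is the
# bridge's 192 MB memory window failing, not an endpoint BAR:
print(resource_kind(8))  # bridge non-prefetchable memory window
```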
* pci-mvebu driver on km_kirkwood
  2013-07-10 17:31 ` Gerlando Falauto
@ 2013-07-10 19:56   ` Gerlando Falauto
  2013-07-11  7:03 ` Valentin Longchamp
  2013-07-11 14:32 ` Thomas Petazzoni
  2 siblings, 0 replies; 55+ messages in thread
From: Gerlando Falauto @ 2013-07-10 19:56 UTC (permalink / raw)
To: linux-arm-kernel

Hi Thomas,

I guess I understand now....

>>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)

this is the BAR for the bridge, your virtual PCI host device, whose
size is calculated dynamically depending on what is found on the
underlying hardware. So compared to the legacy driver, which was
relying on the real hardware BARs (where I could get /some/ BARs to
work, namely all but the biggest one which was taking up the whole
128M), here it's an all-or-nothing approach.

As a matter of fact, everything works fine if I explicitly disable the
biggest BAR with a trick:

mvebu-pcie pcie-controller.1: PCIe0.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:01:00.0: disabling BAR 1: [mem 0x00000000-0x07ffffff] TOO BIG (alignment 0x1000)
pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe0bfffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe0000000-0xe07fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe0800000-0xe0801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe0802000-0xe0802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe0803000-0xe0803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe0804000-0xe0804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe0bfffff]

So this seems to be the final solution (without the hack above):

--- a/arch/arm/boot/dts/kirkwood-98dx4122.dtsi
+++ b/arch/arm/boot/dts/kirkwood-98dx4122.dtsi
 		bus-range = <0x00 0xff>;
 		ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000 /* Port 0.0 registers */
-			  0x82000000 0 0xe0000000 0xe0000000 0 0x08000000 /* non-prefetchable memory */
+			  0x82000000 0 0xe0000000 0xe0000000 0 0x0c000000 /* non-prefetchable memory */
 			  0x81000000 0 0          0xe8000000 0 0x00100000>; /* downstream I/O */
 		pcie@1,0 {

Does the above make sense? Am I setting up overlapping ranges this
way? Could I make it 0x10000000 so to have 256M?

Thanks a lot!
Gerlando

On 07/10/2013 07:31 PM, Gerlando Falauto wrote:
> Hi Thomas,
>
> first of all thanks for your quick feedback.
>
> On 07/10/2013 06:57 PM, Thomas Petazzoni wrote:
>> Gerlando,
>>
>> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>>
>>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>>> boards. The board is based on Marvell's 98dx4122, which should
>>> essentially be 6281 compatible.
>>
>> Was this platform working with the old PCIe driver in mach-kirkwood/ ?
>
> Yes, though we had to trick it a little bit to get both the internal
> switch and this PCIe device working:
>
> - this PCIe device requires to map 256M of memory as opposed to just 128
> - we need a virtual PCIe device to connect to the internal switch, which
>   must be mapped at 0xf4000000 (normally used for the NAND which must
>   then move to 0xff000000)
>
> But apart from the huge BAR (0x07ffffff aka 128M) for the PCIe device
> not being mappable, the rest was normally working just fine even without
> the above changes (i.e. the other BARs were mapped fine).
>
>>> The code I took from jcooper's repo:
>>>
>>> http://git.infradead.org/users/jcooper/linux.git
>>>
>>> I took the tag
>>>
>>> dt-3.11-6
>>>
>>> on top of which I merged:
>>>
>>> mvebu/pcie
>>> mvebu/pcie_bridge
>>> mvebu/pcie_kirkwood
>>
>> Could you instead use the latest master from Linus tree? That would
>> avoid merge conflicts, and ensure you have all the necessary pieces.
>
> Oops, I had no idea all this had gotten merged already.
> Quite honestly, I have no idea how to track this kind of stuff (i.e. did
> a given patch ever got merged and where?) but that's a different topic.
>
>>> Only with the latest merge did I get some conflict on
>>> kirkwood.dtsi:
>>>
>>> <<<<<<< HEAD
>>> ranges = <0x00000000 0xf1000000 0x0100000
>>>           0xf4000000 0xf4000000 0x0000400
>>> =======
>>> ranges = <0x00000000 0xf1000000 0x4000000
>>>           0xe0000000 0xe0000000 0x8100000
>>
>> The first cannot work, because it lacks the range for the PCIe. The
>> second should work. The correct merge should be:
>>
>> ranges = <0x00000000 0xf1000000 0x0100000
>>           0xf4000000 0xf4000000 0x0000400
>>           0xe0000000 0xe0000000 0x8100000>;
>>
>> i.e, we've added the PCIe range (last line) and splitted the SRAM into
>> its own range (or something like that, don't remember the details, but
>> Ezequiel can confirm).
>
> OK that's a good starting point.
>
>>> <<<<<<< HEAD
>>> Kirkwood: MV88F6281-A0, TCLK=200000000.
>>> Feroceon L2: Cache support initialised, in WT override mode.
>>> mvebu-pcie pcie-controller.1: PCIe0.0: link up
>>> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
>>> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
>>> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
>>> pci_bus 0000:00: root bus resource [bus 00-ff]
>>> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
>>> PCI: bus0: Fast back to back transfers disabled
>>> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]),
>>> reconfiguring
>>> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
>>> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
>>> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
>>> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
>>> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: supports D1 D2
>>> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
>>> PCI: bus1: Fast back to back transfers disabled
>>> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>>> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
>>> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
>>> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
>>> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
>>> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
>>> pci 0000:00:01.0: PCI bridge to [bus 01]
>>
>> The first test you did cannot work at all, due to the incorrect ranges.
>>
>> If you have the PCIe working with the old driver, can you pastebin
>> somewhere the complete boot log, as well as the output of "lspci
>> -vvv" ?
>
> OK, I will.
> In the meantime, what I got to establish is that by manually disabling
> the two biggest resources
>
>>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>
> i.e. something like:
>
> @@ -281,6 +282,10 @@ static void assign_requested_resources_sorted(struct list_head *head,
>  	list_for_each_entry(dev_res, head, list) {
>  		res = dev_res->res;
>  		idx = res - &dev_res->dev->resource[0];
> +
> +		if (resource_size(res) < 0x8000000)
> +		{
> +
>
> at least I can get the following ones to be assigned correctly:
>
> mvebu-pcie pcie-controller.2: PCIe0.0: link up
> mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
> PCI: bus0: Fast back to back transfers disabled
> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
> PCI: bus1: Fast back to back transfers disabled
> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> pci 0000:01:00.0: BAR 3: assigned [mem 0x04000000-0x047fffff]
> pci 0000:01:00.0: BAR 3: set to [mem 0x04000000-0x047fffff] (PCI address
> [0x4000000-0x47fffff])
> pci 0000:01:00.0: BAR 4: assigned [mem 0x04800000-0x04801fff]
> pci 0000:01:00.0: BAR 4: set to [mem 0x04800000-0x04801fff] (PCI address
> [0x4800000-0x4801fff])
> pci 0000:01:00.0: BAR 0: assigned [mem 0x04802000-0x04802fff]
> pci 0000:01:00.0: BAR 0: set to [mem 0x04802000-0x04802fff] (PCI address
> [0x4802000-0x4802fff])
> pci 0000:01:00.0: BAR 2: assigned [mem 0x04803000-0x04803fff]
> pci 0000:01:00.0: BAR 2: set to [mem 0x04803000-0x04803fff] (PCI address
> [0x4803000-0x4803fff])
> pci 0000:01:00.0: BAR 5: assigned [mem 0x04804000-0x04804fff]
> pci 0000:01:00.0: BAR 5: set to [mem 0x04804000-0x04804fff] (PCI address
> [0x4804000-0x4804fff])
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0: bridge window [mem 0x04000000-0x0fffffff]
> PCI: enabling device 0000:00:01.0 (0140 -> 0143)
>
> Which is a bit weird because in the past these huge assignments would
> just fail but the following ones would work just fine.
>
>>> Compared to a working configuration, here I see a spurious
>
> I assume the
>
>>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>
> comes from the switch but I have no idea how to find it out.
> I'm quite sure this is the first time I'm seeing BAR 8.
>
>>> which I don't understand, plus all others which are failing.
>>>
>>> It's weird how with the second configuration:
>>>
>>> mvebu-pcie pcie-controller.2: PCIe0.0: link up
>>> mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
>>> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff]
>>> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>>>
>>> I get a second mvebu-pcie pcie-controller.2, although with a more
>>> reasonable memory range.
>>
>> A second mvebu-pcie controller? Is your Device Tree correct?
>
> Whoops, my fault. There's just one pcie-controller.2, it's just that
> with the correct ranges the nand.1 node gets created as well, and these
> (platform?) devices are numbered sequentially, regardless of their type.
>
>> I'm not really sure to understand what's going on here. Can you post
>> the complete boot log, and test with the latest Linus git tree, where
>> all the PCIe support got merged?
>
> I sure will.
> Thanks for the heads-up.
>
> Thanks a lot!
> Gerlando
>
>> Thanks!
>>
>> Thomas

^ permalink raw reply	[flat|nested] 55+ messages in thread
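[Editorial aside, not part of the original mail: the question the 19:56 mail ends with ("Am I setting up overlapping ranges this way?") can be settled with arithmetic on the addresses quoted above. Growing the memory window at 0xe0000000 to 0x0c000000 makes it end at 0xec000000, which crosses the I/O translation address 0xe8000000 from the same `ranges` property. A quick sketch of the check, using only the values from the mail:]

```python
# Half-open interval overlap check for two CPU address windows.
def overlaps(a_base, a_size, b_base, b_size):
    return a_base < b_base + b_size and b_base < a_base + a_size

mem = (0xe0000000, 0x0c000000)  # proposed 192 MB non-prefetchable memory window
io  = (0xe8000000, 0x00100000)  # downstream I/O window from the same ranges property

# 0xe0000000 + 0x0c000000 = 0xec000000 > 0xe8000000, so as written the
# enlarged memory window would collide with the I/O window:
print(overlaps(*mem, *io))  # True
```

So enlarging the memory range also means moving (or shrinking around) the I/O window, and 0x10000000 (256 MB) would overshoot even further unless the I/O base moves past 0xf0000000.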
* pci-mvebu driver on km_kirkwood 2013-07-10 17:31 ` Gerlando Falauto 2013-07-10 19:56 ` Gerlando Falauto @ 2013-07-11 7:03 ` Valentin Longchamp 2013-07-12 8:59 ` Thomas Petazzoni 2013-07-11 14:32 ` Thomas Petazzoni 2 siblings, 1 reply; 55+ messages in thread From: Valentin Longchamp @ 2013-07-11 7:03 UTC (permalink / raw) To: linux-arm-kernel Hi Gerlando, I just want to give further information about the hardware and memory mapping we have with these kirkwood variants in our system. As I told you yesterday, I think it's very important to have a clear view of the actual memory map when setting these ranges. On 07/10/2013 07:31 PM, Falauto, Gerlando wrote: > Hi Thomas, > > first of all thanks for your quick feedback. > > On 07/10/2013 06:57 PM, Thomas Petazzoni wrote: >> Gerlando, >> >> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote: >> >>> I am trying to use the pci-mvebu driver on one of our km_kirkwood >>> boards. The board is based on Marvell's 98dx4122, which should >>> essentially be 6281 compatible. >> >> Was this platform working with the old PCIe driver in mach-kirkwood/ ? > > Yes, though we had to trick it a little bit to get both the internal > switch and this PCIe device working: > > - this PCIe device requires to map 256M of memory as opposed to just 128 On the board you are currently using for your tests, it is the case (the whole map is not used ... things are scattered over the 256 MB, with one 128MB BAR). If you want to get rid of this problem, we have another board that does not require these (256 MB .. and I have one of them on my desk that you can use for your tests). On the kirkwood variant we use, there is only _one_ real PCIe controller. In order to map 256MB for the MEM space of this controller without having further memory map conflicts, what was done was to not enable the CPU windows for the usual 2nd PCIe controller and set a wider CPU window for the MEM space of the only PCIe controller. 
> - we need a virtual PCIe device to connect to the internal switch, which
> must be mapped at 0xf4000000 (normally used for the NAND which must then
> move to 0xff000000)
>

I think you can forget this for the time being. This is called (also in
Marvell's doc) a virtual PCIe controller, but apart from the fact that it
is memory mapped, it has nothing to do with PCIe (although I don't know
what is done internally in the SoC). It is a problem because the physical
address chosen for this CPU window conflicts with the one used for the
NAND controller in the current kirkwood Linux memory map. But that is
another topic and it should not play any role in this PCIe discussion.

I hope this also helps the others better understand what's going on here.

Valentin

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2013-07-11  7:03 ` Valentin Longchamp
@ 2013-07-12  8:59   ` Thomas Petazzoni
  2013-07-15 15:46     ` Valentin Longchamp
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2013-07-12  8:59 UTC (permalink / raw)
To: linux-arm-kernel

Dear Valentin Longchamp,

On Thu, 11 Jul 2013 09:03:59 +0200, Valentin Longchamp wrote:

> On the board you are currently using for your tests, it is the case (the whole
> map is not used ... things are scattered over the 256 MB, with one 128MB BAR).
> If you want to get rid of this problem, we have another board that does not
> require these (256 MB .. and I have one of them on my desk that you can use for
> your tests).
>
> On the kirkwood variant we use, there is only _one_ real PCIe controller. In
> order to map 256MB for the MEM space of this controller without having further
> memory map conflicts, what was done was to not enable the CPU windows for the
> usual 2nd PCIe controller and set a wider CPU window for the MEM space of the
> only PCIe controller.

Such tricks are no longer needed with the new PCIe driver. Instead of
assigning address ranges per PCIe controller, the new PCIe driver
(together with the mvebu-mbus driver) lets you specify one global range
of addresses for PCIe mem, and the PCIe driver will automatically figure
out which devices are available on which PCIe bus, how much PCIe mem
they need, and create the MBus windows accordingly.

Of course, as I pointed out in an earlier e-mail, this global range must
be suitably sized to allow the mapping of all PCIe devices. By default,
we've made it 128 MB large, but in this case, it looks like you would
need 256 MB.

But there's no need to disable the second PCIe controller anymore. If
there's nothing connected to it, no PCIe window will be created for it.
> > - we need a virtual PCIe device to connect to the internal switch, which
> > must be mapped at 0xf4000000 (normally used for the NAND which must then
> > move to 0xff000000)
> >
>
> I think you can forget this for the time being. This is called (also in
> Marvell's doc) a virtual PCIe controller, but apart that it is then memory
> mapped, this has nothing to do with PCIe (although I don't know what is done
> internally in the SoC). It is a problem because the physical address chosen for
> this CPU window conflicts with the one that is used for the NAND controller in
> the current kirkwood Linux memory map. But this is another topic and it should
> not play any role in this PCIe topic.

I'm not sure I follow this story of a virtual PCIe controller sitting
at 0xf4000000. Can you give a few more details?

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-12 8:59 ` Thomas Petazzoni @ 2013-07-15 15:46 ` Valentin Longchamp 2013-07-15 19:51 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Valentin Longchamp @ 2013-07-15 15:46 UTC (permalink / raw) To: linux-arm-kernel Hello Thomas, On 07/12/2013 10:59 AM, Thomas Petazzoni wrote: > Dear Valentin Longchamp, > > On Thu, 11 Jul 2013 09:03:59 +0200, Valentin Longchamp wrote: > >> On the board you are currently using for your tests, it is the case (the whole >> map is not used ... things are scattered over the 256 MB, with one 128MB BAR). >> If you want to get rid of this problem, we have another board that does not >> require these (256 MB .. and I have one of them on my desk that you can use for >> your tests). >> >> On the kirkwood variant we use, there is only _one_ real PCIe controller. In >> order to map 256MB for the MEM space of this controller without having further >> memory map conflicts, what was done was to not enable the CPU windows for the >> usual 2nd PCIe controller and set a wider CPU window for the MEM space of the >> only PCIe controller. > > Such tricks are no longer needed with the new PCIe driver. Instead of > assigning address ranges per PCIe controller, the new PCIe driver > (together with the mvebu-mbus driver) allows to specify one global > range of addresses for PCIe mem, and the PCIe driver will automatically > figure out which devices are available on which PCIe bus, how much PCIe > mem then need, and create the MBus windows accordingly. > > However of course, as I pointed out in an earlier e-mail, this global > range must be suitably sized to allow the mapping of all PCIe devices. > By default, we've made it 128 MB large, but in this case, it looks like > you would need 256 MB. > > But there's no need to disable the second PCIe controller anymore. If > there's nothing connected to it, not PCIe window will be created for it. Thank you for this precision. 
That's a nice feature of the new PCIe driver. > >>> - we need a virtual PCIe device to connect to the internal switch, which >>> must be mapped at 0xf4000000 (normally used for the NAND which must then >>> move to 0xff000000) >>> >> >> I think you can forget this for the time being. This is called (also in >> Marvell's doc) a virtual PCIe controller, but apart that it is then memory >> mapped, this has nothing to do with PCIe (although I don't know what is done >> internally in the SoC). It is a problem because the physical address chosen for >> this CPU window conflicts with the one that is used for the NAND controller in >> the current kirkwood Linux memory map. But this is another topic and it should >> not play any role in this PCIe topic. > > I'm not sure to follow this story of a virtual PCIe controller sitting > at 0xf4000000. Can you give a few more details? > I'm not sure I can give all the details here as the documentation that we have for this is subject to an NDA. To keep it short, in the kirkwood SoC we use, there is an Ethernet Switch that is accessed by the kirkwood through an internal virtual PCIe controller. The switch management SW has some hard expectations about the physical address for this "PCIe" memory mapped window which conflicts with the ones defined in the current device trees. Valentin ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-15 15:46 ` Valentin Longchamp @ 2013-07-15 19:51 ` Thomas Petazzoni 0 siblings, 0 replies; 55+ messages in thread From: Thomas Petazzoni @ 2013-07-15 19:51 UTC (permalink / raw) To: linux-arm-kernel Dear Valentin Longchamp, On Mon, 15 Jul 2013 17:46:12 +0200, Valentin Longchamp wrote: > > I'm not sure to follow this story of a virtual PCIe controller sitting > > at 0xf4000000. Can you give a few more details? > > > > I'm not sure I can give all the details here as the documentation that we have > for this is subject to an NDA. To keep it short, in the kirkwood SoC we use, > there is an Ethernet Switch that is accessed by the kirkwood through an internal > virtual PCIe controller. The switch management SW has some hard expectations > about the physical address for this "PCIe" memory mapped window which conflicts > with the ones defined in the current device trees. Ok. Note that the addresses chosen in the Device Tree can easily be changed on a per-SoC or per-board basis, if needed. Best regards, Thomas -- Thomas Petazzoni, Free Electrons Kernel, drivers, real-time and embedded Linux development, consulting, training and support. http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2013-07-10 17:31 ` Gerlando Falauto
  2013-07-10 19:56   ` Gerlando Falauto
  2013-07-11  7:03   ` Valentin Longchamp
@ 2013-07-11 14:32   ` Thomas Petazzoni
  2014-02-18 17:29     ` Gerlando Falauto
  2 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2013-07-11 14:32 UTC (permalink / raw)
To: linux-arm-kernel

Dear Gerlando Falauto,

On Wed, 10 Jul 2013 19:31:56 +0200, Gerlando Falauto wrote:

> Yes, though we had to trick it a little bit to get both the internal
> switch and this PCIe device working:
>
> - this PCIe device requires to map 256M of memory as opposed to just 128
> - we need a virtual PCIe device to connect to the internal switch, which
> must be mapped at 0xf4000000 (normally used for the NAND which must then
> move to 0xff000000)

Aah, if you need 256 MB, then you need to adjust the ranges, because by
default there is only 128 MB for PCIe memory. So, within the
pcie-controller node, you should do something like:

	ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000   /* Port 0.0 registers */
		  0x82000000 0 0xe0000000 0xe0000000 0 0x10000000   /* non-prefetchable memory */
		  0x81000000 0 0          0xf0000000 0 0x00100000>; /* downstream I/O */

and in the ranges property at the ocp { } level, you should do
something like:

	ranges = <0x00000000 0xf1000000 0x0100000
		  0xe0000000 0xe0000000 0x10100000 /* PCIE */
		  0xf4000000 0xf4000000 0x0000400
		  0xf5000000 0xf5000000 0x0000400>;

Basically, before the change the configuration was:

 * 128 MB of PCIe memory at 0xe0000000
 * 1 MB of PCIe I/O at 0xe8000000

After the change, you have:

 * 256 MB of PCIe memory at 0xe0000000
 * 1 MB of PCIe I/O at 0xf0000000

Best regards,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-11 14:32 ` Thomas Petazzoni @ 2014-02-18 17:29 ` Gerlando Falauto 2014-02-18 20:27 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2014-02-18 17:29 UTC (permalink / raw) To: linux-arm-kernel Hi Thomas, sorry for bringing up an old topic again... On 07/11/2013 04:32 PM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, > > On Wed, 10 Jul 2013 19:31:56 +0200, Gerlando Falauto wrote: > >> Yes, though we had to trick it a little bit to get both the internal >> switch and this PCIe device working: >> >> - this PCIe device requires to map 256M of memory as opposed to just 128 >> - we need a virtual PCIe device to connect to the internal switch, which >> must be mapped at 0xf4000000 (normally used for the NAND which must then >> move to 0xff000000) > > Aah, if you need 256 MB, then you need to adjust the ranges, because > by default there is only 128 MB for PCIe memory. So, you would need > something like: > > So, within the pcie-controller node, you should do something like: > > ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000 /* Port 0.0 registers */ > 0x82000000 0 0xe0000000 0xe0000000 0 0x10000000 /* non-prefetchable memory */ > 0x81000000 0 0 0xf0000000 0 0x00100000>; /* downstream I/O */ > > and in the ranges property at the ocp { } level, you should do something like: > > > ranges = <0x00000000 0xf1000000 0x0100000 > 0xe0000000 0xe0000000 0x10100000 /* PCIE */ > 0xf4000000 0xf4000000 0x0000400 > 0xf5000000 0xf5000000 0x0000400>; > > Basically, before the change the configuration was: > > * 128 MB of PCIe memory at 0xe0000000 > * 1 MB of PCIe I/O at 0xe8000000 > > After the change, you have: > > * 256 MB of PCIe memory at 0xe0000000 > * 1 MB of PCIe I/O at 0xf0000000 > I tried these settings (a long time ago) and everything seemed to work fine. Except, we now have a different problem. Essentially, this device requires 128MB for a given BAR to provide a PCI-to-localbus bridge. 
(another BAR provides the configuration space to configure chip select regions and so on). Apparently, only the first 64MB of this BAR seem to work correctly with the new driver. As soon as you exceed that, reads (always?) return 0. Other BARs (which are then of course assigned a higher region) seem to work just fine, so it looks like a per-BAR limitation. This was not a problem with a 3.0 kernel. Do you have any idea what could be wrong here? I'm currently using a 3.10 kernel, where your patches for the pci-mvebu driver were forcibly brought in (without full support for the MBUS description at device tree level though). Thank you very much in advance, Gerlando P.S. Here's the relevant portion of the startup log so to give you an idea of the layout: mvebu-pcie pcie-controller.1: PCIe0.0: link up mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00 pci_bus 0000:00: root bus resource [io 0x1000-0xfffff] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff] pci_bus 0000:00: root bus resource [bus 00-ff] pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400 PCI: bus0: Fast back to back transfers disabled pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000 pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff] pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff] pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff] pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff] pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff] pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff] pci 0000:01:00.0: supports D1 D2 pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot PCI: bus1: Fast back to back transfers disabled pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] pci 0000:01:00.0: BAR 4: 
assigned [mem 0xe8800000-0xe8801fff] pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] pci 0000:00:01.0: PCI bridge to [bus 01] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff] PCI: enabling device 0000:00:01.0 (0140 -> 0143) ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-18 17:29 ` Gerlando Falauto @ 2014-02-18 20:27 ` Thomas Petazzoni 2014-02-19 8:38 ` Gerlando Falauto 0 siblings, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-18 20:27 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, On Tue, 18 Feb 2014 18:29:56 +0100, Gerlando Falauto wrote: > I tried these settings (a long time ago) and everything seemed to work > fine. Except, we now have a different problem. > Essentially, this device requires 128MB for a given BAR to provide a > PCI-to-localbus bridge. (another BAR provides the configuration space to > configure chip select regions and so on). > Apparently, only the first 64MB of this BAR seem to work correctly with > the new driver. As soon as you exceed that, reads (always?) return 0. > Other BARs (which are then of course assigned a higher region) seem to > work just fine, so it looks like a per-BAR limitation. > > This was not a problem with a 3.0 kernel. Do you have any idea what > could be wrong here? > I'm currently using a 3.10 kernel, where your patches for the pci-mvebu > driver were forcibly brought in (without full support for the MBUS > description at device tree level though). [...] 
> mvebu-pcie pcie-controller.1: PCIe0.0: link up > mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00 > pci_bus 0000:00: root bus resource [io 0x1000-0xfffff] > pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff] > pci_bus 0000:00: root bus resource [bus 00-ff] > pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400 > PCI: bus0: Fast back to back transfers disabled > pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring > pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000 > pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff] > pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff] > pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff] > pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff] > pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff] > pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff] > pci 0000:01:00.0: supports D1 D2 > pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot > PCI: bus1: Fast back to back transfers disabled > pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] > pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] So I guess this one is the 128 MB BAR, right? > pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] So in total, for the device 0000:01:00, the memory region should go from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is needed for this device, because only power of two sizes are possible for MBus windows. Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? It will tell us how the MBus windows are configured, as I suspect the problem might be here. Thanks! 
Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-18 20:27 ` Thomas Petazzoni @ 2014-02-19 8:38 ` Gerlando Falauto 2014-02-19 9:26 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2014-02-19 8:38 UTC (permalink / raw) To: linux-arm-kernel Hi Thomas, first of all thank you for your invaluable help! On 02/18/2014 09:27 PM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, > > On Tue, 18 Feb 2014 18:29:56 +0100, Gerlando Falauto wrote: > >> I tried these settings (a long time ago) and everything seemed to work >> fine. Except, we now have a different problem. >> Essentially, this device requires 128MB for a given BAR to provide a >> PCI-to-localbus bridge. (another BAR provides the configuration space to >> configure chip select regions and so on). >> Apparently, only the first 64MB of this BAR seem to work correctly with >> the new driver. As soon as you exceed that, reads (always?) return 0. >> Other BARs (which are then of course assigned a higher region) seem to >> work just fine, so it looks like a per-BAR limitation. >> >> This was not a problem with a 3.0 kernel. Do you have any idea what >> could be wrong here? >> I'm currently using a 3.10 kernel, where your patches for the pci-mvebu >> driver were forcibly brought in (without full support for the MBUS >> description at device tree level though). > > [...] 
> >> mvebu-pcie pcie-controller.1: PCIe0.0: link up >> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00 >> pci_bus 0000:00: root bus resource [io 0x1000-0xfffff] >> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff] >> pci_bus 0000:00: root bus resource [bus 00-ff] >> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400 >> PCI: bus0: Fast back to back transfers disabled >> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring >> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000 >> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff] >> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff] >> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff] >> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff] >> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff] >> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff] >> pci 0000:01:00.0: supports D1 D2 >> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot >> PCI: bus1: Fast back to back transfers disabled >> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 >> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] >> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] > > So I guess this one is the 128 MB BAR, right? That's correct. > >> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] >> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] >> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] >> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] >> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > > So in total, for the device 0000:01:00, the memory region should go > from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is > needed for this device, because only power of two sizes are possible > for MBus windows. > > Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? 
It
> will tell us how the MBus windows are configured, as I suspect the
> problem might be here.

Here it goes:

[00] disabled
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000ec000000 : pcie0.0

So there's something wrong: a 256MB window should go all the way up to
0xf0000000, but we have 192MB instead, and I don't know how that would
be interpreted.

I couldn't figure out where this range comes from though, as in the
device tree I now have a size of 256MB (I stupidly set it to 192MB at
some point, but I have now changed it):

# hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges | cut -c1-58
00000000  82 00 00 00 00 00 00 00  00 04 00 00 00 04 00 00
00000010  00 00 00 00 00 00 20 00  82 00 00 00 00 00 00 00
00000020  e0 00 00 00 e0 00 00 00  00 00 00 00 10 00 00 00
                                               ^^^^^^^^^^^
00000030  81 00 00 00 00 00 00 00  00 00 00 00 f0 00 00 00
00000040  00 00 00 00 00 10 00 00
00000048

But apart from that, what I still don't understand is how that could
have anything to do with my problem. The memory area I'm not able to
access starts at 0xe4000000. BAR0, on the other hand, spans
0xe8802000-0xe8802fff and seems to work fine.

Any ideas?

Thanks a lot!
Gerlando

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-19 8:38 ` Gerlando Falauto @ 2014-02-19 9:26 ` Thomas Petazzoni 2014-02-19 9:39 ` Gerlando Falauto 0 siblings, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-19 9:26 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, On Wed, 19 Feb 2014 09:38:48 +0100, Gerlando Falauto wrote: > >> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > >> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > >> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > >> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > >> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > > > > So in total, for the device 0000:01:00, the memory region should go > > from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is > > needed for this device, because only power of two sizes are possible > > for MBus windows. > > > > Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? It > > will tell us how the MBus windows are configured, as I suspect the > > problem might be here. > > Here it goes: > > [00] disabled > [01] disabled > [02] disabled > [03] disabled > [04] 00000000ff000000 - 00000000ff010000 : nand > [05] 00000000f4000000 - 00000000f8000000 : vpcie > [06] 00000000fe000000 - 00000000fe010000 : dragonite > [07] 00000000e0000000 - 00000000ec000000 : pcie0.0 > > So there's something wrong: a 256MB window should go all the way up to > 0xf0000000, and we have 192MB instead and I don't know how that would be > interpreted. My understanding is that a 192 MB window is illegal, because the window size should be encoded as a sequence of 1s followed by a sequence of 0s from the LSB to the MSB. To me, this means that only power of two window sizes are possible. 
> I couldn't figure out where this range comes from though, as in the > device tree I now have a size of 256MB (I stupidly set it to 192MB at > some point, but I now changed it): > > # hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges > | cut -c1-58 > 00000000 82 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 > 00000010 00 00 00 00 00 00 20 00 82 00 00 00 00 00 00 00 > 00000020 e0 00 00 00 e0 00 00 00 00 00 00 00 10 00 00 00 > ^^^^^^^^^^^ Wow, that's an old DT representation that you have here :) But ok, let me try to explain. The 256 MB value that you define in the DT is the global PCIe memory aperture: it is the maximum amount of memory that we allow the PCIe driver to allocate for PCIe windows. But depending on which PCIe devices you have plugged in, and how large their BARs are, not necessarily all of these 256 MB will be used. So, you can very well have a 256 MB global PCIe memory aperture, and still have only one 1 MB PCIe memory window for PCIe 0.0 and a 256 KB PCIe memory window for PCIe 1.0, and that's it. Now, the 192 MB comes from the enumeration of your device. Linux enumerates the BAR of your device: pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] and then concludes that at the emulated bridge level, the memory region to be created is: pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff] which corresponds to the 192 MB window that we see created. But I believe a 192 MB memory window cannot work with MBus, it should be rounded up to the next power of 2. 
Can you try the below patch (not tested, not even compiled, might need
some tweaks to apply to your 3.10 kernel):

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 13478ec..002229a 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -372,6 +372,11 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
 		port->memwin_base;
 
+	pr_info("PCIE %d.%d: creating window at 0x%x, size 0x%x rounded up to 0x%x\n",
+		port->port, port->lane, port->memwin_base,
+		port->memwin_size, roundup_pow_of_two(port->memwin_size));
+	port->memwin_size = roundup_pow_of_two(port->memwin_size);
+
 	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
 				    port->memwin_base, port->memwin_size);
 }

I'm obviously interested in seeing the message that gets shown, as well
as the new mvebu-mbus debugfs output.

For good measure, if you could also dump the registers of the PCIe
window. In your case, it was window 7, so dumping 0xf1020070 and
0xf1020074 would be useful.

> But apart from that, what I still don't understand is how that could
> have anything to do with my problem. The memory area I'm not able to
> access starts at 0xe4000000.
> BAR0, on the other hand, spawns 0xe8802000-0xe8802fff and seems to work
> fine.

I am not sure, but since we are configuring an invalid memory size,
maybe the MBus behavior is undefined, and we get some completely funky
behavior where parts of the 192 MB window actually work, but parts of
it do not.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-19 9:26 ` Thomas Petazzoni @ 2014-02-19 9:39 ` Gerlando Falauto 2014-02-19 13:37 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2014-02-19 9:39 UTC (permalink / raw) To: linux-arm-kernel Hi Thomas, spoiler first: SUCCESS!!!! On 02/19/2014 10:26 AM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, [...] >> >> # hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges >> | cut -c1-58 >> 00000000 82 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 >> 00000010 00 00 00 00 00 00 20 00 82 00 00 00 00 00 00 00 >> 00000020 e0 00 00 00 e0 00 00 00 00 00 00 00 10 00 00 00 >> ^^^^^^^^^^^ > > Wow, that's an old DT representation that you have here :) Indeed... ;-) > But ok, let me try to explain. The 256 MB value that you define in the > DT is the global PCIe memory aperture: it is the maximum amount of > memory that we allow the PCIe driver to allocate for PCIe windows. But > depending on which PCIe devices you have plugged in, and how large > their BARs are, not necessarily all of these 256 MB will be used. > > So, you can very well have a 256 MB global PCIe memory aperture, and > still have only one 1 MB PCIe memory window for PCIe 0.0 and a 256 KB > PCIe memory window for PCIe 1.0, and that's it. > > Now, the 192 MB comes from the enumeration of your device. 
Linux > enumerates the BAR of your device: > > pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] > pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > > and then concludes that at the emulated bridge level, the memory region > to be created is: > > pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff] > > which corresponds to the 192 MB window that we see created. > > But I believe a 192 MB memory window cannot work with MBus, it should > be rounded up to the next power of 2. Can you try the below patch (not > tested, not even compiled, might need some tweaks to apply to your 3.10 > kernel) : > > diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c > index 13478ec..002229a 100644 > --- a/drivers/pci/host/pci-mvebu.c > +++ b/drivers/pci/host/pci-mvebu.c > @@ -372,6 +372,11 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) > (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - > port->memwin_base; > > + pr_info("PCIE %d.%d: creating window at 0x%x, size 0x%x rounded up to 0x%x\n", > + port->port, port->lane, port->memwin_base, > + port->memwin_size, roundup_pow_of_two(port->memwin_size)); > + port->memwin_size = roundup_pow_of_two(port->memwin_size); > + > mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr, > port->memwin_base, port->memwin_size); > } > > I'm obviously interested in seeing the message that gets shown, as well > as the new mvebu-mbus debugfs output. 
---------- pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff] PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to 0x10000000 ---------- cat /sys/kernel/debug/mvebu-mbus/devices [00] disabled [01] disabled [02] disabled [03] disabled [04] 00000000ff000000 - 00000000ff010000 : nand [05] 00000000f4000000 - 00000000f8000000 : vpcie [06] 00000000fe000000 - 00000000fe010000 : dragonite [07] 00000000e0000000 - 00000000f0000000 : pcie0.0 > For good measure, if you could also dump the registers of the PCIe > window. In your case, it was window 7, so dumping 0xf1020070 and > 0xf1020074 would be useful. Isn't that where the output of debugfs comes from? >> But apart from that, what I still don't understand is how that could >> have anything to do with my problem. The memory area I'm not able to >> access starts at 0xe4000000. >> BAR0, on the other hand, spawns 0xe8802000-0xe8802fff and seems to work >> fine. > > I am not sure, but since we are configuring an invalid memory size, > maybe the MBus behavior is undefined, and we get some completely funky > behavior, where parts of the 192 MB window are actually work, but parts > of it are not. And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!! With your patch now everything works fine!!! No words (or quads, for that matter) can express how grateful I am! ;-) Thank you so much!!! Gerlando > > Thomas > ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-19  9:39             ` Gerlando Falauto
@ 2014-02-19 13:37             ` Thomas Petazzoni
  2014-02-19 21:45               ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-19 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

Gerlando, Bjorn,

Bjorn, I added you as the To: because there is a PCI related question
for you below :)

On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:

> spoiler first: SUCCESS!!!!

Awesome :)

> > I'm obviously interested in seeing the message that gets shown, as well
> > as the new mvebu-mbus debugfs output.
>
> ----------
> pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff]
> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to
> 0x10000000

Right, rounding from 192 MB to 256 MB.

> cat /sys/kernel/debug/mvebu-mbus/devices
> [00] disabled
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
>
> > For good measure, if you could also dump the registers of the PCIe
> > window. In your case, it was window 7, so dumping 0xf1020070 and
> > 0xf1020074 would be useful.
>
> Isn't that where the output of debugfs comes from?

It is, but the mvebu-mbus driver interprets the sequence of 1s and 0s
to give the real size, and this involves a little bit of
bit-manipulation magic, which I wanted to check by having a look at the
raw values of the registers.

> > I am not sure, but since we are configuring an invalid memory size,
> > maybe the MBus behavior is undefined, and we get some completely funky
> > behavior, where parts of the 192 MB window actually work, but parts
> > of it do not.
>
> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
> With your patch now everything works fine!!!
>
> No words (or quads, for that matter) can express how grateful I am! ;-)

Cool. However, I am not sure my fix is really correct, because if you
had another PCIe device that needed 64 MB of memory space, the PCIe
core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
which would have conflicted with the forced "power of 2 up-rounding"
we've applied on the memory space of the first device.

Therefore, I believe this constraint should be taken into account by
the PCIe core when allocating the different memory regions for each
device.

Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
memory regions associated with each PCIe device of the emulated bridge
have a size that is a power of 2.

I am currently using the ->align_resource() hook to ensure that the
start address of the resource matches certain other constraints, but I
don't see a way of telling the PCI core that I need the resource to
have its size rounded up to the next power of 2. Is there a way of
doing this?

In the case described by Gerlando, the PCI core has assigned a 192 MB
region, but the Marvell hardware can only create windows that have a
power of two size, i.e. 256 MB here. Therefore, the PCI core should be
told about this constraint, so that it doesn't allocate the next
resource right after the 192 MB one.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-19 13:37             ` Thomas Petazzoni
@ 2014-02-19 21:45               ` Bjorn Helgaas
  2014-02-20  8:55                 ` Thomas Petazzoni
  0 siblings, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2014-02-19 21:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 19, 2014 at 6:37 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Gerlando, Bjorn,
>
> Bjorn, I added you as the To: because there is a PCI related question
> for you below :)
>
> On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:
>
>> spoiler first: SUCCESS!!!!
>
> Awesome :)
>
>> > I'm obviously interested in seeing the message that gets shown, as well
>> > as the new mvebu-mbus debugfs output.
>>
>> ----------
>> pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff]
>> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to
>> 0x10000000
>
> Right, rounding from 192 MB to 256 MB.
>
>> cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] disabled
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
>>
>> > For good measure, if you could also dump the registers of the PCIe
>> > window. In your case, it was window 7, so dumping 0xf1020070 and
>> > 0xf1020074 would be useful.
>>
>> Isn't that where the output of debugfs comes from?
>
> It is, but the mvebu-mbus driver interprets the sequence of 1s and 0s
> to give the real size, and this involves a little bit of
> bit-manipulation magic, which I wanted to check by having a look at the
> raw values of the registers.
>
>> > I am not sure, but since we are configuring an invalid memory size,
>> > maybe the MBus behavior is undefined, and we get some completely funky
>> > behavior, where parts of the 192 MB window actually work, but parts
>> > of it do not.
>>
>> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
>> With your patch now everything works fine!!!
>>
>> No words (or quads, for that matter) can express how grateful I am! ;-)
>
> Cool. However, I am not sure my fix is really correct, because if you
> had another PCIe device that needed 64 MB of memory space, the PCIe
> core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> which would have conflicted with the forced "power of 2 up-rounding"
> we've applied on the memory space of the first device.
>
> Therefore, I believe this constraint should be taken into account by
> the PCIe core when allocating the different memory regions for each
> device.
>
> Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> memory regions associated with each PCIe device of the emulated bridge
> have a size that is a power of 2.
>
> I am currently using the ->align_resource() hook to ensure that the
> start address of the resource matches certain other constraints, but I
> don't see a way of telling the PCI core that I need the resource to
> have its size rounded up to the next power of 2. Is there a way of
> doing this?
>
> In the case described by Gerlando, the PCI core has assigned a 192 MB
> region, but the Marvell hardware can only create windows that have a
> power of two size, i.e. 256 MB here. Therefore, the PCI core should be
> told about this constraint, so that it doesn't allocate the next
> resource right after the 192 MB one.

I'm not sure I understand this correctly, but I *think* this 192 MB
region that gets rounded up to 256 MB because of the Marvell
constraint is a host bridge aperture.  If that's the case, it's
entirely up to you (the host bridge driver author) to round it as
needed before passing it to pci_add_resource_offset().

The PCI core will never allocate any space that is outside the host
bridge apertures.

But maybe I don't understand your situation well enough.

Bjorn

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-19 21:45               ` Bjorn Helgaas
@ 2014-02-20  8:55                 ` Thomas Petazzoni
  2014-02-20 17:35                   ` Jason Gunthorpe
  2014-02-20 19:18                   ` Bjorn Helgaas
  0 siblings, 2 replies; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-20  8:55 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Bjorn Helgaas,

+ Jason Gunthorpe.

On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote:

> > Cool. However, I am not sure my fix is really correct, because if you
> > had another PCIe device that needed 64 MB of memory space, the PCIe
> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> > which would have conflicted with the forced "power of 2 up-rounding"
> > we've applied on the memory space of the first device.
> >
> > Therefore, I believe this constraint should be taken into account by
> > the PCIe core when allocating the different memory regions for each
> > device.
> >
> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> > memory regions associated with each PCIe device of the emulated bridge
> > have a size that is a power of 2.
> >
> > I am currently using the ->align_resource() hook to ensure that the
> > start address of the resource matches certain other constraints, but I
> > don't see a way of telling the PCI core that I need the resource to
> > have its size rounded up to the next power of 2. Is there a way of
> > doing this?
> >
> > In the case described by Gerlando, the PCI core has assigned a 192 MB
> > region, but the Marvell hardware can only create windows that have a
> > power of two size, i.e. 256 MB here. Therefore, the PCI core should be
> > told about this constraint, so that it doesn't allocate the next
> > resource right after the 192 MB one.
>
> I'm not sure I understand this correctly, but I *think* this 192 MB
> region that gets rounded up to 256 MB because of the Marvell
> constraint is a host bridge aperture.  If that's the case, it's
> entirely up to you (the host bridge driver author) to round it as
> needed before passing it to pci_add_resource_offset().
>
> The PCI core will never allocate any space that is outside the host
> bridge apertures.

Hum, I believe there is a misunderstanding here. We are already using
pci_add_resource_offset() to define the global aperture for the entire
PCI bridge. This is not causing any problem.

Let me give a little bit of background first. On Marvell hardware, the
physical address space layout is configurable, through the use of "MBus
windows". A "MBus window" is defined by a base address, a size, and a
target device. So if the CPU needs to access a given device (such as
PCIe 0.0 for example), then we need to create a "MBus window" whose
size and target device match PCIe 0.0.

Since the Armada XP has 10 PCIe interfaces, we cannot just statically
create as many MBus windows as there are PCIe interfaces: it would both
exhaust the number of MBus windows available, and also exhaust the
physical address space, because we would have to create very large
windows, just in case the PCIe device plugged behind each interface
needs large BARs.

So, what the pci-mvebu.c driver does is create an emulated PCI bridge.
This emulated bridge is used to let the Linux PCI core enumerate the
real physical PCI devices behind the bridge, allocate a range of
physical addresses that is available for each of these devices, and
write them to the bridge registers. Since the bridge is not a real one
but an emulated one, we trap those writes, and use them to create the
MBus windows that will allow the CPU to actually access the device, at
the base address chosen by the Linux PCI core during the enumeration
process.

However, MBus windows have the constraint that they must have a power
of two size, so the Linux PCI core should not write to one of the
bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
addresses whose size is not a power of 2.

Let me take the example of Gerlando:

pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]

So, pci 0000:01:00 is the real device, which has a number of BARs of a
certain size. Taking into account all those BARs, the Linux PCI core
decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line
of the log above). The problem is that [mem 0xe0000000-0xebffffff] is
192 MB, but we would like the Linux PCI core to extend that to 256 MB.

As you can see, it is not about the global aperture associated with the
bridge, but about the size of the window associated with each "port" of
the bridge.

Does that make sense? Keep in mind that I'm still not completely
familiar with the PCI terminology, so maybe the above explanation does
not use the right terms.

Thanks for your feedback,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-20  8:55                 ` Thomas Petazzoni
@ 2014-02-20 17:35                   ` Jason Gunthorpe
  2014-02-20 20:29                     ` Thomas Petazzoni
  2014-02-20 19:18                   ` Bjorn Helgaas
  1 sibling, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2014-02-20 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:

> Does that make sense? Keep in mind that I'm still not completely
> familiar with the PCI terminology, so maybe the above explanation does
> not use the right terms.

Stated another way, the Marvell PCI-E to PCI-E bridge config space has
a quirk that requires the window BARs to be aligned on their size and
sized to a power of 2.

The first requirement is already being handled by hooking through ARM's
'align_resource' callback.

One avenue would be to have mvebu_pcie_align_resource return a struct
resource and manipulate the size as well, assuming the PCI core will
accommodate that.

Jason

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-20 17:35                   ` Jason Gunthorpe
@ 2014-02-20 20:29                     ` Thomas Petazzoni
  2014-02-21  0:32                       ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-20 20:29 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 10:35:18 -0700, Jason Gunthorpe wrote:

> On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:
>
> > Does that make sense? Keep in mind that I'm still not completely
> > familiar with the PCI terminology, so maybe the above explanation does
> > not use the right terms.
>
> Stated another way, the Marvell PCI-E to PCI-E bridge config space has
> a quirk that requires the window BARs to be aligned on their size and
> sized to a power of 2.

Correct.

> The first requirement is already being handled by hooking through
> ARM's 'align_resource' callback.

Absolutely.

> One avenue would be to have mvebu_pcie_align_resource return a struct
> resource and manipulate the size as well, assuming the PCI core will
> accommodate that.

That would effectively be the easiest solution from the point of view
of the PCIe driver.

In practice, the story is a little bit more subtle than that: the PCIe
driver may want to decide to either tell the PCI core to enlarge the
window BAR up to the next power of two size, or to dedicate two windows
to it. For example:

 * If the PCI core allocates a 96 KB BAR, we clearly want it to be
   enlarged to 128 KB, so that we only have to create a single window
   for it.

 * However, if the PCI core allocates a 192 MB BAR, we may want to
   instead create two windows: a first one of 128 MB and a second one
   of 64 MB. This consumes two windows, but saves 64 MB of physical
   address space. (Note that I haven't tested the creation of two
   windows for the same target device myself, but I was told by Lior
   that it should work.)

As you can see from the two examples above, we may not necessarily want
to enforce this power-of-two constraint in all cases. We may want to
accept a non-power-of-2 size in the case of the 192 MB BAR, and let the
mvebu-mbus driver figure out that it should allocate several
consecutive windows to cover these 192 MB.

But to begin with, rounding all window BARs up to the next power of two
size would be perfectly OK.

Jason, would you mind maybe replying to Bjorn Helgaas' email (Thu, 20
Feb 2014 12:18:42 -0700)? I believe that a lot of the misunderstanding
between Bjorn and me is due to the fact that I don't use the correct
PCI terminology to describe how the Marvell hardware works, and how the
Marvell PCIe driver copes with it. I'm sure you would explain it in a
way that would be more easily understood by someone very familiar with
the PCI terminology such as Bjorn. Thanks a lot!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-20 20:29                     ` Thomas Petazzoni
@ 2014-02-21  0:32                       ` Jason Gunthorpe
  2014-02-21  8:34                         ` Thomas Petazzoni
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2014-02-21  0:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 09:29:14PM +0100, Thomas Petazzoni wrote:

> In practice, the story is a little bit more subtle than that: the PCIe
> driver may want to decide to either tell the PCI core to enlarge the
> window BAR up to the next power of two size, or to dedicate two windows
> to it.

That is a smart, easy solution! Maybe that is the least invasive way to
proceed for now?

I have no idea how you decide when to round up and when to allocate
more windows, that feels like a fairly complex optimization problem!

Alternatively, I suspect you can use the PCI quirk mechanism to alter
the resource sizing on a bridge?

> Jason, would you mind maybe replying to Bjorn Helgaas' email (Thu, 20
> Feb 2014 12:18:42 -0700)? I believe that a lot of the misunderstanding
> between Bjorn and me is due to the fact that I don't use the correct
> PCI terminology to describe how the Marvell hardware works, and how the
> Marvell PCIe driver copes with it. I'm sure you would explain it in a
> way that would be more easily understood by someone very familiar with
> the PCI terminology such as Bjorn. Thanks a lot!

Done! Hope it helps,

Jason

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-21  0:32                       ` Jason Gunthorpe
@ 2014-02-21  8:34                         ` Thomas Petazzoni
  2014-02-21  8:58                           ` Gerlando Falauto
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  8:34 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:

> > In practice, the story is a little bit more subtle than that: the PCIe
> > driver may want to decide to either tell the PCI core to enlarge the
> > window BAR up to the next power of two size, or to dedicate two windows
> > to it.
>
> That is a smart, easy solution! Maybe that is the least invasive way
> to proceed for now?

So you suggest that the mvebu-mbus driver should accept a
non-power-of-two window size, and do internally the job of cutting that
into several power-of-two sized areas and creating the corresponding
windows?

> I have no idea how you decide when to round up and when to allocate
> more windows, that feels like a fairly complex optimization problem!

Yes, it is a fairly complex problem. I was thinking of a threshold of
"lost space": below this threshold, it's better to enlarge the window;
above the threshold, it's better to create two windows. But it's not
easy.

> Alternatively, I suspect you can use the PCI quirk mechanism to alter
> the resource sizing on a bridge?

Can you give more details about this mechanism, and how it could be
used to alter the size of resources on a bridge?

> > Jason, would you mind maybe replying to Bjorn Helgaas' email (Thu, 20
> > Feb 2014 12:18:42 -0700)? I believe that a lot of the misunderstanding
> > between Bjorn and me is due to the fact that I don't use the correct
> > PCI terminology to describe how the Marvell hardware works, and how the
> > Marvell PCIe driver copes with it. I'm sure you would explain it in a
> > way that would be more easily understood by someone very familiar with
> > the PCI terminology such as Bjorn. Thanks a lot!
>
> Done!

Thanks a lot! Really appreciated.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-21  8:34                         ` Thomas Petazzoni
@ 2014-02-21  8:58                           ` Gerlando Falauto
  2014-02-21  9:12                             ` Thomas Petazzoni
  0 siblings, 1 reply; 55+ messages in thread
From: Gerlando Falauto @ 2014-02-21  8:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi guys,

first of all thank you for your support and the explanations. I'm
slowly starting to understand something more about this kind of stuff.

On 02/21/2014 09:34 AM, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
>
> On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:
>
>>> In practice, the story is a little bit more subtle than that: the PCIe
>>> driver may want to decide to either tell the PCI core to enlarge the
>>> window BAR up to the next power of two size, or to dedicate two windows
>>> to it.
>>
>> That is a smart, easy solution! Maybe that is the least invasive way
>> to proceed for now?
>
> So you suggest that the mvebu-mbus driver should accept a
> non-power-of-two window size, and do internally the job of cutting that
> into several power-of-two sized areas and creating the corresponding
> windows?
>
>> I have no idea how you decide when to round up and when to allocate
>> more windows, that feels like a fairly complex optimization problem!
>
> Yes, it is a fairly complex problem. I was thinking of a threshold of
> "lost space". Below this threshold, it's better to enlarge the window,
> above the threshold it's better to create two windows. But not easy.
>
>> Alternatively, I suspect you can use the PCI quirk mechanism to alter
>> the resource sizing on a bridge?
>
> Can you give more details about this mechanism, and how it could be
> used to alter the size of resources on a bridge?

I'm not sure I understand all the details... but I guess some sort of
rounding mechanism is indeed already in place somewhere:

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]

If you look at the numbers, the total size required by BAR0-5 is
0x8805000, so around 136MB, that is 128MB+8MB+8K+4K+4K+4K.
This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know
where or why), which is 1.5x a power of two (i.e. two consecutive set
bits followed by all zeroes).

If that's not just a coincidence, finding a coverage subset becomes a
trivial matter (128MB+64MB).

In any case, even if we have an odd number like the above (0x8805000),
I believe we could easily find a suboptimal coverage by just taking the
most significant bit and the second most significant bit (possibly left
shifted by 1 if there's a third one somewhere else).
In the above case, that would be 0x8000000 + 0x1000000. That's
128MB+16MB, which is even smaller than the rounding above (192MB).

What do you think?

Thanks again!
Gerlando

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-21  8:58                           ` Gerlando Falauto
@ 2014-02-21  9:12                             ` Thomas Petazzoni
  2014-02-21  9:16                               ` Gerlando Falauto
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:12 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:

> > Can you give more details about this mechanism, and how it could be
> > used to alter the size of resources on a bridge?
>
> I'm not sure I understand all the details... but I guess some sort of
> rounding mechanism is indeed already in place somewhere:
>
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
>
> If you look at the numbers, the total size required by BAR0-5 is
> 0x8805000, so around 136MB, that is 128MB+8MB+8K+4K+4K+4K.
> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know
> where or why), which is 1.5x a power of two (i.e. two consecutive set
> bits followed by all zeroes).

Would indeed be interesting to know who does this rounding, and why,
and according to what rules.

> If that's not just a coincidence, finding a coverage subset becomes a
> trivial matter (128MB+64MB).
>
> In any case, even if we have an odd number like the above (0x8805000),
> I believe we could easily find a suboptimal coverage by just taking the
> most significant bit and the second most significant bit (possibly left
> shifted by 1 if there's a third one somewhere else).
> In the above case, that would be 0x8000000 + 0x1000000. That's
> 128MB+16MB, which is even smaller than the rounding above (192MB).
>
> What do you think?

Sure, but whichever choice we make, the Linux PCI core must know by how
much we've enlarged the bridge window BAR. Otherwise the Linux PCI core
may allocate, for the next bridge window BAR, a range of addresses that
doesn't overlap with what it has allocated for the previous bridge
window BAR, but that ends up overlapping due to us "extending" the
previous bridge window BAR to match the MBus requirements.

Gerlando, would you be able to test a quick hack that creates 2 windows
to cover exactly 128 MB + 64 MB? This would at least allow us to
confirm that the strategy of splitting in multiple windows is usable.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-21  9:12                             ` Thomas Petazzoni
@ 2014-02-21  9:16                               ` Gerlando Falauto
  2014-02-21  9:39                                 ` Thomas Petazzoni
  0 siblings, 1 reply; 55+ messages in thread
From: Gerlando Falauto @ 2014-02-21  9:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 02/21/2014 10:12 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:
>
>>> Can you give more details about this mechanism, and how it could be
>>> used to alter the size of resources on a bridge?
>>
>> I'm not sure I understand all the details... but I guess some sort of
>> rounding mechanism is indeed already in place somewhere:
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>> pci 0000:00:01.0: PCI bridge to [bus 01]
>>
>> If you look at the numbers, the total size required by BAR0-5 is
>> 0x8805000, so around 136MB, that is 128MB+8MB+8K+4K+4K+4K.
>> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know
>> where or why), which is 1.5x a power of two (i.e. two consecutive set
>> bits followed by all zeroes).
>
> Would indeed be interesting to know who does this rounding, and why,
> and according to what rules.
>
>> If that's not just a coincidence, finding a coverage subset becomes a
>> trivial matter (128MB+64MB).
>>
>> In any case, even if we have an odd number like the above (0x8805000),
>> I believe we could easily find a suboptimal coverage by just taking the
>> most significant bit and the second most significant bit (possibly left
>> shifted by 1 if there's a third one somewhere else).
>> In the above case, that would be 0x8000000 + 0x1000000. That's
>> 128MB+16MB, which is even smaller than the rounding above (192MB).
>>
>> What do you think?
>
> Sure, but whichever choice we make, the Linux PCI core must know by how
> much we've enlarged the bridge window BAR. Otherwise the Linux PCI core
> may allocate, for the next bridge window BAR, a range of addresses that
> doesn't overlap with what it has allocated for the previous bridge
> window BAR, but that ends up overlapping due to us "extending" the
> previous bridge window BAR to match the MBus requirements.
>
> Gerlando, would you be able to test a quick hack that creates 2 windows
> to cover exactly 128 MB + 64 MB? This would at least allow us to
> confirm that the strategy of splitting in multiple windows is usable.

Sure, though probably not until next week.

I guess it would then also be useful to restore my previous setup,
where the total PCIe aperture is 192MB, right?

Thank you guys!
Gerlando

^ permalink raw reply  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
  2014-02-21  9:16                               ` Gerlando Falauto
@ 2014-02-21  9:39                                 ` Thomas Petazzoni
  2014-02-21 12:24                                   ` Gerlando Falauto
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 10:16:32 +0100, Gerlando Falauto wrote:

> > Sure, but whichever choice we make, the Linux PCI core must know by how
> > much we've enlarged the bridge window BAR. Otherwise the Linux PCI core
> > may allocate, for the next bridge window BAR, a range of addresses that
> > doesn't overlap with what it has allocated for the previous bridge
> > window BAR, but that ends up overlapping due to us "extending" the
> > previous bridge window BAR to match the MBus requirements.
> >
> > Gerlando, would you be able to test a quick hack that creates 2 windows
> > to cover exactly 128 MB + 64 MB? This would at least allow us to
> > confirm that the strategy of splitting in multiple windows is usable.
>
> Sure, though probably not until next week.

No problem at all.

> I guess it would then also be useful to restore my previous setup, where
> the total PCIe aperture is 192MB, right?

Yes, that's the case I'm interested in at the moment. If you could try
the below (ugly) patch, and see if you can access all your device BARs,
it would be interesting. It would tell us if two separate windows
having the same target/attribute and consecutive placement in the
physical address space can actually work to address a given PCIe
device. As you will see, the patch makes a very ugly special case for
192 MB :-)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 2394e97..f763ecc 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -223,11 +223,13 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
 		if ((u64)base < wend && end > wbase)
 			return 0;
 
+#if 0
 		/*
 		 * Check if target/attribute conflicts
 		 */
 		if (target == wtarget && attr == wattr)
 			return 0;
+#endif
 	}
 
 	return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 2aa7b77c..67fe6df 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -361,8 +361,15 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
 		port->memwin_base;
 
-	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
-				    port->memwin_base, port->memwin_size);
+	if (port->memwin_size == (SZ_128M + SZ_64M)) {
+		mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+					    port->memwin_base, SZ_128M);
+		mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+					    port->memwin_base + SZ_128M, SZ_64M);
+	} else {
+		mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+					    port->memwin_base, port->memwin_size);
+	}
 }
 
 /*

-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related  [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
2014-02-21 9:39 ` Thomas Petazzoni
@ 2014-02-21 12:24 ` Gerlando Falauto
2014-02-21 13:47 ` Thomas Petazzoni
0 siblings, 1 reply; 55+ messages in thread
From: Gerlando Falauto @ 2014-02-21 12:24 UTC (permalink / raw)
To: linux-arm-kernel

Hi Thomas,

On 02/21/2014 10:39 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,

[...]

>> I guess it would then also be useful to restore my previous setup, where
>> the total PCIe aperture is 192MB, right?
>
> Yes, that's the case I'm interested in at the moment. If you could try
> the above (ugly) patch, and see if you can access all your device BARs,
> it would be interesting. It would tell us if two separate windows
> having the same target/attribute and consecutive placement in the
> physical address space can actually work to address a given PCIe
> device. As you will see, the patch makes a very ugly special case for
> 192 MB :-)
>

So I restored the total aperture size to 192MB.
I had to rework your patch a bit because:

a) I'm running an older kernel and driver
b) sizes are actually 1-byte offset

So here it is:

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index dd4445f..27fe162 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -251,11 +251,13 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
 		if ((u64)base < wend && end > wbase)
 			return 0;
 
+#if 0
 		/*
 		 * Check if target/attribute conflicts
 		 */
 		if (target == wtarget && attr == wattr)
 			return 0;
+#endif
 	}
 
 	return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index c8397c4..120a822 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -332,10 +332,21 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
 		port->memwin_base;
 
-	mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
-					  port->memwin_size,
-					  MVEBU_MBUS_NO_REMAP,
-					  MVEBU_MBUS_PCI_MEM);
+	if (port->memwin_size + 1 == (SZ_128M + SZ_64M)) {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  SZ_128M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base + SZ_128M,
+						  SZ_64M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	} else {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  port->memwin_size,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	}
 }
 
 /*

Here's the assignment (same as before):

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

And here's the output I get from:

# cat /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000e8000000 : pcie0.0

I did not get to test the whole address space thoroughly, but all the
BARs are still accessible (mainly BAR0 which contains the control space
and is mapped on the "new" MBUS window, and BAR1 which is the "big"
one). So at least, the issues we had before are now gone.

So I'd say this looks like a very promising approach. :-)

Thank you,
Gerlando

^ permalink raw reply related	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 12:24 ` Gerlando Falauto @ 2014-02-21 13:47 ` Thomas Petazzoni 2014-02-21 15:05 ` Arnd Bergmann ` (2 more replies) 0 siblings, 3 replies; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-21 13:47 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote: > So I restored the total aperture size to 192MB. > I had to rework your patch a bit because: > > a) I'm running an older kernel and driver > b) sizes are actually 1-byte offset Hum, right. This is a bit weird, maybe I should change that, I don't think the mvebu-mbus driver should accept 1-byte offset sizes. > Here's the assignment (same as before): > > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] > pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] > pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > > And here's the output I get from: > > # cat /sys/kernel/debug/mvebu-mbus/devices > [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) > [01] disabled > [02] disabled > [03] disabled > [04] 00000000ff000000 - 00000000ff010000 : nand > [05] 00000000f4000000 - 00000000f8000000 : vpcie > [06] 00000000fe000000 - 00000000fe010000 : dragonite > [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 This seems correct: we have two windows pointing to the same device, and they have consecutive addresses. > I did not get to test the whole address space thoroughly, but all the > BARs are still accessible (mainly BAR0 which contains the control space > and is mapped on the "new" MBUS window, and BAR1 which is the "big" > one). So at least, the issues we had before are now gone. 
Did you check that what you read from BAR0 (which is mapped on the new MBUS window) is really what you expect, and not just the same thing as BAR1 accessible for the big window? I just want to make sure that the hardware indeed properly handles two windows for the same device. > So I'd say this looks like a very promising approach. :-) Indeed. However, I don't think this approach solves the entire problem, for two reasons: *) For small BARs that are not power-of-two sized, we may not want to consume two windows, but instead consume a little bit more address space. Using two windows to map a 96 KB BAR would be a waste of windows: using a single 128 KB window is much more efficient. *) I don't know if the algorithm to split the BAR into multiple windows is going to be trivial. Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
2014-02-21 13:47 ` Thomas Petazzoni
@ 2014-02-21 15:05 ` Arnd Bergmann
2014-02-21 15:11 ` Thomas Petazzoni
2014-02-21 16:39 ` Jason Gunthorpe
2014-02-21 18:18 ` Gerlando Falauto
2 siblings, 1 reply; 55+ messages in thread
From: Arnd Bergmann @ 2014-02-21 15:05 UTC (permalink / raw)
To: linux-arm-kernel

On Friday 21 February 2014 14:47:08 Thomas Petazzoni wrote:
>
> > So I'd say this looks like a very promising approach.
>
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
>
> *) For small BARs that are not power-of-two sized, we may not want to
>    consume two windows, but instead consume a little bit more address
>    space. Using two windows to map a 96 KB BAR would be a waste of
>    windows: using a single 128 KB window is much more efficient.

definitely.

> *) I don't know if the algorithm to split the BAR into multiple
>    windows is going to be trivial.

The easiest solution would be to special case 'size is between 128MB+1
and 192MB' if that turns out to be the most interesting case. It's easy
enough to make the second window smaller than 64MB if we want.

If we want things to be a little fancier, we could use:

	switch (size) {
	case (SZ_32M+1) ... (SZ_32M+SZ_16M):
		size2 = size - SZ_32M;
		size -= SZ_32M;
		break;
	case (SZ_64M+1) ... (SZ_64M+SZ_32M):
		size2 = size - SZ_64M;
		size -= SZ_64M;
		break;
	case (SZ_128M+1) ... (SZ_128M+SZ_64M):
		size2 = size - SZ_128M;
		size -= SZ_128M;
		break;
	};

	Arnd

^ permalink raw reply	[flat|nested] 55+ messages in thread
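[Editor's note] Arnd's switch can be generalized: round the size down to a power of two for the first window and put the remainder in a second window, refusing the split when the remainder exceeds half the first window (the cases his switch covers). This reading of the intent is an assumption — as written above, both assignments compute the remainder — and the helper name below is hypothetical:

```c
#include <stdint.h>

/* Two-window split in the spirit of the switch above: 'w1' is the
 * largest power of two not exceeding 'size', 'w2' the remainder.
 * Returns the number of windows needed, or 0 if two are not enough
 * under the "remainder <= w1/2" rule implied by the case ranges. */
static int split_two_windows(uint32_t size, uint32_t *w1, uint32_t *w2)
{
	uint32_t p = 1;

	if (size == 0)
		return 0;
	while (p <= size / 2)	/* largest power of two <= size */
		p <<= 1;
	if (p == size) {	/* already a power of two: one window */
		*w1 = size;
		*w2 = 0;
		return 1;
	}
	if (size - p > p / 2)	/* e.g. 128M+64M+32M: needs three windows */
		return 0;
	*w1 = p;
	*w2 = size - p;
	return 2;
}
```

This reproduces the three cases in the switch (48 MB, 96 MB, 192 MB style sizes) and makes explicit where Thomas's 224 MB counter-example falls out.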
* pci-mvebu driver on km_kirkwood 2014-02-21 15:05 ` Arnd Bergmann @ 2014-02-21 15:11 ` Thomas Petazzoni 2014-02-21 15:20 ` Arnd Bergmann 0 siblings, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-21 15:11 UTC (permalink / raw) To: linux-arm-kernel Dear Arnd Bergmann, On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote: > > *) I don't know if the algorithm to split the BAR into multiple > > windows is going to be trivial. > > The easiest solution would be to special case 'size is between > 128MB+1 and 192MB' if that turns out to be the most interesting > case. It's easy enough to make the second window smaller than 64MB > if we want. > > If we want things to be a little fancier, we could use: > > switch (size) { > case (SZ_32M+1) ... (SZ_32M+SZ_16M): > size2 = size - SZ_32M; > size -= SZ_32M; > break; > case (SZ_64M+1) ... (SZ_64M+SZ_32M): > size2 = size - SZ_64M; > size -= SZ_64M; > break; > case (SZ_128M+1) ... (SZ_128M+SZ_64M): > size2 = size - SZ_128M; > size -= SZ_128M; > break; > }; What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need three windows, and your algorithm doesn't work :-) Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 15:11 ` Thomas Petazzoni @ 2014-02-21 15:20 ` Arnd Bergmann 2014-02-21 15:37 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Arnd Bergmann @ 2014-02-21 15:20 UTC (permalink / raw) To: linux-arm-kernel On Friday 21 February 2014 16:11:08 Thomas Petazzoni wrote: > On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote: > > > > *) I don't know if the algorithm to split the BAR into multiple > > > windows is going to be trivial. > > > > The easiest solution would be to special case 'size is between > > 128MB+1 and 192MB' if that turns out to be the most interesting > > case. It's easy enough to make the second window smaller than 64MB > > if we want. > > > > If we want things to be a little fancier, we could use: > > > > switch (size) { > > case (SZ_32M+1) ... (SZ_32M+SZ_16M): > > size2 = size - SZ_32M; > > size -= SZ_32M; > > break; > > case (SZ_64M+1) ... (SZ_64M+SZ_32M): > > size2 = size - SZ_64M; > > size -= SZ_64M; > > break; > > case (SZ_128M+1) ... (SZ_128M+SZ_64M): > > size2 = size - SZ_128M; > > size -= SZ_128M; > > break; > > }; > > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need > three windows, and your algorithm doesn't work I was hoping we could avoid using more than two windows. With the algorithm above we would round up to 256MB and fail if that doesn't fit, which is the same thing that happens when you run out of space. Arnd ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
2014-02-21 15:20 ` Arnd Bergmann
@ 2014-02-21 15:37 ` Thomas Petazzoni
0 siblings, 0 replies; 55+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 15:37 UTC (permalink / raw)
To: linux-arm-kernel

Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:20:45 +0100, Arnd Bergmann wrote:

> > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> > three windows, and your algorithm doesn't work
>
> I was hoping we could avoid using more than two windows.
> With the algorithm above we would round up to 256MB and
> fail if that doesn't fit, which is the same thing that
> happens when you run out of space.

The problem is precisely that we currently don't have any way to tell
the Linux PCI core that we need to round up the size of a BAR. That's
the whole starting point of the discussion :-)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood
2014-02-21 13:47 ` Thomas Petazzoni
2014-02-21 15:05 ` Arnd Bergmann
@ 2014-02-21 16:39 ` Jason Gunthorpe
2014-02-21 17:05 ` Thomas Petazzoni
2014-02-21 18:18 ` Gerlando Falauto
2 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2014-02-21 16:39 UTC (permalink / raw)
To: linux-arm-kernel

On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:

> *) I don't know if the algorithm to split the BAR into multiple
>    windows is going to be trivial.

physaddr_t base,size;

while (size != 0) {
	physaddr_t window_size = 1 << log2_round_down(size);
	create_window(base,window_size);
	base += window_size;
	size -= window_size;
}

At the very worst log2_round_down is approximately

unsigned int log2_round_down(unsigned int val)
{
	unsigned int res = 0;
	while ((1<<res) <= val)
		res++;
	return res - 1;
}

Minimum PCI required alignment for windows is 1MB so it will always
work out into some number of mbus windows..

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread
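[Editor's note] Jason's greedy loop can be made self-contained so the window counts are checkable: peel off the largest power-of-two chunk until the region is exhausted. Types are simplified to uint64_t, and instead of a real create_window() the sketch just records each window:

```c
#include <stdint.h>

static unsigned int log2_round_down(uint64_t val)
{
	unsigned int res = 0;

	while ((UINT64_C(1) << res) <= val)
		res++;
	return res - 1;
}

/* Greedy split of [base, base+size) into power-of-two windows,
 * largest first. Windows are recorded rather than programmed;
 * returns the number of windows used (at most 'max'). */
static int split_windows(uint64_t base, uint64_t size,
			 uint64_t wbase[], uint64_t wsize[], int max)
{
	int n = 0;

	while (size != 0 && n < max) {
		uint64_t window_size = UINT64_C(1) << log2_round_down(size);

		wbase[n] = base;
		wsize[n] = window_size;
		n++;
		base += window_size;
		size -= window_size;
	}
	return n;
}
```

For the thread's examples: 192 MB splits into 128 MB + 64 MB (two windows), and 224 MB into 128 MB + 64 MB + 32 MB (three windows), which is exactly Thomas's worry about window consumption.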
* pci-mvebu driver on km_kirkwood 2014-02-21 16:39 ` Jason Gunthorpe @ 2014-02-21 17:05 ` Thomas Petazzoni 2014-02-21 17:31 ` Jason Gunthorpe 0 siblings, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-21 17:05 UTC (permalink / raw) To: linux-arm-kernel Dear Jason Gunthorpe, On Fri, 21 Feb 2014 09:39:02 -0700, Jason Gunthorpe wrote: > On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote: > > *) I don't know if the algorithm to split the BAR into multiple > > windows is going to be trivial. > > physaddr_t base,size; > > while (size != 0) { > physaddr_t window_size = 1 << log2_round_down(size); > create_window(base,window_size); > base += window_size; > size -= window_size; > } > > At the very worst log2_round_down is approxmiately > > unsigned int log2_round_down(unsigned int val) > { > unsigned int res = 0; > while ((1<<res) <= val) > res++; > return res - 1; > } > > Minimum PCI required alignment for windows is 1MB so it will always > work out into some number of mbus windows.. Interesting! Thanks! Now I have another question: our mvebu_pcie_align_resource() function makes sure that the base address of the BAR is aligned on its size, because it is a requirement of MBus windows. However, if you later split the BAR into multiple windows, will this continue to work out? Let's take an example: a 96 MB BAR. If it gets put at 0xe0000000, then no problem: we create one 64 MB window at 0xe0000000 and a 32 MB window at 0xe4000000. Both base addresses are aligned on the size of the window. However, if the 96 MB BAR gets put at 0xea000000 (which is aligned on a 96 MB boundary, as required by our mvebu_pcie_align_resource). We create one 64 MB window at 0xea000000, and one 32 MB window at 0xee000000. Unfortunately, while 0xea000000 is aligned on a 96 MB boundary, it is not aligned on a 64 MB boundary, so the 64 MB window we have created is wrong. 
Which also makes me think that our mvebu_pcie_align_resource() function uses round_up(start, size), which most likely doesn't work with non power-of-two sizes. Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
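[Editor's note] Thomas's counter-example is easy to verify numerically: 0xea000000 is a multiple of 96 MB (39 × 96 MB) but not of 64 MB, so the 64 MB sub-window placed there violates the MBus size-alignment rule. A standalone check, not driver code:

```c
#include <stdint.h>

/* An MBus window at 'base' of power-of-two 'size' must satisfy
 * base % size == 0. */
static int window_aligned(uint64_t base, uint64_t size)
{
	return (base % size) == 0;
}
```

So a 96 MB BAR at 0xe0000000 splits cleanly (64 MB @ 0xe0000000, 32 MB @ 0xe4000000), while at 0xea000000 the leading 64 MB window is misaligned even though the BAR itself is "96 MB aligned".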
* pci-mvebu driver on km_kirkwood 2014-02-21 17:05 ` Thomas Petazzoni @ 2014-02-21 17:31 ` Jason Gunthorpe 2014-02-21 18:05 ` Arnd Bergmann 0 siblings, 1 reply; 55+ messages in thread From: Jason Gunthorpe @ 2014-02-21 17:31 UTC (permalink / raw) To: linux-arm-kernel On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote: > Now I have another question: our mvebu_pcie_align_resource() function > makes sure that the base address of the BAR is aligned on its size, > because it is a requirement of MBus windows. However, if you later > split the BAR into multiple windows, will this continue to work out? No, you must align to (1 << log2_round_down(size)) - that will always be the largest mbus window created and thus the highest starting alignment requirement. I looked for a bit to see if I could find why the core is rounding up to 196MB and it wasn't clear to me either. Gerlando, if you instrument the code in setup-bus.c, particularly pbus_size_mem, you will probably find out. Jason ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 17:31 ` Jason Gunthorpe @ 2014-02-21 18:05 ` Arnd Bergmann 2014-02-21 18:29 ` Gerlando Falauto 0 siblings, 1 reply; 55+ messages in thread From: Arnd Bergmann @ 2014-02-21 18:05 UTC (permalink / raw) To: linux-arm-kernel On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote: > On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote: > > > Now I have another question: our mvebu_pcie_align_resource() function > > makes sure that the base address of the BAR is aligned on its size, > > because it is a requirement of MBus windows. However, if you later > > split the BAR into multiple windows, will this continue to work out? > > No, you must align to (1 << log2_round_down(size)) - that will always > be the largest mbus window created and thus the highest starting > alignment requirement. Unless you allow reordering the two windows. If you want a 96MB window, you only need 32MB alignment because you can either put the actual 64MB window first if you have 64MB alignment, or you put the 32MB window first if you don't and then the following 64MB will be aligned. It gets more complicated if you want to allow a 72MB window (16MB+64MB), as that could either be 64MB aligned or start 16MB before the next multiple of 64MB. I don't think there is any reason why code anywhere should align the window to a multiple of the size though if the size is not power-of-two, such as aligning to multiples of 96MB. That wouldn't help anyone. Arnd ^ permalink raw reply [flat|nested] 55+ messages in thread
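[Editor's note] Arnd's reordering trick for the 96 MB case can be sketched directly: with only 32 MB alignment guaranteed, put whichever window lands size-aligned first. The helper below is hypothetical, not driver code:

```c
#include <stdint.h>

#define SZ_32M 0x02000000u
#define SZ_64M 0x04000000u

struct win { uint32_t base, size; };

/* Place a 96 MB region starting at a 32 MB aligned 'base' using one
 * 64 MB and one 32 MB window, ordered so each window is aligned to
 * its own size. Returns 0 if 'base' lacks even 32 MB alignment. */
static int place_96m(uint32_t base, struct win w[2])
{
	if (base % SZ_32M)
		return 0;
	if (base % SZ_64M == 0) {
		w[0] = (struct win){ base, SZ_64M };
		w[1] = (struct win){ base + SZ_64M, SZ_32M };
	} else {
		/* 32 MB window first; the next boundary is 64 MB aligned */
		w[0] = (struct win){ base, SZ_32M };
		w[1] = (struct win){ base + SZ_32M, SZ_64M };
	}
	return 2;
}
```

At 0xea000000 (32 MB aligned only) the 32 MB window goes first and the 64 MB window lands on the 64 MB boundary 0xec000000; at 0xe0000000 the 64 MB window leads.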
* pci-mvebu driver on km_kirkwood
2014-02-21 18:05 ` Arnd Bergmann
@ 2014-02-21 18:29 ` Gerlando Falauto
0 siblings, 0 replies; 55+ messages in thread
From: Gerlando Falauto @ 2014-02-21 18:29 UTC (permalink / raw)
To: linux-arm-kernel

On 02/21/2014 07:05 PM, Arnd Bergmann wrote:
> On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
>> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>>
>>> Now I have another question: our mvebu_pcie_align_resource() function
>>> makes sure that the base address of the BAR is aligned on its size,
>>> because it is a requirement of MBus windows. However, if you later
>>> split the BAR into multiple windows, will this continue to work out?
>>
>> No, you must align to (1 << log2_round_down(size)) - that will always
>> be the largest mbus window created and thus the highest starting
>> alignment requirement.
>
> Unless you allow reordering the two windows. If you want a 96MB
> window, you only need 32MB alignment because you can either put
> the actual 64MB window first if you have 64MB alignment, or you
> put the 32MB window first if you don't and then the following
> 64MB will be aligned.
>
> It gets more complicated if you want to allow a 72MB window
> (16MB+64MB), as that could either be 64MB aligned or start 16MB
> before the next multiple of 64MB.
>
> I don't think there is any reason why code anywhere should align
> the window to a multiple of the size though if the size is not
> power-of-two, such as aligning to multiples of 96MB. That wouldn't
> help anyone.

I also don't see why in the world there would be a requirement of
having a given "oddly-sized" range (e.g. 96MB) aligned to a multiple
of its size. In the end, AFAIK alignment requirements' only purpose is
to make hardware simpler. Can't see how aligning to an "odd" number
would help achieve this purpose. But that's just me, of course.

Thanks guys!
Gerlando

^ permalink raw reply	[flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 13:47 ` Thomas Petazzoni 2014-02-21 15:05 ` Arnd Bergmann 2014-02-21 16:39 ` Jason Gunthorpe @ 2014-02-21 18:18 ` Gerlando Falauto 2014-02-21 18:45 ` Thomas Petazzoni 2 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2014-02-21 18:18 UTC (permalink / raw) To: linux-arm-kernel Dear Thomas, On 02/21/2014 02:47 PM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, > > On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote: > >> So I restored the total aperture size to 192MB. >> I had to rework your patch a bit because: >> >> a) I'm running an older kernel and driver >> b) sizes are actually 1-byte offset > > Hum, right. This is a bit weird, maybe I should change that, I don't > think the mvebu-mbus driver should accept 1-byte offset sizes. I don't know anything about this, I only know the size dumped is of the form 0x...ffff, that's all. >> Here's the assignment (same as before): >> >> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] >> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] >> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] >> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] >> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] >> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] >> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] >> >> And here's the output I get from: >> >> # cat /sys/kernel/debug/mvebu-mbus/devices >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) >> [01] disabled >> [02] disabled >> [03] disabled >> [04] 00000000ff000000 - 00000000ff010000 : nand >> [05] 00000000f4000000 - 00000000f8000000 : vpcie >> [06] 00000000fe000000 - 00000000fe010000 : dragonite >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 > > This seems correct: we have two windows pointing to the same device, > and they have consecutive addresses. 
I don't know how to interpret the (remap ... ) bit, but yes, this looks right to me as well. I just don't know why mbus window 7 gets picked before 0, but apart from that, it looks nice. >> I did not get to test the whole address space thoroughly, but all the >> BARs are still accessible (mainly BAR0 which contains the control space >> and is mapped on the "new" MBUS window, and BAR1 which is the "big" >> one). So at least, the issues we had before are now gone. > > Did you check that what you read from BAR0 (which is mapped on the new > MBUS window) is really what you expect, and not just the same thing as > BAR1 accessible for the big window? I just want to make sure that the > hardware indeed properly handles two windows for the same device. Yes, there's no way the two BARs could be aliased. It's a fairly complex FPGA design, where BAR1 is the huge address space for a PCI-to-localbus bridge (whose connected devices are recognized correctly) and BAR0 is the control BAR (and its registers are read and written without a problem). >> So I'd say this looks like a very promising approach. :-) > > Indeed. However, I don't think this approach solves the entire problem, > for two reasons: > > *) For small BARs that are not power-of-two sized, we may not want to > consume two windows, but instead consume a little bit more address > space. Using two windows to map a 96 KB BAR would be a waste of > windows: using a single 128 KB window is much more efficient. > > *) I don't know if the algorithm to split the BAR into multiple > windows is going to be trivial. I see others have already replied and I pretty much agree with them. Thanks, Gerlando ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 18:18 ` Gerlando Falauto @ 2014-02-21 18:45 ` Thomas Petazzoni 0 siblings, 0 replies; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-21 18:45 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, On Fri, 21 Feb 2014 19:18:25 +0100, Gerlando Falauto wrote: > > Hum, right. This is a bit weird, maybe I should change that, I don't > > think the mvebu-mbus driver should accept 1-byte offset sizes. > > I don't know anything about this, I only know the size dumped is of the > form 0x...ffff, that's all. I'll have to look into this. > >> # cat /sys/kernel/debug/mvebu-mbus/devices > >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) > >> [01] disabled > >> [02] disabled > >> [03] disabled > >> [04] 00000000ff000000 - 00000000ff010000 : nand > >> [05] 00000000f4000000 - 00000000f8000000 : vpcie > >> [06] 00000000fe000000 - 00000000fe010000 : dragonite > >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 > > > > This seems correct: we have two windows pointing to the same device, > > and they have consecutive addresses. > > I don't know how to interpret the (remap ... ) bit, but yes, this looks > right to me as well. I just don't know why mbus window 7 gets picked > before 0, but apart from that, it looks nice. Basically, some windows have an additional capability: they are "remappable". On Kirkwood, the first 4 windows are remappable, and the last 4 are not. Therefore, unless you request a remappable window, we allocate a non-remappable one, which is why window 4 to 7 get used first. And then, even though we don't need the remappable feature for the last window, there are no more non-remappable windows available, so window 0 gets allocated for our second PCIe window. It matches fine with the expected behavior of the mvebu-mbus driver. 
> > Did you check that what you read from BAR0 (which is mapped on the new > > MBUS window) is really what you expect, and not just the same thing as > > BAR1 accessible for the big window? I just want to make sure that the > > hardware indeed properly handles two windows for the same device. > > Yes, there's no way the two BARs could be aliased. It's a fairly complex > FPGA design, where BAR1 is the huge address space for a PCI-to-localbus > bridge (whose connected devices are recognized correctly) and BAR0 is > the control BAR (and its registers are read and written without a problem). Great, so it means that it really works! Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
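[Editor's note] The allocation order Thomas describes (non-remappable windows 4-7 handed out first, remappable windows 0-3 kept as a fallback) can be modeled with a simple bitmask. This is an illustrative sketch of the policy, not the mvebu-mbus code:

```c
#include <stdint.h>

#define NUM_WINDOWS	8
#define FIRST_NON_REMAP	4	/* on Kirkwood, windows 0-3 are remappable */

/* Pick a free window, preferring non-remappable ones so that
 * remap-capable windows stay available for callers that need them.
 * 'used' is a bitmask of allocated windows; returns -1 if none free. */
static int alloc_window(uint8_t *used, int need_remap)
{
	int i;

	if (!need_remap) {
		for (i = FIRST_NON_REMAP; i < NUM_WINDOWS; i++)
			if (!(*used & (1 << i))) {
				*used |= 1 << i;
				return i;
			}
	}
	for (i = 0; i < FIRST_NON_REMAP; i++)
		if (!(*used & (1 << i))) {
			*used |= 1 << i;
			return i;
		}
	return -1;
}
```

This matches the debugfs dump above: nand, vpcie, dragonite and the first PCIe window occupy slots 4-7, so the second PCIe window falls back to remappable slot 0 (hence its "remap ..." annotation).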
* pci-mvebu driver on km_kirkwood 2014-02-20 8:55 ` Thomas Petazzoni 2014-02-20 17:35 ` Jason Gunthorpe @ 2014-02-20 19:18 ` Bjorn Helgaas 2014-02-21 0:24 ` Jason Gunthorpe 1 sibling, 1 reply; 55+ messages in thread From: Bjorn Helgaas @ 2014-02-20 19:18 UTC (permalink / raw) To: linux-arm-kernel On Thu, Feb 20, 2014 at 1:55 AM, Thomas Petazzoni <thomas.petazzoni@free-electrons.com> wrote: > Dear Bjorn Helgaas, > > + Jason Gunthorpe. > > On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote: > >> > Cool. However, I am not sure my fix is really correct, because is you >> > had another PCIe device that needed 64 MB of memory space, the PCIe >> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it, >> > which would have conflicted with the forced "power of 2 up-rounding" >> > we've applied on the memory space of the first device. >> > >> > Therefore, I believe this constraint should be taken into account by >> > the PCIe core when allocating the different memory regions for each >> > device. >> > >> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and >> > memory regions associated to each PCIe device of the emulated bridge >> > have a size that is a power of 2. >> > >> > I am currently using the ->align_resource() hook to ensure that the >> > start address of the resource matches certain other constraints, but I >> > don't see a way of telling the PCI core that I need the resource to >> > have its size rounded up to the next power of 2 size. Is there a way of >> > doing this? >> > >> > In the case described by Gerlando, the PCI core has assigned a 192 MB >> > region, but the Marvell hardware can only create windows that have a >> > power of two size, i.e 256 MB. Therefore, the PCI core should be told >> > this constraint, so that it doesn't allocate the next resource right >> > after the 192 MB one. 
>> >> I'm not sure I understand this correctly, but I *think* this 192 MB >> region that gets rounded up to 256 MB because of the Marvell >> constraint is a host bridge aperture. If that's the case, it's >> entirely up to you (the host bridge driver author) to round it as >> needed before passing it to pci_add_resource_offset(). >> >> The PCI core will never allocate any space that is outside the host >> bridge apertures. > > Hum, I believe there is a misunderstanding here. We are already using > pci_add_resource_offset() to define the global aperture for the entire > PCI bridge. This is not causing any problem. > > Let me give a little bit of background first. > > On Marvell hardware, the physical address space layout is configurable, > through the use of "MBus windows". A "MBus window" is defined by a base > address, a size, and a target device. So if the CPU needs to access a > given device (such as PCIe 0.0 for example), then we need to create a > "MBus window" whose size and target device match PCIe 0.0. I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe that's not true. Is it really a PCIe root port? That would mean the MBus windows are some non-PCIe-compliant thing between the root complex and the root ports, I guess. > Since Armada XP has 10 PCIe interfaces, we cannot just statically > create as many MBus windows as there are PCIe interfaces: it would both > exhaust the number of MBus windows available, and also exhaust the > physical address space, because we would have to create very large > windows, just in case the PCIe device plugged behind this interface > needs large BARs. Everybody else in the world *does* statically configure host bridge apertures before enumerating the devices below the bridge. I see why you want to know what devices are there before deciding whether and how large to make an MBus window. 
But that is new functionality that we don't have today, and the
general idea is not Marvell-specific, so other systems might want
something like this, too. So I'm not sure if using quirks to try to
wedge it into the current PCI core is the right approach. I don't have
another proposal, but we should at least think about what direction we
want to take.

> So, what the pci-mvebu.c driver does is that it creates an emulated PCI
> bridge. This emulated bridge is used to let the Linux PCI core
> enumerate the real physical PCI devices behind the bridge, allocate a
> range of physical addresses that is available for each of these
> devices, and write them to the bridge registers. Since the bridge is
> not a real one, but emulated, we trap those writes, and use them to
> create the MBus windows that will allow the CPU to actually access the
> device, at the base address chosen by the Linux PCI core during the
> enumeration process.
>
> However, MBus windows have a certain constraint that they must have a
> power of two size, so the Linux PCI core should not write to one of the
> bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
> address whose size is not a power of 2.

I'm still not sure I understand what's going on here. It sounds like
your emulated bridge basically wraps the host bridge and makes it look
like a PCI-PCI bridge. But I assume the host bridge itself is also
visible, and has apertures (I guess these are the MBus windows?) So
when you first discover the host bridge, before enumerating anything
below it, what apertures does it have? Do you leave them disabled
until after we enumerate the devices, figure out how much space they
need, and configure the emulated PCI-PCI bridge to enable the MBus
windows?

It'd be nice if dmesg mentioned the host bridge explicitly as we do on
other architectures; maybe that would help understand what's going on
under the covers.
Maybe a longer excerpt would already have this; you already use pci_add_resource_offset(), which is used when creating the root bus, so you must have some sort of aperture before enumerating. > Let me take the example of Gerlando: > > pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] > pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > pci 0000:00:01.0: PCI bridge to [bus 01] > pci 0000:00:01.0: bridge window [mem 0xe0000000-0xebffffff] > > So, pci 0000:01:00 is the real device, which has a number of BARs of a > certain size. Taking into account all those BARs, the Linux PCI core > decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line > of the log above). The problem is that [mem 0xe0000000-0xebffffff] is > 192 MB, but we would like the Linux PCI core to extend that to 256 MB. If 01:00.0 is a PCIe endpoint, it must have a root port above it, so that means 00:01.0 must be the root port. But I think you're saying that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it has extra window alignment restrictions. I'm scared about what other non-PCIe-compliant things there might be. What happens when the PCI core configures MPS, ASPM, etc., > As you can see it is not about the global aperture associated to the > bridge, but about the size of the window associated to each "port" of > the bridge. Bjorn ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-20 19:18 ` Bjorn Helgaas @ 2014-02-21 0:24 ` Jason Gunthorpe 2014-02-21 19:05 ` Bjorn Helgaas 0 siblings, 1 reply; 55+ messages in thread From: Jason Gunthorpe @ 2014-02-21 0:24 UTC (permalink / raw) To: linux-arm-kernel On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote: > > On Marvell hardware, the physical address space layout is configurable, > > through the use of "MBus windows". A "MBus window" is defined by a base > > address, a size, and a target device. So if the CPU needs to access a > > given device (such as PCIe 0.0 for example), then we need to create a > > "MBus window" whose size and target device match PCIe 0.0. > > I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe > that's not true. Is it really a PCIe root port? That would mean the > MBus windows are some non-PCIe-compliant thing between the root > complex and the root ports, I guess. It really is a root port. The hardware acts like a root port at the TLP level. It has all the root port specific stuff in some format but critically, completely lacks a compliant config space for a root port bridge. So the driver creates a 'compliant' config space for the root port. Building the config space requires harmonizing registers related to the PCI-E and registers related to internal routing and dealing with the mismatch between what the hardware can actually provide and what the PCI spec requires it provide. The only mismatch that gets exposed to the PCI core we know about is the bridge window address alignment restrictions. This is what Thomas has been asking about. 
> > Since Armada XP has 10 PCIe interfaces, we cannot just statically > > create as many MBus windows as there are PCIe interfaces: it would both > > exhaust the number of MBus windows available, and also exhaust the > > physical address space, because we would have to create very large > > windows, just in case the PCIe device plugged behind this interface > > needs large BARs. > > Everybody else in the world *does* statically configure host bridge > apertures before enumerating the devices below the bridge. The original PCI-E driver for this hardware did use a 1 root port per host bridge model, with static host bridge aperture allocation and so forth. It works fine, just like everyone else in the world, as long as you have only 1 or 2 ports. The XP hardware had *10* ports on a single 32 bit machine. You run out of address space, you run out of HW routing resources, it just doesn't work acceptably. > I see why you want to know what devices are there before deciding > whether and how large to make an MBus window. But that is new > functionality that we don't have today, and the general idea is not Well, in general, it isn't new core functionality, it is functionality that already exists to support PCI bridges. Choosing to use a one host bridge to N root port bridge model lets the driver use all that functionality and the only wrinkle that becomes visible to the PCI core as a whole is the non-compliant alignment restriction on the bridge window BAR. This also puts the driver in alignment with the PCI-E specs for root complexes, which means user space can actually see things like the PCI-E root port link capability block and makes hot plug work properly (I am actively using hot plug with this driver). I personally think this is a reasonable way to support this highly flexible HW. > I'm still not sure I understand what's going on here. It sounds like > your emulated bridge basically wraps the host bridge and makes it look > like a PCI-PCI bridge. 
> But I assume the host bridge itself is also > visible, and has apertures (I guess these are the MBus windows?) No, there is only one bridge: it is a per-physical-port MBUS / PCI-E bridge. It performs an identical function to the root port bridge described in PCI-E. MBUS serves as the root-complex internal bus 0. There aren't 2 levels of bridging, so the MBUS / PCI-E bridge can claim any system address and there is no such thing as a 'host bridge'. What Linux calls 'the host bridge aperture' is simply a whack of otherwise unused physical address space, it has no special properties. > It'd be nice if dmesg mentioned the host bridge explicitly as we do on > other architectures; maybe that would help understand what's going on > under the covers. Maybe a longer excerpt would already have this; you > already use pci_add_resource_offset(), which is used when creating the > root bus, so you must have some sort of aperture before enumerating. Well, /proc/iomem looks like this: e0000000-efffffff : PCI MEM 0000 e0000000-e00fffff : PCI Bus 0000:01 e0000000-e001ffff : 0000:01:00.0 'PCI MEM 0000' is the 'host bridge aperture'; it is an arbitrary range of address space that doesn't overlap anything. 'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical port 0 '0000:01:00.0' is BAR 0 of an off-chip device. > If 01:00.0 is a PCIe endpoint, it must have a root port above it, so > that means 00:01.0 must be the root port. But I think you're saying > that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it > has extra window alignment restrictions. It is important to understand that the emulation is only of the root port bridge configuration space. The underlying TLP processing is done in HW and is compliant. > I'm scared about what other non-PCIe-compliant things there might > be. What happens when the PCI core configures MPS, ASPM, etc.? As the TLP processing and the underlying PHY are all compliant, these things are all supported in HW. 
MPS is supported directly by the HW. ASPM is supported by the HW, as is the entire link capability and status block. AER is supported directly by the HW. But here is the thing: without the software emulated config space there would be no sane way for the Linux PCI core to access these features. The HW simply does not present them in a way that the core code can understand without a SW intervention of some kind. Jason ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 0:24 ` Jason Gunthorpe @ 2014-02-21 19:05 ` Bjorn Helgaas 2014-02-21 19:21 ` Thomas Petazzoni 2014-02-21 19:53 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 55+ messages in thread From: Bjorn Helgaas @ 2014-02-21 19:05 UTC (permalink / raw) To: linux-arm-kernel [+cc Gavin, Ben for EEH alignment question below] On Thu, Feb 20, 2014 at 5:24 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote: > >> > On Marvell hardware, the physical address space layout is configurable, >> > through the use of "MBus windows". A "MBus window" is defined by a base >> > address, a size, and a target device. So if the CPU needs to access a >> > given device (such as PCIe 0.0 for example), then we need to create a >> > "MBus window" whose size and target device match PCIe 0.0. > ... > So the driver creates a 'compliant' config space for the root > port. Building the config space requires harmonizing registers related > to the PCI-E and registers related to internal routing and dealing > with the mismatch between what the hardware can actualy provide and > what the PCI spec requires it provide. > ... >> > Since Armada XP has 10 PCIe interfaces, we cannot just statically >> > create as many MBus windows as there are PCIe interfaces: it would both >> > exhaust the number of MBus windows available, and also exhaust the >> > physical address space, because we would have to create very large >> > windows, just in case the PCIe device plugged behind this interface >> > needs large BARs. > ... >> I'm still not sure I understand what's going on here. It sounds like >> your emulated bridge basically wraps the host bridge and makes it look >> like a PCI-PCI bridge. But I assume the host bridge itself is also >> visible, and has apertures (I guess these are the MBus windows?) > > No, there is only one bridge, it is a per-physical-port MBUS / PCI-E > bridge. 
It performs an identical function to the root port bridge > described in PCI-E. MBUS serves as the root-complex internal bus 0. > > There isn't 2 levels of bridging, so the MBUS / PCI-E bridge can > claim any system address and there is no such thing as a 'host > bridge'. > > What Linux calls 'the host bridge aperture' is simply a wack of > otherwise unused physical address space, it has no special properties. > >> It'd be nice if dmesg mentioned the host bridge explicitly as we do on >> other architectures; maybe that would help understand what's going on >> under the covers. Maybe a longer excerpt would already have this; you >> already use pci_add_resource_offset(), which is used when creating the >> root bus, so you must have some sort of aperture before enumerating. > > Well, /proc/iomem looks like this: > > e0000000-efffffff : PCI MEM 0000 > e0000000-e00fffff : PCI Bus 0000:01 > e0000000-e001ffff : 0000:01:00.0 > > 'PCI MEM 0000' is the 'host bridge aperture' it is an arbitary > range of address space that doesn't overlap anything. > > 'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical > port 0 Thanks for making this more concrete. Let me see if I understand any better: - e0000000-efffffff is the "host bridge aperture" but it doesn't correspond to an actual aperture in hardware (there are no registers where you set this range). The only real use for this range is to be the arena within which the PCI core can assign space to the Root Ports. This is static and you don't need to change it based on what devices we discover. - There may be several MBus/PCIe Root Ports, and you want to configure their apertures at enumeration-time based on what devices are below them. As you say, the PCI core supports this except that MBus apertures must be a power-of-two in size and aligned on their size, while ordinary PCI bridge windows only need to start and end on 1MB boundaries. 
- e0000000-e00fffff is an example of one MBus/PCIe aperture, and this space is available on PCI bus 01. This one happens to be 1MB in size, but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge window could be. - You're currently using the ARM ->align_resource() hook (part of pcibios_align_resource()), which is used in the bowels of the allocator (__find_resource()) and affects the starting address of the region we allocate, but not the size. So you can force the start of an MBus aperture to be power-of-two aligned, but not the end. The allocate_resource() alignf argument is only used by PCI and PCMCIA, so it doesn't seem like it would be too terrible to extend the alignf interface so it could control the size, too. Would something like that solve this problem? I first wondered if you could use pcibios_window_alignment(), but it doesn't know the amount of space we need below the bridge, and it also can't affect the size of the window or the ending address, so I don't think it will help. But I wonder if powerpc has a similar issue here: I think EEH might need, for example 16MB bridge window alignment. Since pcibios_window_alignment() only affects the *starting* address, could the core assign a 9MB window whose starting address is 16MB-aligned? Could EEH deal with that? What if the PCI core assigned the space right after the 9MB window to another device? Bjorn ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 19:05 ` Bjorn Helgaas @ 2014-02-21 19:21 ` Thomas Petazzoni 2014-02-21 19:53 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 55+ messages in thread From: Thomas Petazzoni @ 2014-02-21 19:21 UTC (permalink / raw) To: linux-arm-kernel Dear Bjorn Helgaas, On Fri, 21 Feb 2014 12:05:49 -0700, Bjorn Helgaas wrote: > Thanks for making this more concrete. Let me see if I understand any better: Good to see that Jason Gunthorpe could explain this in better words. I'll try to answer your questions below; I'm sure Jason will correct me if I say incorrect things, or things that are imprecise. > - e0000000-efffffff is the "host bridge aperture" but it doesn't > correspond to an actual aperture in hardware (there are no registers > where you set this range). The only real use for this range is to be > the arena within which the PCI core can assign space to the Root > Ports. This is static and you don't need to change it based on what > devices we discover. Correct. We don't configure this in any hardware register. We just give this aperture to the Linux PCI core to tell it "please allocate all BAR physical ranges from this global aperture". > - There may be several MBus/PCIe Root Ports, and you want to configure > their apertures at enumeration-time based on what devices are below > them. As you say, the PCI core supports this except that MBus > apertures must be a power-of-two in size and aligned on their size, > while ordinary PCI bridge windows only need to start and end on 1MB > boundaries. Exactly. > - e0000000-e00fffff is an example of one MBus/PCIe aperture, and this > space is available on PCI bus 01. This one happens to be 1MB in size, > but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge > window could be. Absolutely. Note that we have the possibility of mapping a 3 MB BAR, by using a 2 MB window followed by a 1 MB window. 
However, since the number of windows is limited (8 on Kirkwood, 20 on Armada 370/XP), we will prefer to enlarge the BAR size if its size is fairly small, and only resort to using multiple windows if the amount of lost physical space is big. So, for a 3 MB BAR, we will definitely prefer to extend it to a single 4 MB window, because losing 1 MB of physical address space is preferable over losing one window. For a 192 MB BAR, we may prefer to use one 128 MB window followed by one 64 MB window. But as long as the pci-mvebu driver can control the size of the BAR, it can decide on its own whether it prefers enlarging the BAR, or using multiple windows. > - You're currently using the ARM ->align_resource() hook (part of > pcibios_align_resource()), which is used in the bowels of the > allocator (__find_resource()) and affects the starting address of the > region we allocate, but not the size. So you can force the start of > an MBus aperture to be power-of-two aligned, but not the end. Correct. Happy to see that we've managed to get an understanding of what the problem is. > The allocate_resource() alignf argument is only used by PCI and > PCMCIA, so it doesn't seem like it would be too terrible to extend the > alignf interface so it could control the size, too. Would something > like that solve this problem? I don't know, I would have to look more precisely into this alignf argument, and see how it could be extended to solve our constraints. > I first wondered if you could use pcibios_window_alignment(), but it > doesn't know the amount of space we need below the bridge, and it also > can't affect the size of the window or the ending address, so I don't > think it will help. > > But I wonder if powerpc has a similar issue here: I think EEH might > need, for example 16MB bridge window alignment. Since > pcibios_window_alignment() only affects the *starting* address, could > the core assign a 9MB window whose starting address is 16MB-aligned? > Could EEH deal with that? 
What if the PCI core assigned the space > right after the 9MB window to another device? I'll let the other PCI people answer this :-) Thanks a lot for your feedback! Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 19:05 ` Bjorn Helgaas 2014-02-21 19:21 ` Thomas Petazzoni @ 2014-02-21 19:53 ` Benjamin Herrenschmidt 2014-02-23 3:43 ` Gavin Shan 1 sibling, 1 reply; 55+ messages in thread From: Benjamin Herrenschmidt @ 2014-02-21 19:53 UTC (permalink / raw) To: linux-arm-kernel On Fri, 2014-02-21 at 12:05 -0700, Bjorn Helgaas wrote: > But I wonder if powerpc has a similar issue here: I think EEH might > need, for example 16MB bridge window alignment. Since > pcibios_window_alignment() only affects the *starting* address, could > the core assign a 9MB window whose starting address is 16MB-aligned? > Could EEH deal with that? What if the PCI core assigned the space > right after the 9MB window to another device? Gavin, did you guys deal with that at all ? Are we aligning the size as well somewhat ? Cheers, Ben. ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2014-02-21 19:53 ` Benjamin Herrenschmidt @ 2014-02-23 3:43 ` Gavin Shan 0 siblings, 0 replies; 55+ messages in thread From: Gavin Shan @ 2014-02-23 3:43 UTC (permalink / raw) To: linux-arm-kernel On Sat, Feb 22, 2014 at 06:53:23AM +1100, Benjamin Herrenschmidt wrote: >On Fri, 2014-02-21 at 12:05 -0700, Bjorn Helgaas wrote: >> But I wonder if powerpc has a similar issue here: I think EEH might >> need, for example 16MB bridge window alignment. Since >> pcibios_window_alignment() only affects the *starting* address, could >> the core assign a 9MB window whose starting address is 16MB-aligned? >> Could EEH deal with that? What if the PCI core assigned the space >> right after the 9MB window to another device? > >Gavin, did you guys deal with that at all ? Are we aligning the size as >well somewhat ? > Yeah, we can handle it well because pcibios_window_alignment() affects both the starting address and the size of the PCI bridge window. More details can be found in drivers/pci/setup-bus.c::pbus_size_mem(): the starting address, "size0", "size1", and "size1-size0" are aligned to "min_align", which comes from pcibios_window_alignment() (16MB as mentioned). Thanks, Gavin ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto 2013-07-10 16:57 ` Thomas Petazzoni @ 2013-07-31 8:03 ` Thomas Petazzoni 2013-07-31 8:26 ` Gerlando Falauto 1 sibling, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2013-07-31 8:03 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote: > I am trying to use the pci-mvebu driver on one of our km_kirkwood > boards. The board is based on Marvell's 98dx4122, which should > essentially be 6281 compatible. In the end, did you manage to get the pci-mvebu driver to work on your platform? Thanks, Thomas -- Thomas Petazzoni, Free Electrons Kernel, drivers, real-time and embedded Linux development, consulting, training and support. http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-31 8:03 ` Thomas Petazzoni @ 2013-07-31 8:26 ` Gerlando Falauto 2013-07-31 9:00 ` Thomas Petazzoni 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2013-07-31 8:26 UTC (permalink / raw) To: linux-arm-kernel Hi Thomas, On 07/31/2013 10:03 AM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, > > On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote: > >> I am trying to use the pci-mvebu driver on one of our km_kirkwood >> boards. The board is based on Marvell's 98dx4122, which should >> essentially be 6281 compatible. > > In the end, did you manage to get the pci-mvebu driver to work on your > platform? Yes, I did -- though I didn't go much beyond simple device probing (i.e. no real, intense usage of devices). AND I'm not using the DT-based mbus driver (i.e. addresses are still hardcoded within the source code). Actually, the main reason for trying to use this driver was because I wanted to model a PCIe *device* within the device tree, so as to expose its GPIOs and IRQs to be referenced (through phandles) from other device tree nodes. The way I understand it, it turns out this is not the way to go, as PCI/PCIe are essentially enumerated busses, so you're not supposed to -and it's not a trivial task to- put any information about real devices within the device tree. Do you have any suggestion about that? On the other hand, for our use case I'm afraid there might be some hardcoded values within drivers or userspace code, where a certain PCIe device is expected to be connected within a given bus number with a given device number (bleah!). If I understand correctly, your driver creates a virtual PCI-to-PCI bridge, so our devices would be connected to BUS #1 as opposed to #0 -- which might break existing (cr*ee*ppy) code. But that's not your fault of course. If you're interested, I can keep you posted as soon as we proceed further with this (most likely in September or so). 
Next step would be to test Ezequiel's MBus DT binding [PATCH v8], but I'm afraid that'll have to wait too, until end of August or so, as I am about to leave for vacation. Thank you! Gerlando ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-31 8:26 ` Gerlando Falauto @ 2013-07-31 9:00 ` Thomas Petazzoni 2013-07-31 20:50 ` Jason Gunthorpe 0 siblings, 1 reply; 55+ messages in thread From: Thomas Petazzoni @ 2013-07-31 9:00 UTC (permalink / raw) To: linux-arm-kernel Dear Gerlando Falauto, [ Device Tree mailing list readers: there is a question for you below. ] On Wed, 31 Jul 2013 10:26:44 +0200, Gerlando Falauto wrote: > >> I am trying to use the pci-mvebu driver on one of our km_kirkwood > >> boards. The board is based on Marvell's 98dx4122, which should > >> essentially be 6281 compatible. > > > > In the end, did you manage to get the pci-mvebu driver to work on your > > platform? > > Yes, I did -- though I didn't go much beyond simple device probing (i.e. > no real, intense usage of devices). Ok, good. > AND I'm not using the DT-based mbus > driver (i.e. addresses are still hardcoded within the source code). Ok, that will be the next step, but I don't expect you to face many issues. The DT-based mbus doesn't change the internal logic much, it's really just the DT representation that's different. On the other hand, the new PCIe driver completely changed the internal logic, by adding the emulated PCI-to-PCI bridge. > Actually, the main reason for trying to use this driver was because I > wanted to model a PCIe *device* within the device tree, so to expose its > GPIOs and IRQs to be referenced (through phandles) from other device > tree nodes. The way I understand it, turns out this is not the way to > go, as PCI/PCIe are essentially enumerated busses, so you're not > supposed to -and it's not a trivial task to- put any information about > real devices within the device tree. > Do you have any suggestion about that? Indeed, PCI/PCIe devices are enumerated dynamically, so they are not listed in the Device Tree, and there's no way to "attach" more information to them. Device Tree people, any suggestion about the above question? 
> On the other hand, for our use case I'm afraid there might be some > hardcoded values within drivers or userspace code, where a certain PCIe > device is expected to be connected within a given bus number with a > given device number (bleah!). > If I understand correctly, your driver creates a virtual PCI-to-PCI > bridge, so our devices would be connected to BUS #1 as opposed to #0 -- > which might break existing (cr*ee*ppy) code. > But that's not your fault of course. Yeah, I believe normally userspace code shouldn't rely on a particular PCI bus topology. > If you're interested, I can keep you posted as soon as we proceed > further with this (most likely in September or so). Sure. > Next step would be to test Ezequiel's MBus DT binding [PATCH v8], but > I'm afraid that'll have to wait too, until end of August or so, as I am > about to leave for vacation. Ok, thanks! Thomas -- Thomas Petazzoni, Free Electrons Kernel, drivers, real-time and embedded Linux development, consulting, training and support. http://free-electrons.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-31 9:00 ` Thomas Petazzoni @ 2013-07-31 20:50 ` Jason Gunthorpe 2013-08-09 14:01 ` Thierry Reding 0 siblings, 1 reply; 55+ messages in thread From: Jason Gunthorpe @ 2013-07-31 20:50 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote: > > Actually, the main reason for trying to use this driver was because I > > wanted to model a PCIe *device* within the device tree, so to expose its > > GPIOs and IRQs to be referenced (through phandles) from other device > > tree nodes. The way I understand it, turns out this is not the way to > > go, as PCI/PCIe are essentially enumerated busses, so you're not > > supposed to -and it's not a trivial task to- put any information about > > real devices within the device tree. > > Do you have any suggestion about that? > > Indeed, PCI/PCIe devices are enumerated dynamically, so they are not > listed in the Device Tree, so there's no way to "attach" more > information to them. > > Device Tree people, any suggestion about the above question? No, that isn't true. Device tree can include the discovered PCI devices, you have to use the special reg encoding and all that weirdness, but it does work. The of_node will be attached to the struct pci device automatically. On server/etc DT platforms the firmware will do PCI discovery and resource assignment then dump all those results into DT for the OS to reference. This is a major reason why we wanted to see the standard PCI DT be used for Marvell/etc, the existing infrastructure for this is valuable. AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood (though not yet with the new driver). It is useful for exactly the reason stated - you can describe GPIOs, I2C busses, etc, etc in DT and then upon load of the PCI driver engage the DT code to populate and connect all that downstream infrastructure. 
I understand someday DT overlays might be a better alternative for this, but AFAIK today in mainline this is what we have.. That said, the guideline to not include discoverable information in DT is a good guideline for upstream DTs.. Jason ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-07-31 20:50 ` Jason Gunthorpe @ 2013-08-09 14:01 ` Thierry Reding 2013-08-26 9:27 ` Gerlando Falauto 0 siblings, 1 reply; 55+ messages in thread From: Thierry Reding @ 2013-08-09 14:01 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote: > On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote: > > > > Actually, the main reason for trying to use this driver was because I > > > wanted to model a PCIe *device* within the device tree, so to expose its > > > GPIOs and IRQs to be referenced (through phandles) from other device > > > tree nodes. The way I understand it, turns out this is not the way to > > > go, as PCI/PCIe are essentially enumerated busses, so you're not > > > supposed to -and it's not a trivial task to- put any information about > > > real devices within the device tree. > > > Do you have any suggestion about that? > > > > Indeed, PCI/PCIe devices are enumerated dynamically, so they are not > > listed in the Device Tree, so there's no way to "attach" more > > information to them. > > > > Device Tree people, any suggestion about the above question? > > No, that isn't true. > > Device tree can include the discovered PCI devices, you have to use > the special reg encoding and all that weirdness, but it does work. The > of_node will be attached to the struct pci device automatically. > > On server/etc DT platforms the firmware will do PCI discovery and > resource assignment then dump all those results into DT for the OS to > reference. > > This is a major reason why we wanted to see the standard PCI DT be > used for Marvell/etc, the existing infrastructure for this is > valuable. > > AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood > (though not yet with the new driver). 
> > It is useful for exactly the reason stated - you can describe GPIOs, > I2C busses, etc, etc in DT and then upon load of the PCI driver engage > the DT code to populate and connect all that downstream > infrastructure. Obviously this doesn't work in general purpose systems because the PCI hierarchy needs to be hardcoded in the DT. If you start adding and removing PCI devices that will likely change the hierarchy and break this matching of PCI device to DT node. It's quite unlikely to have a need to hook up GPIOs or IRQs via DT in a general purpose system, though, so I don't really see that being a big problem. Thierry ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-08-09 14:01 ` Thierry Reding @ 2013-08-26 9:27 ` Gerlando Falauto 2013-08-26 12:02 ` Thierry Reding 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2013-08-26 9:27 UTC (permalink / raw) To: linux-arm-kernel Hi guys [particularly Jason and Thierry], sorry for the prolonged silence, here I am back again... On 08/09/2013 04:01 PM, Thierry Reding wrote: > On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote: >> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote: >> >>>> Actually, the main reason for trying to use this driver was because I >>>> wanted to model a PCIe *device* within the device tree, so to expose its >>>> GPIOs and IRQs to be referenced (through phandles) from other device >>>> tree nodes. The way I understand it, turns out this is not the way to >>>> go, as PCI/PCIe are essentially enumerated busses, so you're not >>>> supposed to -and it's not a trivial task to- put any information about >>>> real devices within the device tree. >>>> Do you have any suggestion about that? >>> >>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not >>> listed in the Device Tree, so there's no way to "attach" more >>> information to them. >>> >>> Device Tree people, any suggestion about the above question? >> >> No, that isn't true. >> >> Device tree can include the discovered PCI devices, you have to use >> the special reg encoding and all that weirdness, but it does work. The >> of_node will be attached to the struct pci device automatically. So you mean that, assuming I knew the topology, I could populate the device tree in advance (e.g. statically), so that it already includes *devices* which will be further discovered during probing? Or else you mean the {firmware,u-boot} can do that prior to starting the OS? If either of the above is true, could you please suggest some example (or some way to get one)? 
I assume the "reg" property (and the after-"@" node name) will need to encode (at least) the device number, is that right? I tried reading the "PCI Bus Binding to Open Firmware" but I could not make complete sense out of it... >> On server/etc DT platforms the firmware will do PCI discovery and >> resource assignment then dump all those results into DT for the OS to >> reference. >> >> This is a major reason why we wanted to see the standard PCI DT be >> used for Marvell/etc, the existing infrastructure for this is >> valuable. >> >> AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood >> (though not yet with the new driver). Could you please give a pointer to some example of this? I'm not quite sure I understand what you guys are talking about. >> >> It is useful for exactly the reason stated - you can describe GPIOs, >> I2C busses, etc, etc in DT and then upon load of the PCI driver engage >> the DT code to populate and connect all that downstream >> infrastructure. I'm not 100% sure I made myself clear though. What I would like to do is to have *other* parts of the device tree be able to reference (i.e., connect to, through phandles) a PCI device (because it provides a GPIO, for instance). Is that also what you mean? > Obviously this doesn't work in general purpose systems because the PCI > hierarchy needs to be hardcoded in the DT. If you start adding and > removing PCI devices that will likely change the hierarchy and break > this matching of PCI device to DT node. Yes, I guess in that case (if ever) we would need some other way than the device number (is that the same as the physical slot?) to specify a particular "hotplug" device (i.e. maybe a serial number or so)? But that's definitely out of scope here. > It's quite unlikely to have a need to hook up GPIOs or IRQs via DT in a > general purpose system, though, so I don't really see that being a big > problem. Agreed. > > Thierry Thanks again for your patience... 
Gerlando ^ permalink raw reply [flat|nested] 55+ messages in thread
* pci-mvebu driver on km_kirkwood 2013-08-26 9:27 ` Gerlando Falauto @ 2013-08-26 12:02 ` Thierry Reding 2013-08-26 14:49 ` Gerlando Falauto 0 siblings, 1 reply; 55+ messages in thread From: Thierry Reding @ 2013-08-26 12:02 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote: > Hi guys [particularly Jason and Thierry], > > sorry for the prolonged silence, here I am back again... > > On 08/09/2013 04:01 PM, Thierry Reding wrote: > >On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote: > >>On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote: > >> > >>>>Actually, the main reason for trying to use this driver was because I > >>>>wanted to model a PCIe *device* within the device tree, so to expose its > >>>>GPIOs and IRQs to be referenced (through phandles) from other device > >>>>tree nodes. The way I understand it, turns out this is not the way to > >>>>go, as PCI/PCIe are essentially enumerated busses, so you're not > >>>>supposed to -and it's not a trivial task to- put any information about > >>>>real devices within the device tree. > >>>>Do you have any suggestion about that? > >>> > >>>Indeed, PCI/PCIe devices are enumerated dynamically, so they are not > >>>listed in the Device Tree, so there's no way to "attach" more > >>>information to them. > >>> > >>>Device Tree people, any suggestion about the above question? > >> > >>No, that isn't true. > >> > >>Device tree can include the discovered PCI devices, you have to use > >>the special reg encoding and all that weirdness, but it does work. The > >>of_node will be attached to the struct pci device automatically. > > So you mean that, assuming I knew the topology, I could populate the > device tree in advance (e.g. statically), so that it already > includes *devices* which will be further discovered during probing? > Or else you mean the {firmware,u-boot} can do that prior to starting the OS? 
> If either of the above is true, could you please suggest some
> example (or some way to get one)?
> I assume the "reg" property (and the after-"@" node name) will need
> to encode (at least) the device number, is that right?
>
> I tried reading the "PCI Bus Binding to Open Firmware" but I could
> not make complete sense out of it...

You can find an example of this here:

  https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

It's been quite some time since I've actually tested that, but it used
to work properly. What you basically need to do is represent the whole
bus hierarchy within the DT. In the above example there's the top-level
root port (pci@1,0), which provides a bus (1) on which there's a switch
named pci@0,0. That switch provides another bus (2) on which more
devices are listed (pci@[012345],0). Those are all downstream ports
providing separate busses again, each with a single device attached.

You can pretty much arbitrarily nest nodes that way to represent any
hierarchy you want. The tricky part is to get the node numbering right,
but `lspci -t' helps quite a bit with that.

> >> It is useful for exactly the reason stated - you can describe GPIOs,
> >> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
> >> the DT code to populate and connect all that downstream
> >> infrastructure.
>
> I'm not 100% sure I made myself clear though.
> What I would like to do is to have *other* parts of the device tree
> be able to reference (i.e., connect to, through phandles) a PCI
> device (because it provides a GPIO, for instance).
> Is that also what you mean?

Yes. In the example above you'll see that there's actually a GPIO
controller (pci@1,0/pci@0,0/pci@2,0/pci@0,0), so you could simply
associate a phandle with it, as in:

	gpioext: pci@0,0 {
		...
	};

And then hook up other devices to it using the regular notation:

	foo {
		...
		enable-gpios = <&gpioext 0 0>;
		...
	};

That's not done in this example, but I've actually used something very
similar on an x86 platform to hook up the reset pin of an I2C
touchscreen controller to a GPIO controller, where both the I2C and
GPIO controllers were on the PCI bus.

I can't find that snippet right now, but I can look more thoroughly if
the above doesn't help you at all.

Thierry
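As an illustration of the nesting Thierry describes — the node names, bus/device numbers, and layout below are hypothetical, not taken from his commit:

```dts
/* Hypothetical sketch of a nested PCI hierarchy in DT: a root port on
 * bus 0 leading to a switch on bus 1, with one endpoint on bus 2.
 * phys.hi of each "reg" encodes (bus << 16) | (device << 11) | (function << 8).
 */
pcie@1,0 {                          /* root port: bus 0, device 1 */
	reg = <0x0800 0 0 0 0>;
	#address-cells = <3>;
	#size-cells = <2>;

	pci@0,0 {                   /* switch: bus 1, device 0 */
		reg = <0x10000 0 0 0 0>;
		#address-cells = <3>;
		#size-cells = <2>;

		gpioext: pci@0,0 {  /* endpoint: bus 2, device 0 */
			reg = <0x20000 0 0 0 0>;
		};
	};
};
```

Any other node can then reference `&gpioext` through a phandle, as in the `enable-gpios` example.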
* pci-mvebu driver on km_kirkwood 2013-08-26 12:02 ` Thierry Reding @ 2013-08-26 14:49 ` Gerlando Falauto 2013-08-26 19:16 ` Jason Gunthorpe 0 siblings, 1 reply; 55+ messages in thread From: Gerlando Falauto @ 2013-08-26 14:49 UTC (permalink / raw) To: linux-arm-kernel Hi Thierry, On 08/26/2013 02:02 PM, Thierry Reding wrote: > On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote: >> Hi guys [particularly Jason and Thierry], >> >> sorry for the prolonged silence, here I am back again... >> >> On 08/09/2013 04:01 PM, Thierry Reding wrote: >>> On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote: >>>> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote: >>>> >>>>>> Actually, the main reason for trying to use this driver was because I >>>>>> wanted to model a PCIe *device* within the device tree, so to expose its >>>>>> GPIOs and IRQs to be referenced (through phandles) from other device >>>>>> tree nodes. The way I understand it, turns out this is not the way to >>>>>> go, as PCI/PCIe are essentially enumerated busses, so you're not >>>>>> supposed to -and it's not a trivial task to- put any information about >>>>>> real devices within the device tree. >>>>>> Do you have any suggestion about that? >>>>> >>>>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not >>>>> listed in the Device Tree, so there's no way to "attach" more >>>>> information to them. >>>>> >>>>> Device Tree people, any suggestion about the above question? >>>> >>>> No, that isn't true. >>>> >>>> Device tree can include the discovered PCI devices, you have to use >>>> the special reg encoding and all that weirdness, but it does work. The >>>> of_node will be attached to the struct pci device automatically. >> >> So you mean that, assuming I knew the topology, I could populate the >> device tree in advance (e.g. statically), so that it already >> includes *devices* which will be further discovered during probing? 
>> Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
>> If either of the above is true, could you please suggest some
>> example (or some way to get one)?
>> I assume the "reg" property (and the after-"@" node name) will need
>> to encode (at least) the device number, is that right?
>>
>> I tried reading the "PCI Bus Binding to Open Firmware" but I could
>> not make complete sense out of it...
>
> You can find an example of this here:
>
>   https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

Thanks for your precious feedback.
Unfortunately gitorious' servers are offline right now... :-(

> It's been quite some time since I've actually tested that, but it used
> to work properly. What you basically need to do is represent the whole
> bus hierarchy within the DT. In the above example there's the top-level
> root port (pci@1,0), which provides a bus (1) on which there's a switch
> named pci@0,0. That switch provides another bus (2) on which more
> devices are listed (pci@[012345],0). Those are all downstream ports
> providing separate busses again, each with a single device attached.
>
> You can pretty much arbitrarily nest nodes that way to represent any
> hierarchy you want. The tricky part is to get the node numbering right,
> but `lspci -t' helps quite a bit with that.

One last question though... what does the numbering ("@a,b") stand for,
then? I assume if the output of a plain (i.e. no params) 'lspci' is

	bb:dd.f (bus:device.function)

I should only have a "pci@dd,f" node, with the bus numbering being
imposed by the hierarchy after an actual probing, right?
So the actual bus number is never listed in the device tree (whereas
the "@device,function" is). Is that right?
>>>> It is useful for exactly the reason stated - you can describe GPIOs,
>>>> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
>>>> the DT code to populate and connect all that downstream
>>>> infrastructure.
>>
>> I'm not 100% sure I made myself clear though.
>> What I would like to do is to have *other* parts of the device tree
>> be able to reference (i.e., connect to, through phandles) a PCI
>> device (because it provides a GPIO, for instance).
>> Is that also what you mean?
>
> Yes. In the example above you'll see that there's actually a GPIO
> controller (pci@1,0/pci@0,0/pci@2,0/pci@0,0), so you could simply
> associate a phandle with it, as in:
>
> 	gpioext: pci@0,0 {
> 		...
> 	};
>
> And then hook up other devices to it using the regular notation:
>
> 	foo {
> 		...
> 		enable-gpios = <&gpioext 0 0>;
> 		...
> 	};
>
> That's not done in this example, but I've actually used something very
> similar on an x86 platform to hook up the reset pin of an I2C
> touchscreen controller to a GPIO controller, where both the I2C and
> GPIO controllers were on the PCI bus.
>
> I can't find that snippet right now, but I can look more thoroughly if
> the above doesn't help you at all.
>
> Thierry

I guess I'll have to wait until gitorious.org actually does come back
up... then you'll definitely hear from me again. :-)

Thanks!
Gerlando
* pci-mvebu driver on km_kirkwood
  2013-08-26 14:49 ` Gerlando Falauto
@ 2013-08-26 19:16 ` Jason Gunthorpe
  2013-11-04 14:49 ` Gerlando Falauto
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2013-08-26 19:16 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Aug 26, 2013 at 04:49:23PM +0200, Gerlando Falauto wrote:

> One last question though... what does the numbering ("@a,b") stand for,
> then? I assume if the output of a plain (i.e. no params) 'lspci' is

It is device,function, but it is only descriptive and not used by Linux.

> I should only have a "pci@dd,f" node, with the bus numbering being
> imposed by the hierarchy after an actual probing, right?
> So the actual bus number is never listed in the device tree (whereas
> the "@device,function" is). Is that right?

The reg must encode the bus number according to the OF format:

                33222222 22221111 11111100 00000000
                10987654 32109876 54321098 76543210
phys.hi cell:   npt000ss bbbbbbbb dddddfff rrrrrrrr
phys.mid cell:  hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
phys.lo cell:   llllllll llllllll llllllll llllllll

bbbbbbbb is the 8-bit Bus Number
ddddd is the 5-bit Device Number
fff is the 3-bit Function Number

Others are 0.

Jason
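As a worked instance of the encoding Jason gives (the bus position is invented for illustration): an endpoint at bus 2, device 3, function 1 would get phys.hi = (2 << 16) | (3 << 11) | (1 << 8) = 0x00021900, so its node would carry:

```dts
/* Hypothetical endpoint at bus 2, device 3, function 1.
 * phys.hi = (2 << 16) | (3 << 11) | (1 << 8) = 0x00021900;
 * the n, p, t, ss bits and the register field rrrrrrrr are all zero,
 * and phys.mid/phys.lo are zero for a configuration-space address.
 */
pci@3,1 {
	reg = <0x00021900 0 0 0 0>;
};
```

Note the node name only carries device,function ("@3,1"), while the bus number lives solely in the reg encoding — matching Jason's point above.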
* pci-mvebu driver on km_kirkwood
  2013-08-26 19:16 ` Jason Gunthorpe
@ 2013-11-04 14:49 ` Gerlando Falauto
  2013-11-05  8:13 ` Thierry Reding
  0 siblings, 1 reply; 55+ messages in thread
From: Gerlando Falauto @ 2013-11-04 14:49 UTC (permalink / raw)
To: linux-arm-kernel

Hi folks,

thank you for your patience...

So, thanks to Thierry's example:

> https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

and Jason's explanation:

> The reg must encode the bus number according to the OF format:
>
>                 33222222 22221111 11111100 00000000
>                 10987654 32109876 54321098 76543210
> phys.hi cell:   npt000ss bbbbbbbb dddddfff rrrrrrrr
> phys.mid cell:  hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
> phys.lo cell:   llllllll llllllll llllllll llllllll
>
> bbbbbbbb is the 8-bit Bus Number
> ddddd is the 5-bit Device Number
> fff is the 3-bit Function Number
>
> Others are 0.

I'm finally starting to make some sense out of this, and I checked
that Jason's statement is indeed true, at least on 3.10:

> Device tree can include the discovered PCI devices, you have to use
> the special reg encoding and all that weirdness, but it does work. The
> of_node will be attached to the struct pci device automatically.

[High latency was also due to other activities, not just the low
throughput of my brain cells] ;-)

I have one last question for Thierry though: what's the point of
things such as

+	pci@0,0 {
+		compatible = "opencores,spi";

(apart from clarity, of course)?
I mean, wouldn't the driver be bound to the device through its PCI
vendor ID / device ID?
Are we also supposed to register a platform driver based on a
compatible string instead?

Thanks again guys!
Gerlando
* pci-mvebu driver on km_kirkwood
  2013-11-04 14:49 ` Gerlando Falauto
@ 2013-11-05  8:13 ` Thierry Reding
  0 siblings, 0 replies; 55+ messages in thread
From: Thierry Reding @ 2013-11-05 8:13 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Nov 04, 2013 at 03:49:59PM +0100, Gerlando Falauto wrote:
> Hi folks,
>
> thank you for your patience...
>
> So, thanks to Thierry's example:
>
> > https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192
>
> and Jason's explanation:
>
> > The reg must encode the bus number according to the OF format:
> >
> >                 33222222 22221111 11111100 00000000
> >                 10987654 32109876 54321098 76543210
> > phys.hi cell:   npt000ss bbbbbbbb dddddfff rrrrrrrr
> > phys.mid cell:  hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
> > phys.lo cell:   llllllll llllllll llllllll llllllll
> >
> > bbbbbbbb is the 8-bit Bus Number
> > ddddd is the 5-bit Device Number
> > fff is the 3-bit Function Number
> >
> > Others are 0.
>
> I'm finally starting to make some sense out of this, and I checked
> that Jason's statement is indeed true, at least on 3.10:
>
> > Device tree can include the discovered PCI devices, you have to use
> > the special reg encoding and all that weirdness, but it does work. The
> > of_node will be attached to the struct pci device automatically.
>
> [High latency was also due to other activities, not just the low
> throughput of my brain cells] ;-)
>
> I have one last question for Thierry though: what's the point of
> things such as
>
> +	pci@0,0 {
> +		compatible = "opencores,spi";
>
> (apart from clarity, of course)?
> I mean, wouldn't the driver be bound to the device through its PCI
> vendor ID / device ID?
> Are we also supposed to register a platform driver based on a
> compatible string instead?

I think that compatible property is completely bogus. Or at least the
value is. The primary reason why I included them was for descriptive
purposes.
According to section 2.5 of the PCI Bus Binding to Open Firmware[0],
this should be something like:

	compatible = "pciVVVV,DDDD";

where VVVV is the vendor ID and DDDD is the device ID, both in
hexadecimal. Section 2.5 lists a few more, but I'm not sure exactly
which would really be required. I'm not even sure that they really are
required at all. The drivers will certainly be able to bind to them via
the standard vendor and device ID matching, as you say. And no, no
platform driver is required.

Thierry

[0]: http://www.openfirmware.org/1275/bindings/pci/pci2_1.pdf
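Putting the thread's pieces together, a node for a hypothetical endpoint with vendor ID 0x1234 and device ID 0x5678, sitting at bus 1, device 0, function 0, might look like (all values invented for illustration):

```dts
/* Hypothetical endpoint; compatible follows the "pciVVVV,DDDD" form
 * from section 2.5 of the PCI Bus Binding to Open Firmware, and
 * phys.hi of reg encodes (1 << 16) | (0 << 11) | (0 << 8) = 0x10000.
 */
gpioext: pci@0,0 {
	compatible = "pci1234,5678";
	reg = <0x10000 0 0 0 0>;   /* bus 1, device 0, function 0 */
};
```

As Thierry notes, the compatible string is descriptive here; the driver binds through the standard PCI vendor/device ID matching, and the of_node is attached to the struct pci_dev automatically.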
end of thread, other threads:[~2014-02-23  3:43 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto
2013-07-10 16:57 ` Thomas Petazzoni
2013-07-10 17:31 ` Gerlando Falauto
2013-07-10 19:56 ` Gerlando Falauto
2013-07-11  7:03 ` Valentin Longchamp
2013-07-12  8:59 ` Thomas Petazzoni
2013-07-15 15:46 ` Valentin Longchamp
2013-07-15 19:51 ` Thomas Petazzoni
2013-07-11 14:32 ` Thomas Petazzoni
2014-02-18 17:29 ` Gerlando Falauto
2014-02-18 20:27 ` Thomas Petazzoni
2014-02-19  8:38 ` Gerlando Falauto
2014-02-19  9:26 ` Thomas Petazzoni
2014-02-19  9:39 ` Gerlando Falauto
2014-02-19 13:37 ` Thomas Petazzoni
2014-02-19 21:45 ` Bjorn Helgaas
2014-02-20  8:55 ` Thomas Petazzoni
2014-02-20 17:35 ` Jason Gunthorpe
2014-02-20 20:29 ` Thomas Petazzoni
2014-02-21  0:32 ` Jason Gunthorpe
2014-02-21  8:34 ` Thomas Petazzoni
2014-02-21  8:58 ` Gerlando Falauto
2014-02-21  9:12 ` Thomas Petazzoni
2014-02-21  9:16 ` Gerlando Falauto
2014-02-21  9:39 ` Thomas Petazzoni
2014-02-21 12:24 ` Gerlando Falauto
2014-02-21 13:47 ` Thomas Petazzoni
2014-02-21 15:05 ` Arnd Bergmann
2014-02-21 15:11 ` Thomas Petazzoni
2014-02-21 15:20 ` Arnd Bergmann
2014-02-21 15:37 ` Thomas Petazzoni
2014-02-21 16:39 ` Jason Gunthorpe
2014-02-21 17:05 ` Thomas Petazzoni
2014-02-21 17:31 ` Jason Gunthorpe
2014-02-21 18:05 ` Arnd Bergmann
2014-02-21 18:29 ` Gerlando Falauto
2014-02-21 18:18 ` Gerlando Falauto
2014-02-21 18:45 ` Thomas Petazzoni
2014-02-20 19:18 ` Bjorn Helgaas
2014-02-21  0:24 ` Jason Gunthorpe
2014-02-21 19:05 ` Bjorn Helgaas
2014-02-21 19:21 ` Thomas Petazzoni
2014-02-21 19:53 ` Benjamin Herrenschmidt
2014-02-23  3:43 ` Gavin Shan
2013-07-31  8:03 ` Thomas Petazzoni
2013-07-31  8:26 ` Gerlando Falauto
2013-07-31  9:00 ` Thomas Petazzoni
2013-07-31 20:50 ` Jason Gunthorpe
2013-08-09 14:01 ` Thierry Reding
2013-08-26  9:27 ` Gerlando Falauto
2013-08-26 12:02 ` Thierry Reding
2013-08-26 14:49 ` Gerlando Falauto
2013-08-26 19:16 ` Jason Gunthorpe
2013-11-04 14:49 ` Gerlando Falauto
2013-11-05  8:13 ` Thierry Reding