From: Jacob Keller <jacob.e.keller@intel.com>
To: <intel-wired-lan@osuosl.org>, <mustafa.ismail@intel.com>,
<david.m.ertman@intel.com>
Subject: Re: [Intel-wired-lan] [PATCH net-next v2 05/13] ice: Fix RDMA latency issue by allowing write-combining
Date: Tue, 31 Jan 2023 14:57:23 -0800 [thread overview]
Message-ID: <7a40c936-ce1b-d6d7-34f8-2ca9590b1b67@intel.com> (raw)
In-Reply-To: <PH0PR11MB5095133D185FFC8E81B06558D6D39@PH0PR11MB5095.namprd11.prod.outlook.com>
On 1/30/2023 3:34 PM, Keller, Jacob E wrote:
>
>
>> -----Original Message-----
>> From: Lobakin, Alexandr <alexandr.lobakin@intel.com>
>> Sent: Monday, January 30, 2023 2:03 AM
>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>; Nguyen, Anthony L
>> <anthony.l.nguyen@intel.com>; Linga, Pavan Kumar
>> <pavan.kumar.linga@intel.com>; netdev@vger.kernel.org
>> Subject: Re: [Intel-wired-lan] [PATCH net-next v2 05/13] ice: Fix RDMA latency
>> issue by allowing write-combining
>>
>> From: Jacob Keller <jacob.e.keller@intel.com>
>> Date: Wed, 18 Jan 2023 17:16:45 -0800
>>
>>> The current method of mapping the entire BAR region as a single uncacheable
>>> region does not allow RDMA to use write combining (WC). This results in
>>> increased latency with RDMA.
>>>
>>> To fix this, we initially planned to reduce the size of the map made by the
>>> PF driver to include only up to the beginning of the RDMA space.
>>> Unfortunately this will not work in the future as there are some hardware
>>> features which use registers beyond the RDMA area. This includes Scalable
>>> IOV, a virtualization feature being worked on currently.
>>>
>>> Instead of simply reducing the size of the map, we need a solution which
>>> will allow access to all areas of the address space while leaving the RDMA
>>> area open to be mapped with write combining.
>>>
>>> To allow for this, and fix the RDMA latency issue without blocking the
>>> higher areas of the BAR, we need to create multiple separate memory maps.
>>> Doing so will create a sparse mapping rather than a contiguous single area.
>>>
>>> Replace the void *hw_addr with a special ice_hw_addr structure which
>>> represents the multiple mappings as a flexible array.
>>>
>>> Based on the available BAR size, map up to 3 regions:
>>>
>>> * The space before the RDMA section
>>> * The RDMA section which wants write combining behavior
>>> * The space after the RDMA section
>>
>> Please don't.
>>
>> You have[0]:
>>
>> * io_mapping_init_wc() (+ io_mapping_fini());
>> * io_mapping_create_wc() (+ io_mapping_free());
>>
>> ^ they do the same (the second just allocates a struct ad-hoc, but it
>> can be allocated manually or embedded into a driver structure),
>>
>> * arch_phys_wc_add() (+ arch_phys_wc_del())[1];
>>
>> ^ optional to make MTRR happy
>>
>> -- precisely for the case when you need to remap *a part* of BAR in a
>> different mode.
>>
>> Splitting BARs, dropping pcim_iomap_regions() and so on, is very wrong.
>> Not to mention that it's the PCI driver which must own and map all the
>> memory the device advertises in its PCI config space, and in the case
>> of ice, the PCI driver is combined with the Ethernet driver, so it's
>> ice which must own and map all the memory.
>> Not to mention that using a structure with a flex array and a static
>> inline to calculate the pointer each time you need to read/write a
>> register hurts performance and looks properly ugly.
>>
>> The interfaces above must be used by the RDMA driver, right before
>> mapping its part in WC mode. PCI driver has no idea that someone else
>> wants to remap its memory differently, so the code doesn't belong here.
>> I'd drop the patch and let the RDMA team fix/improve their driver.
>>
>
> Appreciate the review! I proposed this option after the original change, which simply reduced the initial size of our BAR mapping, turned out to cost us access to the registers beyond the RDMA section. That was a non-starter, since we will need those registers once we finish implementing Scalable IOV support.
>
> Searching for io_mapping_init_wc and io_mapping_create_wc, there are only a handful of users and not much documentation, so it's no wonder I had trouble locating them! Thanks for helping me learn about them.
>
> @Dave.Ertman@intel.com, @Saleem, Shiraz it looks like we need to drop this patch and modify the iRDMA driver's method of requesting write combined regions to use these new interfaces.
>
> @Nguyen, Anthony L Can you drop this patch from the series on IWL or should I send a v3?
>
> Thanks,
> Jake
Mustafa,

It looks like Shiraz is on vacation and you're his coverage. Can you
help investigate this and the proposed io_mapping solution from Olek?
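For reference, here is a rough sketch of how the irdma side might use those
interfaces. All the names and parameters (the rdma_off/rdma_len arguments,
the function names themselves) are made up for illustration; the real RDMA
region offset and size would come from whatever device info ice already
hands to irdma:

```c
/* Hypothetical sketch -- irdma_map_db_wc() and its parameters are made-up
 * names; only the io_mapping/arch_phys_wc interfaces themselves are real.
 */
#include <linux/io.h>
#include <linux/io-mapping.h>
#include <linux/pci.h>

static int irdma_map_db_wc(struct pci_dev *pdev, struct io_mapping *iomap,
			   resource_size_t rdma_off, unsigned long rdma_len,
			   int *mtrr_handle, void __iomem **db)
{
	resource_size_t base = pci_resource_start(pdev, 0) + rdma_off;

	/* Set up a WC io_mapping covering only the RDMA chunk of BAR 0 */
	if (!io_mapping_init_wc(iomap, base, rdma_len))
		return -ENOMEM;

	/* Optional: add an MTRR entry so WC also works on non-PAT systems */
	*mtrr_handle = arch_phys_wc_add(base, rdma_len);

	/* Map the region; the offset here is relative to the io_mapping */
	*db = io_mapping_map_wc(iomap, 0, rdma_len);
	if (!*db) {
		arch_phys_wc_del(*mtrr_handle);
		io_mapping_fini(iomap);
		return -ENOMEM;
	}
	return 0;
}

static void irdma_unmap_db_wc(struct io_mapping *iomap, int mtrr_handle,
			      void __iomem *db)
{
	io_mapping_unmap(db);
	arch_phys_wc_del(mtrr_handle);
	io_mapping_fini(iomap);
}
```

This keeps the full-BAR UC mapping in the PF driver untouched and puts the
WC remap where Olek suggested it belongs, in the RDMA driver.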
Dave,

I tried to add you on the original thread above but it got lost due to
an incorrect email.
Thanks,
Jake
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan