* [PATCH V1 net-next 0/2] Add pgtable API to query if write combining is available @ 2014-10-05 8:22 Or Gerlitz 2014-10-05 8:22 ` [PATCH V1 net-next 1/2] pgtable: Add " Or Gerlitz 2014-10-05 8:22 ` [PATCH V1 net-next 2/2] net/mlx4_core: Disable BF when write combining is not available Or Gerlitz 0 siblings, 2 replies; 11+ messages in thread From: Or Gerlitz @ 2014-10-05 8:22 UTC (permalink / raw) To: David S. Miller Cc: netdev, Amir Vadai, Jack Morgenstein, Moshe Lazer, Tal Alon, Yevgeny Petrilin, Or Gerlitz Currently the kernel write-combining interface provides a best effort mechanism in which the caller simply invokes pgprot_writecombine(). If write combining is available, the region is mapped for it, otherwise the region is (silently) mapped as non-cached. In some cases, however, the calling driver must know if write combining is available, so a silent best effort mechanism is not sufficient. Add writecombine_available(), which returns 1 if the system supports write combining and 0 if it doesn't. In mlx4 for better latency, we write send descriptors to a write-combining (WC) mapped buffer instead of ringing a doorbell and having the HW fetch the descriptor from system memory. However, if write-combining is not supported on the host, then we obtain better latency by using the doorbell-ring/HW fetch mechanism. This series from Moshe and Jack adds the API and uses in in mlx4. We are sending through netdev to get feedback from the networking community and extend the reviewer audience if required. Per the reviewers request, here are some results from these three different configurations: [1] bf=on with wc [2] bf=on without wc [3] bf=off and doorbell The 1st set of results was obtained from running latency test with the HCA being passthrough-ed into VM running over KVM host -- so WC isn't available. The problematic range is 32-128B, for example with 128 bytes message, using BF has latency of 1.47us and no usage of BF only 1us. When WC isn't really available every write of 64B would actually translate into 8 writes of 8 bytes which obviously hurts the latency. # /usr/bin/taskset -c 0 ib_write_lat -d mlx4_0 -i 1 -F -a -n 1000000 [2] BF on without WC #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] 2 1000000 0.74 186.16 0.79 4 1000000 0.70 103.62 0.78 8 1000000 0.74 77.02 0.78 16 1000000 0.65 640.75 0.86 32 1000000 0.90 134.63 0.96 64 1000000 1.05 808.52 1.11 128 1000000 1.05 405.58 1.47 [3] BF off and using doorbell #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] 2 1000000 0.85 107.29 0.89 4 1000000 0.84 705.90 0.89 8 1000000 0.85 457.72 0.89 16 1000000 0.85 1041.43 0.90 32 1000000 0.88 773.67 0.92 64 1000000 0.90 82.70 0.93 128 1000000 0.96 78.20 1.00 The 2nd set of results was obtained from running latency test over bare-metal host where WC is available. Clearly we gain better latency when BF is used vs. the doorbell base. # /usr/bin/taskset -c 0 ib_write_lat -d mlx4_0 -i 1 -F -a -n 1000000 [1] BF on, WC available #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] 2 1000000 0.74 131.62 0.79 4 1000000 0.74 134.51 0.79 8 1000000 0.74 154.30 0.79 16 1000000 0.74 1437.57 0.79 32 1000000 0.79 138.23 0.83 64 1000000 0.82 135.86 0.85 128 1000000 0.94 131.11 0.98 [3] BF off and using doorbell #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] 2 1000000 1.05 137.55 1.10 4 1000000 1.04 422.50 1.10 8 1000000 1.05 141.26 1.10 16 1000000 1.06 1261.99 1.11 32 1000000 1.09 141.47 1.14 64 1000000 1.11 435.44 1.16 128 1000000 1.22 212.19 1.27 Moshe and Or. changes from V0: - changed the WC helper to return bool value Moshe Lazer (2): pgtable: Add API to query if write combining is available net/mlx4_core: Disable BF when write combining is not available arch/arm/include/asm/pgtable.h | 6 ++++++ arch/arm64/include/asm/pgtable.h | 5 +++++ arch/ia64/include/asm/pgtable.h | 6 ++++++ arch/powerpc/include/asm/pgtable.h | 6 ++++++ arch/x86/include/asm/pgtable_types.h | 2 ++ arch/x86/mm/pat.c | 9 +++++++++ drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +- include/asm-generic/pgtable.h | 8 ++++++++ 8 files changed, 43 insertions(+), 1 deletions(-) ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-05 8:22 [PATCH V1 net-next 0/2] Add pgtable API to query if write combining is available Or Gerlitz @ 2014-10-05 8:22 ` Or Gerlitz 2014-10-07 19:44 ` David Miller 2014-10-05 8:22 ` [PATCH V1 net-next 2/2] net/mlx4_core: Disable BF when write combining is not available Or Gerlitz 1 sibling, 1 reply; 11+ messages in thread From: Or Gerlitz @ 2014-10-05 8:22 UTC (permalink / raw) To: David S. Miller Cc: netdev, Amir Vadai, Jack Morgenstein, Moshe Lazer, Tal Alon, Yevgeny Petrilin, Or Gerlitz From: Moshe Lazer <moshel@mellanox.com> Currently the kernel write-combining interface provides a best effort mechanism in which the caller simply invokes pgprot_writecombine(). If write combining is available, the region is mapped for it, otherwise the region is (silently) mapped as non-cached. In some cases, however, the calling driver must know if write combining is available, so a silent best effort mechanism is not sufficient. Add writecombine_available(), which returns true if the system supports write combining and false if it doesn't. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> --- arch/arm/include/asm/pgtable.h | 6 ++++++ arch/arm64/include/asm/pgtable.h | 5 +++++ arch/ia64/include/asm/pgtable.h | 6 ++++++ arch/powerpc/include/asm/pgtable.h | 6 ++++++ arch/x86/include/asm/pgtable_types.h | 2 ++ arch/x86/mm/pat.c | 9 +++++++++ include/asm-generic/pgtable.h | 8 ++++++++ 7 files changed, 42 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 01baef0..ce06b64 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -119,6 +119,12 @@ extern pgprot_t pgprot_s2_device; #define pgprot_writecombine(prot) \ __pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_BUFFERABLE) +#define writecombine_available writecombine_available +static inline bool writecombine_available(void) +{ + return true; +} + #define pgprot_stronglyordered(prot) \ __pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_UNCACHED) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index ffe1ba0..6ab0630 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -296,6 +296,11 @@ static inline int has_transparent_hugepage(void) __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRnE) | PTE_PXN | PTE_UXN) #define pgprot_writecombine(prot) \ __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN) +#define writecombine_available writecombine_available +static inline bool writecombine_available(void) +{ + return true; +} #define __HAVE_PHYS_MEM_ACCESS_PROT struct file; extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h index 7935115..2e44501 100644 --- a/arch/ia64/include/asm/pgtable.h +++ b/arch/ia64/include/asm/pgtable.h @@ -356,6 +356,12 @@ static inline void set_pte(pte_t *ptep, pte_t pteval) #define pgprot_noncached(prot) __pgprot((pgprot_val(prot) & ~_PAGE_MA_MASK) | _PAGE_MA_UC) #define pgprot_writecombine(prot) __pgprot((pgprot_val(prot) & ~_PAGE_MA_MASK) | _PAGE_MA_WC) +#define writecombine_available writecombine_available +static inline bool writecombine_available(void) +{ + return true; +} + struct file; extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t vma_prot); diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index d98c1ec..3232d98 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -267,6 +267,12 @@ extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addre #define pgprot_writecombine pgprot_noncached_wc +#define writecombine_available writecombine_available +static inline bool writecombine_available(void) +{ + return true; +} + struct file; extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t vma_prot); diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index f216963..7d3dc79 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -337,6 +337,8 @@ extern int nx_enabled; #define pgprot_writecombine pgprot_writecombine extern pgprot_t pgprot_writecombine(pgprot_t prot); +#define writecombine_available writecombine_available +bool writecombine_available(void); /* Indicate that x86 has its own track and untrack pfn vma functions */ #define __HAVE_PFNMAP_TRACKING diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index 6574388..851ee51 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -797,6 +797,15 @@ pgprot_t pgprot_writecombine(pgprot_t prot) } EXPORT_SYMBOL_GPL(pgprot_writecombine); +bool writecombine_available(void) +{ + if (pat_enabled) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(writecombine_available); + #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT) static struct memtype *memtype_get_idx(loff_t pos) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 53b2acc..2cb40d9 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -249,6 +249,14 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) #define pgprot_writecombine pgprot_noncached #endif +#ifndef writecombine_available +#define writecombine_available writecombine_available +static inline bool writecombine_available(void) +{ + return false; +} +#endif + /* * When walking page tables, get the address of the next boundary, * or the end address of the range if that comes earlier. Although no -- 1.7.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-05 8:22 ` [PATCH V1 net-next 1/2] pgtable: Add " Or Gerlitz @ 2014-10-07 19:44 ` David Miller [not found] ` <925ad10b2ec44e228e69bf0cbe6c0a0e@AMSPR05MB002.eurprd05.prod.outlook.com> 0 siblings, 1 reply; 11+ messages in thread From: David Miller @ 2014-10-07 19:44 UTC (permalink / raw) To: ogerlitz; +Cc: netdev, amirv, jackm, moshel, talal, yevgenyp From: Or Gerlitz <ogerlitz@mellanox.com> Date: Sun, 5 Oct 2014 11:22:21 +0300 > From: Moshe Lazer <moshel@mellanox.com> > > Currently the kernel write-combining interface provides a best effort > mechanism in which the caller simply invokes pgprot_writecombine(). > > If write combining is available, the region is mapped for it, otherwise > the region is (silently) mapped as non-cached. > > In some cases, however, the calling driver must know if write combining > is available, so a silent best effort mechanism is not sufficient. > > Add writecombine_available(), which returns true if the system > supports write combining and false if it doesn't. > > Signed-off-by: Moshe Lazer <moshel@mellanox.com> > Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> > Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> This needs some ACKs from MM developers. But also the situation is more complicated than a simple boolean test. On some platforms you have to test first whether the range you are trying to write combine can legally be marked in that way. The DRM layer has all of these per-arch tests to do this properly. #if defined(__i386__) || defined(__x86_64__) if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING)) tmp = pgprot_noncached(tmp); else tmp = pgprot_writecombine(tmp); #elif defined(__powerpc__) pgprot_val(tmp) |= _PAGE_NO_CACHE; if (map->type == _DRM_REGISTERS) pgprot_val(tmp) |= _PAGE_GUARDED; #elif defined(__ia64__) if (efi_range_is_wc(vma->vm_start, vma->vm_end - vma->vm_start)) tmp = pgprot_writecombine(tmp); else tmp = pgprot_noncached(tmp); #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) tmp = pgprot_noncached(tmp); #endif ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <925ad10b2ec44e228e69bf0cbe6c0a0e@AMSPR05MB002.eurprd05.prod.outlook.com>]
* Re: FW: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available [not found] ` <925ad10b2ec44e228e69bf0cbe6c0a0e@AMSPR05MB002.eurprd05.prod.outlook.com> @ 2014-10-08 8:44 ` Moshe Lazer 2014-10-08 8:50 ` David Laight 2014-10-08 16:24 ` David Miller 0 siblings, 2 replies; 11+ messages in thread From: Moshe Lazer @ 2014-10-08 8:44 UTC (permalink / raw) To: davem Cc: Or Gerlitz, Jack Morgenstein, Tal Alon, Yevgeny Petrilin, netdev, Amir Vadai > From: David Miller [mailto:davem@davemloft.net] > Sent: Tuesday, October 07, 2014 10:44 PM > To: Or Gerlitz > Cc: netdev@vger.kernel.org; Amir Vadai; jackm@dev.mellanox.co.il; Moshe Lazer; Tal Alon; Yevgeny Petrilin > Subject: Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available > > From: Or Gerlitz <ogerlitz@mellanox.com> > Date: Sun, 5 Oct 2014 11:22:21 +0300 > >> From: Moshe Lazer <moshel@mellanox.com> >> >> Currently the kernel write-combining interface provides a best effort >> mechanism in which the caller simply invokes pgprot_writecombine(). >> >> If write combining is available, the region is mapped for it, >> otherwise the region is (silently) mapped as non-cached. >> >> In some cases, however, the calling driver must know if write >> combining is available, so a silent best effort mechanism is not sufficient. >> >> Add writecombine_available(), which returns true if the system >> supports write combining and false if it doesn't. >> >> Signed-off-by: Moshe Lazer <moshel@mellanox.com> >> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> >> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> > This needs some ACKs from MM developers. > > But also the situation is more complicated than a simple boolean test. > > On some platforms you have to test first whether the range you are trying to write combine can legally be marked in that way. The DRM layer has all of these per-arch tests to do this properly. > > #if defined(__i386__) || defined(__x86_64__) > if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING)) > tmp = pgprot_noncached(tmp); > else > tmp = pgprot_writecombine(tmp); > #elif defined(__powerpc__) > pgprot_val(tmp) |= _PAGE_NO_CACHE; > if (map->type == _DRM_REGISTERS) > pgprot_val(tmp) |= _PAGE_GUARDED; > #elif defined(__ia64__) > if (efi_range_is_wc(vma->vm_start, vma->vm_end - > vma->vm_start)) > tmp = pgprot_writecombine(tmp); > else > tmp = pgprot_noncached(tmp); > #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) > tmp = pgprot_noncached(tmp); > #endif The idea was to provide an indication as for whether the arch supports write-combining in general. If we want to benefit from blue flame operations, we need to map the blue flame registers as write combining - otherwise there is no benefit. So we would like to know if write combining is supported by the system or not. ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: FW: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-08 8:44 ` FW: " Moshe Lazer @ 2014-10-08 8:50 ` David Laight 2014-10-08 16:24 ` David Miller 1 sibling, 0 replies; 11+ messages in thread From: David Laight @ 2014-10-08 8:50 UTC (permalink / raw) To: 'Moshe Lazer', davem@davemloft.net Cc: Or Gerlitz, Jack Morgenstein, Tal Alon, Yevgeny Petrilin, netdev@vger.kernel.org, Amir Vadai From: Moshe Lazer > > From: David Miller [mailto:davem@davemloft.net] > > Sent: Tuesday, October 07, 2014 10:44 PM > > To: Or Gerlitz > > Cc: netdev@vger.kernel.org; Amir Vadai; jackm@dev.mellanox.co.il; Moshe Lazer; Tal Alon; Yevgeny > Petrilin > > Subject: Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available > > > > From: Or Gerlitz <ogerlitz@mellanox.com> > > Date: Sun, 5 Oct 2014 11:22:21 +0300 > > > >> From: Moshe Lazer <moshel@mellanox.com> > >> > >> Currently the kernel write-combining interface provides a best effort > >> mechanism in which the caller simply invokes pgprot_writecombine(). > >> > >> If write combining is available, the region is mapped for it, > >> otherwise the region is (silently) mapped as non-cached. > >> > >> In some cases, however, the calling driver must know if write > >> combining is available, so a silent best effort mechanism is not sufficient. > >> > >> Add writecombine_available(), which returns true if the system > >> supports write combining and false if it doesn't. > >> > >> Signed-off-by: Moshe Lazer <moshel@mellanox.com> > >> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> > >> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> > > This needs some ACKs from MM developers. > > > > But also the situation is more complicated than a simple boolean test. > > > > On some platforms you have to test first whether the range you are trying to write combine can > legally be marked in that way. The DRM layer has all of these per-arch tests to do this properly. > > > > #if defined(__i386__) || defined(__x86_64__) > > if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING)) > > tmp = pgprot_noncached(tmp); > > else > > tmp = pgprot_writecombine(tmp); > > #elif defined(__powerpc__) > > pgprot_val(tmp) |= _PAGE_NO_CACHE; > > if (map->type == _DRM_REGISTERS) > > pgprot_val(tmp) |= _PAGE_GUARDED; > > #elif defined(__ia64__) > > if (efi_range_is_wc(vma->vm_start, vma->vm_end - > > vma->vm_start)) > > tmp = pgprot_writecombine(tmp); > > else > > tmp = pgprot_noncached(tmp); > > #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) > > tmp = pgprot_noncached(tmp); > > #endif > The idea was to provide an indication as for whether the arch supports > write-combining in general. > If we want to benefit from blue flame operations, we need to map the > blue flame registers as write combining - otherwise there is no benefit. > So we would like to know if write combining is supported by the system > or not. Sounds like the mapping functions should have a mode where 'write combining' is mandatory, then you can try to map the registers as 'write combining' and if the map request fails use the alternate scheme. That moves the test into the architecture specific mapping code where it belongs. David ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-08 8:44 ` FW: " Moshe Lazer 2014-10-08 8:50 ` David Laight @ 2014-10-08 16:24 ` David Miller 2014-10-12 9:54 ` Moshe Lazer 1 sibling, 1 reply; 11+ messages in thread From: David Miller @ 2014-10-08 16:24 UTC (permalink / raw) To: moshel; +Cc: ogerlitz, jackm, talal, yevgenyp, netdev, amirv From: Moshe Lazer <moshel@dev.mellanox.co.il> Date: Wed, 08 Oct 2014 11:44:57 +0300 >> #if defined(__i386__) || defined(__x86_64__) >> if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING)) >> tmp = pgprot_noncached(tmp); >> else >> tmp = pgprot_writecombine(tmp); >> #elif defined(__powerpc__) >> pgprot_val(tmp) |= _PAGE_NO_CACHE; >> if (map->type == _DRM_REGISTERS) >> pgprot_val(tmp) |= _PAGE_GUARDED; >> #elif defined(__ia64__) >> if (efi_range_is_wc(vma->vm_start, vma->vm_end - >> vma->vm_start)) >> tmp = pgprot_writecombine(tmp); >> else >> tmp = pgprot_noncached(tmp); >> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) >> tmp = pgprot_noncached(tmp); >> #endif > The idea was to provide an indication as for whether the arch supports > write-combining in general. > If we want to benefit from blue flame operations, we need to map the > blue flame registers as write combining - otherwise there is no > benefit. So we would like to know if write combining is supported by > the system or not. > You completely miss my point. On a given architectuire it might be _illegal_ to map certain address ranges as write-combining without checks like the ones above that ia64 needs. Therefore your proposed interface is by definition insufficient. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-08 16:24 ` David Miller @ 2014-10-12 9:54 ` Moshe Lazer 2014-10-26 15:00 ` Moshe Lazer 2014-11-27 6:48 ` Or Gerlitz 0 siblings, 2 replies; 11+ messages in thread From: Moshe Lazer @ 2014-10-12 9:54 UTC (permalink / raw) To: David Miller; +Cc: ogerlitz, jackm, talal, yevgenyp, netdev, amirv, moshel On 10/8/2014 7:24 PM, David Miller wrote: > From: Moshe Lazer <moshel@dev.mellanox.co.il> > Date: Wed, 08 Oct 2014 11:44:57 +0300 > >>> #if defined(__i386__) || defined(__x86_64__) >>> if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING)) >>> tmp = pgprot_noncached(tmp); >>> else >>> tmp = pgprot_writecombine(tmp); >>> #elif defined(__powerpc__) >>> pgprot_val(tmp) |= _PAGE_NO_CACHE; >>> if (map->type == _DRM_REGISTERS) >>> pgprot_val(tmp) |= _PAGE_GUARDED; >>> #elif defined(__ia64__) >>> if (efi_range_is_wc(vma->vm_start, vma->vm_end - >>> vma->vm_start)) >>> tmp = pgprot_writecombine(tmp); >>> else >>> tmp = pgprot_noncached(tmp); >>> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) >>> tmp = pgprot_noncached(tmp); >>> #endif >> The idea was to provide an indication as for whether the arch supports >> write-combining in general. >> If we want to benefit from blue flame operations, we need to map the >> blue flame registers as write combining - otherwise there is no >> benefit. So we would like to know if write combining is supported by >> the system or not. >> > You completely miss my point. On a given architectuire it might be > _illegal_ to map certain address ranges as write-combining without > checks like the ones above that ia64 needs. > > Therefore your proposed interface is by definition insufficient. Thanks David, I'll try to clarify my point. For me the writecombine_available() is a way to know if the pgprot_writecombine() is effective or just cover call to the pgprot_noncached(). I want to use the writecombine_available() regardless to the mapping address. For example in mlx4 query_device I want to indicate that blue-flame is not supported if `writecombine_available() == false`. In this case we don't have the mapping address yet. Later on if an arch has write-combining (writecombine_available() == true) when we try to map the blue-flame registers (in mlx4_ib_mmap): vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); if (io_remap_pfn_range(vma, vma->vm_start, to_mucontext(context)->uar.pfn + dev->dev->caps.num_uars, PAGE_SIZE, vma->vm_page_prot)) return -EAGAIN; I can be sure that pgprot_writecombine() is not a cover for pgprot_noncached(). The address checks that you mentioned should be part of io_remap_pfn_range, this function should fail if the vma->vm_start is not compatible to the vma->vm_page_prot. Please let me know if I misunderstood something. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-12 9:54 ` Moshe Lazer @ 2014-10-26 15:00 ` Moshe Lazer 2014-11-27 6:48 ` Or Gerlitz 1 sibling, 0 replies; 11+ messages in thread From: Moshe Lazer @ 2014-10-26 15:00 UTC (permalink / raw) To: David Miller; +Cc: ogerlitz, jackm, talal, yevgenyp, netdev, amirv, moshel On 10/12/2014 12:54 PM, Moshe Lazer wrote: > > On 10/8/2014 7:24 PM, David Miller wrote: >> From: Moshe Lazer <moshel@dev.mellanox.co.il> >> Date: Wed, 08 Oct 2014 11:44:57 +0300 >> >>>> #if defined(__i386__) || defined(__x86_64__) >>>> if (map->type == _DRM_REGISTERS && !(map->flags & >>>> _DRM_WRITE_COMBINING)) >>>> tmp = pgprot_noncached(tmp); >>>> else >>>> tmp = pgprot_writecombine(tmp); >>>> #elif defined(__powerpc__) >>>> pgprot_val(tmp) |= _PAGE_NO_CACHE; >>>> if (map->type == _DRM_REGISTERS) >>>> pgprot_val(tmp) |= _PAGE_GUARDED; >>>> #elif defined(__ia64__) >>>> if (efi_range_is_wc(vma->vm_start, vma->vm_end - >>>> vma->vm_start)) >>>> tmp = pgprot_writecombine(tmp); >>>> else >>>> tmp = pgprot_noncached(tmp); >>>> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) >>>> tmp = pgprot_noncached(tmp); >>>> #endif >>> The idea was to provide an indication as for whether the arch supports >>> write-combining in general. >>> If we want to benefit from blue flame operations, we need to map the >>> blue flame registers as write combining - otherwise there is no >>> benefit. So we would like to know if write combining is supported by >>> the system or not. >>> >> You completely miss my point. On a given architectuire it might be >> _illegal_ to map certain address ranges as write-combining without >> checks like the ones above that ia64 needs. >> >> Therefore your proposed interface is by definition insufficient. > Thanks Dave, I'll try to clarify my point. > For me the writecombine_available() is a way to know if the > pgprot_writecombine() is effective or just cover call to the > pgprot_noncached(). > I want to use the writecombine_available() regardless to the mapping > address. > For example in mlx4 query_device I want to indicate that blue-flame is > not supported if `writecombine_available() == false`. > In this case we don't have the mapping address yet. > > Later on if an arch has write-combining (writecombine_available() == > true) when we try to map the blue-flame registers (in mlx4_ib_mmap): > > vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); > > if (io_remap_pfn_range(vma, vma->vm_start, > to_mucontext(context)->uar.pfn + > dev->dev->caps.num_uars, > PAGE_SIZE, vma->vm_page_prot)) > return -EAGAIN; > > I can be sure that pgprot_writecombine() is not a cover for > pgprot_noncached(). > The address checks that you mentioned should be part of > io_remap_pfn_range, this function should fail if the vma->vm_start is > not compatible to the vma->vm_page_prot. > Please let me know if I misunderstood something. > Dave, is this clears my point or I still missing something? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-10-12 9:54 ` Moshe Lazer 2014-10-26 15:00 ` Moshe Lazer @ 2014-11-27 6:48 ` Or Gerlitz 2014-12-12 20:36 ` David Miller 1 sibling, 1 reply; 11+ messages in thread From: Or Gerlitz @ 2014-11-27 6:48 UTC (permalink / raw) To: Moshe Lazer Cc: David Miller, Or Gerlitz, Jack Morgenstein, talal@mellanox.com, Yevgeny Petrilin, Linux Netdev List, Amir Vadai, moshel On Sun, Oct 12, 2014 at 11:54 AM, Moshe Lazer <moshel@dev.mellanox.co.il> wrote: > > On 10/8/2014 7:24 PM, David Miller wrote: >> >> From: Moshe Lazer <moshel@dev.mellanox.co.il> >> Date: Wed, 08 Oct 2014 11:44:57 +0300 >> >>>> #if defined(__i386__) || defined(__x86_64__) >>>> if (map->type == _DRM_REGISTERS && !(map->flags & >>>> _DRM_WRITE_COMBINING)) >>>> tmp = pgprot_noncached(tmp); >>>> else >>>> tmp = pgprot_writecombine(tmp); >>>> #elif defined(__powerpc__) >>>> pgprot_val(tmp) |= _PAGE_NO_CACHE; >>>> if (map->type == _DRM_REGISTERS) >>>> pgprot_val(tmp) |= _PAGE_GUARDED; >>>> #elif defined(__ia64__) >>>> if (efi_range_is_wc(vma->vm_start, vma->vm_end - >>>> vma->vm_start)) >>>> tmp = pgprot_writecombine(tmp); >>>> else >>>> tmp = pgprot_noncached(tmp); >>>> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) >>>> tmp = pgprot_noncached(tmp); >>>> #endif >>> >>> The idea was to provide an indication as for whether the arch supports >>> write-combining in general. >>> If we want to benefit from blue flame operations, we need to map the >>> blue flame registers as write combining - otherwise there is no >>> benefit. So we would like to know if write combining is supported by >>> the system or not. >>> >> You completely miss my point. On a given architectuire it might be >> _illegal_ to map certain address ranges as write-combining without >> checks like the ones above that ia64 needs. >> >> Therefore your proposed interface is by definition insufficient. > > Thanks David, I'll try to clarify my point. > For me the writecombine_available() is a way to know if the > pgprot_writecombine() is effective or just cover call to the > pgprot_noncached(). > I want to use the writecombine_available() regardless to the mapping > address. > For example in mlx4 query_device I want to indicate that blue-flame is not > supported if `writecombine_available() == false`. > In this case we don't have the mapping address yet. > > Later on if an arch has write-combining (writecombine_available() == true) > when we try to map the blue-flame registers (in mlx4_ib_mmap): > > vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); > > if (io_remap_pfn_range(vma, vma->vm_start, > to_mucontext(context)->uar.pfn + > dev->dev->caps.num_uars, > PAGE_SIZE, vma->vm_page_prot)) > return -EAGAIN; > > I can be sure that pgprot_writecombine() is not a cover for > pgprot_noncached(). > The address checks that you mentioned should be part of io_remap_pfn_range, > this function should fail if the vma->vm_start is not compatible to the > vma->vm_page_prot. > Please let me know if I misunderstood something. Hi Dave, Pinging you... could you respond on Moshe's email which hopefully addresses your comments? Or. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available 2014-11-27 6:48 ` Or Gerlitz @ 2014-12-12 20:36 ` David Miller 0 siblings, 0 replies; 11+ messages in thread From: David Miller @ 2014-12-12 20:36 UTC (permalink / raw) To: gerlitz.or Cc: moshel, ogerlitz, jackm, talal, yevgenyp, netdev, amirv, moshel From: Or Gerlitz <gerlitz.or@gmail.com> Date: Thu, 27 Nov 2014 08:48:24 +0200 > On Sun, Oct 12, 2014 at 11:54 AM, Moshe Lazer <moshel@dev.mellanox.co.il> wrote: >> >> On 10/8/2014 7:24 PM, David Miller wrote: >>> >>> From: Moshe Lazer <moshel@dev.mellanox.co.il> >>> Date: Wed, 08 Oct 2014 11:44:57 +0300 >>> >>>>> #if defined(__i386__) || defined(__x86_64__) >>>>> if (map->type == _DRM_REGISTERS && !(map->flags & >>>>> _DRM_WRITE_COMBINING)) >>>>> tmp = pgprot_noncached(tmp); >>>>> else >>>>> tmp = pgprot_writecombine(tmp); >>>>> #elif defined(__powerpc__) >>>>> pgprot_val(tmp) |= _PAGE_NO_CACHE; >>>>> if (map->type == _DRM_REGISTERS) >>>>> pgprot_val(tmp) |= _PAGE_GUARDED; >>>>> #elif defined(__ia64__) >>>>> if (efi_range_is_wc(vma->vm_start, vma->vm_end - >>>>> vma->vm_start)) >>>>> tmp = pgprot_writecombine(tmp); >>>>> else >>>>> tmp = pgprot_noncached(tmp); >>>>> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__) >>>>> tmp = pgprot_noncached(tmp); >>>>> #endif >>>> >>>> The idea was to provide an indication as for whether the arch supports >>>> write-combining in general. >>>> If we want to benefit from blue flame operations, we need to map the >>>> blue flame registers as write combining - otherwise there is no >>>> benefit. So we would like to know if write combining is supported by >>>> the system or not. >>>> >>> You completely miss my point. On a given architectuire it might be >>> _illegal_ to map certain address ranges as write-combining without >>> checks like the ones above that ia64 needs. >>> >>> Therefore your proposed interface is by definition insufficient. >> >> Thanks David, I'll try to clarify my point. >> For me the writecombine_available() is a way to know if the >> pgprot_writecombine() is effective or just cover call to the >> pgprot_noncached(). >> I want to use the writecombine_available() regardless to the mapping >> address. >> For example in mlx4 query_device I want to indicate that blue-flame is not >> supported if `writecombine_available() == false`. >> In this case we don't have the mapping address yet. ... > Pinging you... could you respond on Moshe's email which hopefully > addresses your comments? As I stated, some platforms have restrictions on certain address ranges when it comes to supporting write combining. Therefore, it is a _requirement_, not optional, that we have an address available so that the platform specific code can give an accurate response to the question "is write combining available" because the answer to that question is address dependent. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH V1 net-next 2/2] net/mlx4_core: Disable BF when write combining is not available 2014-10-05 8:22 [PATCH V1 net-next 0/2] Add pgtable API to query if write combining is available Or Gerlitz 2014-10-05 8:22 ` [PATCH V1 net-next 1/2] pgtable: Add " Or Gerlitz @ 2014-10-05 8:22 ` Or Gerlitz 1 sibling, 0 replies; 11+ messages in thread From: Or Gerlitz @ 2014-10-05 8:22 UTC (permalink / raw) To: David S. Miller Cc: netdev, Amir Vadai, Jack Morgenstein, Moshe Lazer, Tal Alon, Yevgeny Petrilin, Or Gerlitz From: Moshe Lazer <moshel@mellanox.com> In mlx4 for better latency, we write send descriptors to a write-combining (WC) mapped buffer instead of ringing a doorbell and having the HW fetch the descriptor from system memory. However, if write-combining is not supported on the host, then we obtain better latency by using the doorbell-ring/HW fetch mechanism. The mechanism that uses WC is called Blue-Flame (BF). BF is beneficial only when the system supports write combining. When the BF buffer is mapped as a write-combine buffer, the HCA receives data in multi-word bursts. However, if the BF buffer is mapped only as non-cached, the HCA receives data in individual dword chunks, which harms performance. Therefore, disable blueflame when write combining is not available. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> --- drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c index 2e88a23..f7bb548 100644 --- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -671,7 +671,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->min_page_sz = 1 << field; MLX4_GET(field, outbox, QUERY_DEV_CAP_BF_OFFSET); - if (field & 0x80) { + if ((field & 0x80) && writecombine_available()) { MLX4_GET(field, outbox, QUERY_DEV_CAP_LOG_BF_REG_SZ_OFFSET); dev_cap->bf_reg_size = 1 << (field & 0x1f); MLX4_GET(field, outbox, QUERY_DEV_CAP_LOG_MAX_BF_REGS_PER_PAGE_OFFSET); -- 1.7.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-12-12 20:36 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-05 8:22 [PATCH V1 net-next 0/2] Add pgtable API to query if write combining is available Or Gerlitz
2014-10-05 8:22 ` [PATCH V1 net-next 1/2] pgtable: Add " Or Gerlitz
2014-10-07 19:44 ` David Miller
[not found] ` <925ad10b2ec44e228e69bf0cbe6c0a0e@AMSPR05MB002.eurprd05.prod.outlook.com>
2014-10-08 8:44 ` FW: " Moshe Lazer
2014-10-08 8:50 ` David Laight
2014-10-08 16:24 ` David Miller
2014-10-12 9:54 ` Moshe Lazer
2014-10-26 15:00 ` Moshe Lazer
2014-11-27 6:48 ` Or Gerlitz
2014-12-12 20:36 ` David Miller
2014-10-05 8:22 ` [PATCH V1 net-next 2/2] net/mlx4_core: Disable BF when write combining is not available Or Gerlitz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).