Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: [PATCH 1/7] IB/rxe: Allocate enough space for an IPv6 addr
From: Bart Van Assche @ 2016-11-22 23:28 UTC (permalink / raw)
  To: Yonatan Cohen, Andrew Boyer, monis-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <975a89bc-033e-14ab-72f2-4244c0205e59-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>



On 11/22/2016 07:21 AM, Yonatan Cohen wrote:
> On 11/18/2016 4:36 PM, Andrew Boyer wrote:
>> Avoid smashing the stack when an ICRC error occurs on an IPv6 network.
>>
>> Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c
>> b/drivers/infiniband/sw/rxe/rxe_recv.c
>> index 46f0628..b40ab8d 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
>> @@ -391,7 +391,7 @@ int rxe_rcv(struct sk_buff *skb)
>>                   payload_size(pkt));
>>      calc_icrc = cpu_to_be32(~calc_icrc);
>>      if (unlikely(calc_icrc != pack_icrc)) {
>> -        char saddr[sizeof(struct in6_addr)];
>> +        char saddr[64];
>>
>>          if (skb->protocol == htons(ETH_P_IPV6))
>>              sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);
>>
> you fixed a bug here but i think the following would be better
> than hard coding 64 bytes on the stack
>
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -391,16 +391,14 @@ int rxe_rcv(struct sk_buff *skb)
>                              payload_size(pkt));
>         calc_icrc = cpu_to_be32(~calc_icrc);
>         if (unlikely(calc_icrc != pack_icrc)) {
> -               char saddr[sizeof(struct in6_addr)];
>
>                 if (skb->protocol == htons(ETH_P_IPV6))
> -                       sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);
> +                       pr_warn_ratelimited("bad ICRC from %pI6\n",
> &ipv6_hdr(skb)->saddr);
>                 else if (skb->protocol == htons(ETH_P_IP))
> -                       sprintf(saddr, "%pI4", &ip_hdr(skb)->saddr);
> +                       pr_warn_ratelimited("bad ICRC from %pI4\n",
> &ip_hdr(skb)->saddr);
>                 else
> -                       sprintf(saddr, "unknown");
> +                       pr_warn_ratelimited("bad ICRC from unknown\n");
>
> -               pr_warn_ratelimited("bad ICRC from %s\n", saddr);
>                 goto drop;
>         }

Hello Yonatan,

Apparently our e-mails crossed. Anyway, have you considered to use %pIS 
instead of %pI4 / %pI6? See also 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1067964305df.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 9/9] selinux: Add a cache for quicker retreival of PKey SIDs
From: Daniel Jurgens @ 2016-11-22 23:21 UTC (permalink / raw)
  To: James Morris
  Cc: chrisw-69jw2NvuJkxg9hUCZPvPmw@public.gmane.org,
	paul-r2n+y4ga6xFZroRs9YW3xA@public.gmane.org,
	sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org,
	eparis-FjpueFixGhCM4zKIHC2jIg@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Yevgeny Petrilin
In-Reply-To: <alpine.LRH.2.20.1611230948300.3895@namei.org>

On 11/22/2016 4:53 PM, James Morris wrote:
> On Tue, 22 Nov 2016, Dan Jurgens wrote:
>
>> +static int sel_pkey_sid_slow(u64 subnet_prefix, u16 pkey_num, u32 *sid)
>> +{
>> +	int ret = -ENOMEM;
>> +	struct sel_pkey *pkey;
>> +	struct sel_pkey *new = NULL;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&sel_pkey_lock, flags);
>> +	pkey = sel_pkey_find(subnet_prefix, pkey_num);
>> +	if (pkey) {
>> +		*sid = pkey->psec.sid;
>> +		spin_unlock_irqrestore(&sel_pkey_lock, flags);
>> +		return 0;
>> +	}
>> +
>> +	ret = security_pkey_sid(subnet_prefix, pkey_num, sid);
>> +	if (ret != 0)
>> +		goto out;
>> +
>> +	new = kzalloc(sizeof(*new), GFP_ATOMIC);
>> +	if (!new)
>> +		goto out;
>> +
>> +	new->psec.subnet_prefix = subnet_prefix;
>> +	new->psec.pkey = pkey_num;
>> +	new->psec.sid = *sid;
>> +	sel_pkey_insert(new);
>> +
>> +out:
>> +	spin_unlock_irqrestore(&sel_pkey_lock, flags);
>> +	if (unlikely(ret))
>> +		kfree(new);
>> +
>> +	return ret;
> The error handling is messed up here.
>
> You'll return 0 on kzalloc failure.
>
> Also, if security_pkey_sid fails, you'll attempt to kfree 'new', and you 
> don't need an unlikely() in a slow path.
>
>
Right.  I'll remove the if/kfree in the out path completely.  There's no way ret becomes non-zero after the kzalloc.  If the kzalloc fails I'll still have it return 0, the SID retrieved is valid, it just won't be added the cache so returning with an error is overkill.

Thank you for your review.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Christoph Lameter @ 2016-11-22 23:04 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Jason Gunthorpe, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161122194918.GA69241-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>

On Tue, 22 Nov 2016, Vishwanathapura, Niranjana wrote:

> Ok, I do understand Jason's point that we should probably not put this driver
> under drivers/infiniband/sw/.., as this driver is not a HCA.
> It is an ULP similar to ipoib, built on top of Omni-path irrespective of
> whether we register a hfi_vnic_bus or a direct custom interface with HFI1.
> This ULP will transmit and recieve Omni-path packets over the fabric, and is
> dependent on IB MAD interface and the HFI1 driver.

This is something that encapsulates IP (v4 right?) in something else.
Would belong into

	linux/net/ipv4

You already have similar implementations there

See f.e. ipip.c, ip_tunnel.c and lots more (try
	ls linux/net/ipv4/*tunnel*

)

If this is more like a device then it would belong into

linux/drivers/net/hfi or so (see also linux/drivers/net/ppp, plip,
loopback, etc etc)



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 8/9] selinux: Add IB Port SMP access vector
From: Daniel Jurgens @ 2016-11-22 23:00 UTC (permalink / raw)
  To: James Morris
  Cc: chrisw@sous-sol.org, paul@paul-moore.com, sds@tycho.nsa.gov,
	eparis@parisplace.org, dledford@redhat.com, sean.hefty@intel.com,
	hal.rosenstock@gmail.com, selinux@tycho.nsa.gov,
	linux-security-module@vger.kernel.org, linux-rdma@vger.kernel.org,
	Yevgeny Petrilin
In-Reply-To: <alpine.LRH.2.20.1611230945140.3895@namei.org>

On 11/22/2016 4:47 PM, James Morris wrote:
> On Tue, 22 Nov 2016, Dan Jurgens wrote:
>
>> +		*out_sid = c->sid[0];
>> +	} else {
>> +		*out_sid = SECINITSID_UNLABELED;
>> +	}
> Per previous comment about the braces.
The coding style says if one branch requires brackets they should be used on all branches:

"This does not apply if only one branch of a conditional statement is a single statement; in the latter case use braces in both branches: "


^ permalink raw reply

* Re: [PATCH v5 9/9] selinux: Add a cache for quicker retreival of PKey SIDs
From: James Morris @ 2016-11-22 22:53 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-10-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> +static int sel_pkey_sid_slow(u64 subnet_prefix, u16 pkey_num, u32 *sid)
> +{
> +	int ret = -ENOMEM;
> +	struct sel_pkey *pkey;
> +	struct sel_pkey *new = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&sel_pkey_lock, flags);
> +	pkey = sel_pkey_find(subnet_prefix, pkey_num);
> +	if (pkey) {
> +		*sid = pkey->psec.sid;
> +		spin_unlock_irqrestore(&sel_pkey_lock, flags);
> +		return 0;
> +	}
> +
> +	ret = security_pkey_sid(subnet_prefix, pkey_num, sid);
> +	if (ret != 0)
> +		goto out;
> +
> +	new = kzalloc(sizeof(*new), GFP_ATOMIC);
> +	if (!new)
> +		goto out;
> +
> +	new->psec.subnet_prefix = subnet_prefix;
> +	new->psec.pkey = pkey_num;
> +	new->psec.sid = *sid;
> +	sel_pkey_insert(new);
> +
> +out:
> +	spin_unlock_irqrestore(&sel_pkey_lock, flags);
> +	if (unlikely(ret))
> +		kfree(new);
> +
> +	return ret;

The error handling is messed up here.

You'll return 0 on kzalloc failure.

Also, if security_pkey_sid fails, you'll attempt to kfree 'new', and you 
don't need an unlikely() in a slow path.


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v5 8/9] selinux: Add IB Port SMP access vector
From: James Morris @ 2016-11-22 22:45 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
	sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <1479843638-67136-9-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> +		*out_sid = c->sid[0];
> +	} else {
> +		*out_sid = SECINITSID_UNLABELED;
> +	}

Per previous comment about the braces.


-- 
James Morris
<jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 7/9] selinux: Implement Infiniband PKey "Access" access vector
From: James Morris @ 2016-11-22 22:43 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-8-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

>  		struct file *file;
> +		struct lsm_pkey_audit *pkey;
>  	} u;
>  	/* this union contains LSM specific data */
>  	union {
> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 37f04da..b18d277 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -410,6 +410,19 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>  		audit_log_format(ab, " kmod=");
>  		audit_log_untrustedstring(ab, a->u.kmod_name);
>  		break;
> +	case LSM_AUDIT_DATA_PKEY: {
> +		struct in6_addr sbn_pfx;
> +
> +		memset(&sbn_pfx.s6_addr, 0,
> +		       sizeof(sbn_pfx.s6_addr));
> +
> +		memcpy(&sbn_pfx.s6_addr, &a->u.pkey->subnet_prefix,
> +		       sizeof(a->u.pkey->subnet_prefix));
> +
> +		audit_log_format(ab, " pkey=0x%x subnet_prefix=%pI6c",
> +				 a->u.pkey->pkey, &sbn_pfx);
> +		break;

Please do not add include extraneous empty lines in the code.


> index d87e29d..e21f7690 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6086,6 +6086,28 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
>  #endif
>  
>  #ifdef CONFIG_SECURITY_INFINIBAND
> +static int selinux_ib_pkey_access(void *ib_sec, u64 subnet_prefix, u16 pkey_val)
> +{
> +	struct common_audit_data ad;
> +	int err;
> +	u32 sid = 0;
> +	struct ib_security_struct *sec = ib_sec;
> +	struct lsm_pkey_audit pkey;
> +
> +	err = security_pkey_sid(subnet_prefix, pkey_val, &sid);
> +
> +	if (err)
> +		return err;

Ditto.

> +		}
> +		*out_sid = c->sid[0];
> +	} else {
> +		*out_sid = SECINITSID_UNLABELED;
> +	}

Also don't add extraneous braces.  It makes the code more difficult to 
review ("is something missing or malformatted here?").


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v5 6/9] selinux: Allocate and free infiniband security hooks
From: James Morris @ 2016-11-22 22:36 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-7-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> From: Daniel Jurgens <danielj@mellanox.com>
> 
> Implement and attach hooks to allocate and free Infiniband object
> security structures.
> 
> Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
> 


Reviewed-by: James Morris <james.l.morris@oracle.com>

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v5 4/9] IB/core: Enforce security on management datagrams
From: James Morris @ 2016-11-22 22:34 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-5-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> From: Daniel Jurgens <danielj@mellanox.com>
> 
> Allocate and free a security context when creating and destroying a MAD
> agent.  This context is used for controlling access to PKeys and sending
> and receiving SMPs.


Acked-by: James Morris <james.l.morris@oracle.com>

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v5 3/9] selinux lsm IB/core: Implement LSM notification system
From: James Morris @ 2016-11-22 22:31 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-4-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> From: Daniel Jurgens <danielj@mellanox.com>
> 
> Add a generic notificaiton mechanism in the LSM. Interested consumers
> can register a callback with the LSM and security modules can produce
> events.
> 
> Because access to Infiniband QPs are enforced in the setup phase of a
> connection security should be enforced again if the policy changes.
> Register infiniband devices for policy change notification and check all
> QPs on that device when the notification is received.
> 
> Add a call to the notification mechanism from SELinux when the AVC
> cache changes or setenforce is cleared.
> 
> Signed-off-by: Daniel Jurgens <danielj@mellanox.com>


Acked-by: James Morris <james.l.morris@oracle.com>


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v5 2/9] IB/core: Enforce PKey security on QPs
From: James Morris @ 2016-11-22 22:26 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp
In-Reply-To: <1479843638-67136-3-git-send-email-danielj@mellanox.com>

On Tue, 22 Nov 2016, Dan Jurgens wrote:

> From: Daniel Jurgens <danielj@mellanox.com>
> 
> Add new LSM hooks to allocate and free security contexts and check for
> permission to access a PKey.

I guess Doug's is best tree for these patches?

> ---
> v2:
> - Squashed LSM hook additions. Paul Moore
> - Changed security blobs to void*. Paul Moore
> 
> v3:
> - Change parameter order of pkey_access hook. Paul Moore
> ---
>  drivers/infiniband/core/Makefile     |   3 +-
>  drivers/infiniband/core/cache.c      |  21 +-
>  drivers/infiniband/core/core_priv.h  |  77 +++++
>  drivers/infiniband/core/device.c     |  33 ++
>  drivers/infiniband/core/security.c   | 617 +++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_cmd.c |  20 +-
>  drivers/infiniband/core/verbs.c      |  27 +-
>  include/linux/lsm_hooks.h            |  27 ++
>  include/linux/security.h             |  21 ++
>  include/rdma/ib_verbs.h              |  48 +++
>  security/Kconfig                     |   9 +
>  security/security.c                  |  31 ++
>  12 files changed, 925 insertions(+), 9 deletions(-)
>  create mode 100644 drivers/infiniband/core/security.c


Acked-by: James Morris <james.l.morris@oracle.com>


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Sagalovitch, Serguei @ 2016-11-22 22:21 UTC (permalink / raw)
  To: Dan Williams, Daniel Vetter
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Dave Hansen, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAPcyv4ind0fxek7g25MX=49rDfT5X151tb4=TYudMBmUJFZZNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

> I don't think we should be using numa distance to reverse engineer a
> certain allocation behavior.  The latency data should be truthful, but
> you're right we'll need a mechanism to keep general purpose
> allocations out of that range by default. 

Just to clarify: Do you propose/thinking to utilize NUMA API for 
such (VRAM) allocations? 



    

^ permalink raw reply

* Re: [PATCH 3/3] IB/multicast: Check ib_find_pkey() return value
From: Bart Van Assche @ 2016-11-22 22:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty
In-Reply-To: <20161122071145.GD23083-2ukJVAZIZ/Y@public.gmane.org>

On 11/21/2016 11:11 PM, Leon Romanovsky wrote:
> Don't you need to update process_group_error() function? It has code
> flows with ib_find_pkey() without checking return value, which can be
> anything too.

Hello Leon,

I'll leave that to someone who is familiar with the multicast code.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/7] IB/rxe: Allocate enough space for an IPv6 addr
From: Bart Van Assche @ 2016-11-22 22:11 UTC (permalink / raw)
  To: Andrew Boyer, monis-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

On 11/18/2016 06:36 AM, Andrew Boyer wrote:
> Avoid smashing the stack when an ICRC error occurs on an IPv6 network.
>
> Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
> ---
>  drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 46f0628..b40ab8d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -391,7 +391,7 @@ int rxe_rcv(struct sk_buff *skb)
>  			     payload_size(pkt));
>  	calc_icrc = cpu_to_be32(~calc_icrc);
>  	if (unlikely(calc_icrc != pack_icrc)) {
> -		char saddr[sizeof(struct in6_addr)];
> +		char saddr[64];
>
>  		if (skb->protocol == htons(ETH_P_IPV6))
>  			sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);

Hello Andrew,

Since you are touching this code please also switch from sprintf() to 
snprintf().

Thanks,

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 02/11] IB/core: Change port_attr.sm_lid from 16 to 32 bits
From: Jason Gunthorpe @ 2016-11-22 21:24 UTC (permalink / raw)
  To: Chandramouli, Dasaratharaman
  Cc: Ira Weiny, Don Hiatt, linux-rdma, Doug Ledford
In-Reply-To: <cb14f866-0e19-1631-745d-5c9d67aabfe8-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Tue, Nov 22, 2016 at 01:13:26PM -0800, Chandramouli, Dasaratharaman wrote:
> 
> 
> On 11/22/2016 12:57 PM, Jason Gunthorpe wrote:
> >On Tue, Nov 22, 2016 at 02:38:43PM -0500, Dasaratharaman Chandramouli wrote:
> >>+++ b/drivers/infiniband/core/sa_query.c
> >>@@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
> >> 		pr_err("Couldn't find index for default PKey\n");
> >>
> >> 	memset(&ah_attr, 0, sizeof ah_attr);
> >>-	ah_attr.dlid     = port_attr.sm_lid;
> >>+	ah_attr.dlid     = (u16)port_attr.sm_lid;
> >
> >Why are we dropping bits here?
> 
> Patch 3 increases ah_attr.dlid to 32 bits and the typecast is dropped.
> We added it in this patch just to avoid compiler warnings, if any.

It would be alot better to fix this series so adding typecasts and
then dropping them didn't happen so extensively. Just reodering some
patched would do the trick. That would make it alot less churny and
easy to read..

> >>-	sport->sm_lid = port_attr.sm_lid;
> >>+	sport->sm_lid = (u16)port_attr.sm_lid;
> >> 	sport->lid = port_attr.lid;
> >
> >And here..

> Patch 11 increases lid and sm_lid fields in struct srpt_port and the
> typecast is dropped.  We added it in this patch just to avoid
> compiler warnings, if any.

Eg if you do patch 11 first this hunk goes away.

I also don't really care for all the added u16 casts, the compiler
doesn't warn on implicit demotion without a higher warning level...

> >
> >>+#define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
> >>+				 ? 0 : x)
> >
> >static inline function please.
> Will change. Thanks.

All of them please.

And I think you should re-think how this is being used, the pattern
around OPA_TO_IB_UCAST_LID is copied too many times for my liking, and
isn't this UAPI compatability only?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Dan Williams @ 2016-11-22 21:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Dave Hansen, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Serguei Sagalovitch, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAKMK7uF+k5LvcPEHvtdcXQFrpKVbFxwZ32EexoU3rZ9LFhVSow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org> wrote:
> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>>
>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>
>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>> wrote:
>>>>
>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>>> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>
>>>>> I personally like "device-DAX" idea but my concerns are:
>>>>>
>>>>> -  How well it will co-exists with the  DRM infrastructure /
>>>>> implementations
>>>>>     in part dealing with CPU pointers?
>>>>
>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>>> not migratable. To be honest I do not know how well that co-exists
>>>> with drm infrastructure.
>>>>
>>>>> -  How well we will be able to handle case when we need to
>>>>> "move"/"evict"
>>>>>     memory/data to the new location so CPU pointer should point to the
>>>>> new
>>>>> physical location/address
>>>>>      (and may be not in PCI device memory at all)?
>>>>
>>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>>> interface is for cases where all that is needed is a direct-mapping to
>>>> a statically-allocated physical-address range be it persistent memory
>>>> or some other special reserved memory range.
>>>
>>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>>> pull off) I think we want all the magic in core mm, i.e. migration and
>>> overcommit. At least that seems to be the very strong drive in all
>>> general-purpose gpu abstractions and implementations, where memory is
>>> allocated with malloc, and then mapped/moved into vram/gpu address
>>> space through some magic,
>>
>> It is possible that there is other way around: memory is requested to be
>> allocated and should be kept in vram for  performance reason but due
>> to possible overcommit case we need at least temporally to "move" such
>> allocation to system memory.
>
> With migration I meant migrating both ways of course. And with stuff
> like numactl we can also influence where exactly the malloc'ed memory
> is allocated originally, at least if we'd expose the vram range as a
> very special numa node that happens to be far away and not hold any
> cpu cores.

I don't think we should be using numa distance to reverse engineer a
certain allocation behavior.  The latency data should be truthful, but
you're right we'll need a mechanism to keep general purpose
allocations out of that range by default. Btw, strict isolation is
another design point of device-dax, but I think in this case we're
describing something between the two extremes of full isolation and
full compatibility with existing numactl apis.

^ permalink raw reply

* Re: [PATCH v2 10/11] IB/IPoIB: Modify ipoib_get_net_dev_by_params to lookup gid table
From: Jason Gunthorpe @ 2016-11-22 21:20 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli
  Cc: Ira Weiny, Don Hiatt, linux-rdma, Doug Ledford
In-Reply-To: <1479843532-47496-11-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Tue, Nov 22, 2016 at 02:38:51PM -0500, Dasaratharaman Chandramouli wrote:
> ipoib_get_net_dev_by_params compares incoming gid with local_gid
> which is gid at index 0 of the gid table. OPA devices using larger
> LIDs may have a different GID format than whats setup in the local_gid
> field. Do a search of the gid table in those cases.

Sorry, NAK, the incoming GID has to exactly match what is in IPoIB
LLADDR, you can't do this or it breaks how the virtualization stuff is
supposed to work with gid aliases.

I'm skeptical about the other ipoib patch as well since ipoib should
be dealing natively with 32 bit lids, I think you need to fix the path
record stuff too instead of opening coding that in ipoib.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Serguei Sagalovitch @ 2016-11-22 21:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Dave Hansen, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAKMK7uF+k5LvcPEHvtdcXQFrpKVbFxwZ32EexoU3rZ9LFhVSow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>



On 2016-11-22 04:03 PM, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>> wrote:
>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>>> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>> I personally like "device-DAX" idea but my concerns are:
>>>>>
>>>>> -  How well it will co-exists with the  DRM infrastructure /
>>>>> implementations
>>>>>      in part dealing with CPU pointers?
>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>>> not migratable. To be honest I do not know how well that co-exists
>>>> with drm infrastructure.
>>>>
>>>>> -  How well we will be able to handle case when we need to
>>>>> "move"/"evict"
>>>>>      memory/data to the new location so CPU pointer should point to the
>>>>> new
>>>>> physical location/address
>>>>>       (and may be not in PCI device memory at all)?
>>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>>> interface is for cases where all that is needed is a direct-mapping to
>>>> a statically-allocated physical-address range be it persistent memory
>>>> or some other special reserved memory range.
>>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>>> pull off) I think we want all the magic in core mm, i.e. migration and
>>> overcommit. At least that seems to be the very strong drive in all
>>> general-purpose gpu abstractions and implementations, where memory is
>>> allocated with malloc, and then mapped/moved into vram/gpu address
>>> space through some magic,
>> It is possible that there is other way around: memory is requested to be
>> allocated and should be kept in vram for  performance reason but due
>> to possible overcommit case we need at least temporally to "move" such
>> allocation to system memory.
> With migration I meant migrating both ways of course. And with stuff
> like numactl we can also influence where exactly the malloc'ed memory
> is allocated originally, at least if we'd expose the vram range as a
> very special numa node that happens to be far away and not hold any
> cpu cores.
> -Daniel
One additional item to consider: it is not only "plain" numa case where
we could have different  performance for access but also possibility that
we will not have access at all (or write only access) particular if PCIe
devices belong to different root complex. I must admit that I do not know
how to  detect reliably such cases in the kernel.

^ permalink raw reply

* Re: [PATCH v2 02/11] IB/core: Change port_attr.sm_lid from 16 to 32 bits
From: Chandramouli, Dasaratharaman @ 2016-11-22 21:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Ira Weiny, Don Hiatt, linux-rdma, Doug Ledford
In-Reply-To: <20161122205717.GA6484-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>



On 11/22/2016 12:57 PM, Jason Gunthorpe wrote:
> On Tue, Nov 22, 2016 at 02:38:43PM -0500, Dasaratharaman Chandramouli wrote:
>> +++ b/drivers/infiniband/core/sa_query.c
>> @@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
>>  		pr_err("Couldn't find index for default PKey\n");
>>
>>  	memset(&ah_attr, 0, sizeof ah_attr);
>> -	ah_attr.dlid     = port_attr.sm_lid;
>> +	ah_attr.dlid     = (u16)port_attr.sm_lid;
>
> Why are we dropping bits here?

Patch 3 increases ah_attr.dlid to 32 bits and the typecast is dropped.
We added it in this patch just to avoid compiler warnings, if any.
>
>> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
>> @@ -514,7 +514,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
>>  	if (ret)
>>  		goto err_query_port;
>>
>> -	sport->sm_lid = port_attr.sm_lid;
>> +	sport->sm_lid = (u16)port_attr.sm_lid;
>>  	sport->lid = port_attr.lid;
>
> And here..
Patch 11 increases lid and sm_lid fields in struct srpt_port and the 
typecast is dropped.
We added it in this patch just to avoid compiler warnings, if any.
>
>> +#if !defined(OPA_ADDR_H)
>> +#define OPA_ADDR_H
>
> I don't think we need a header file for this.
There are other helpers specific to OPA addresses that were required in 
later patches which we thought can go in a separate file.
>
>> +#define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
>> +				 ? 0 : x)
>
> static inline function please.
Will change. Thanks.
>
> Jason
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Daniel Vetter @ 2016-11-22 21:03 UTC (permalink / raw)
  To: Serguei Sagalovitch
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Dave Hansen, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <a99fd9ea-64d8-c5d3-0b96-f96c92369601-5C7GfCeVMHo@public.gmane.org>

On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
<serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>
> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>
>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> wrote:
>>>
>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>
>>>> I personally like "device-DAX" idea but my concerns are:
>>>>
>>>> -  How well it will co-exists with the  DRM infrastructure /
>>>> implementations
>>>>     in part dealing with CPU pointers?
>>>
>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>> not migratable. To be honest I do not know how well that co-exists
>>> with drm infrastructure.
>>>
>>>> -  How well we will be able to handle case when we need to
>>>> "move"/"evict"
>>>>     memory/data to the new location so CPU pointer should point to the
>>>> new
>>>> physical location/address
>>>>      (and may be not in PCI device memory at all)?
>>>
>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>> interface is for cases where all that is needed is a direct-mapping to
>>> a statically-allocated physical-address range be it persistent memory
>>> or some other special reserved memory range.
>>
>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>> pull off) I think we want all the magic in core mm, i.e. migration and
>> overcommit. At least that seems to be the very strong drive in all
>> general-purpose gpu abstractions and implementations, where memory is
>> allocated with malloc, and then mapped/moved into vram/gpu address
>> space through some magic,
>
> It is possible that there is other way around: memory is requested to be
> allocated and should be kept in vram for  performance reason but due
> to possible overcommit case we need at least temporally to "move" such
> allocation to system memory.

With migration I meant migrating both ways of course. And with stuff
like numactl we can also influence where exactly the malloc'ed memory
is allocated originally, at least if we'd expose the vram range as a
very special numa node that happens to be far away and not hold any
cpu cores.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH v2 02/11] IB/core: Change port_attr.sm_lid from 16 to 32 bits
From: Jason Gunthorpe @ 2016-11-22 20:57 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli
  Cc: Ira Weiny, Don Hiatt, linux-rdma, Doug Ledford
In-Reply-To: <1479843532-47496-3-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Tue, Nov 22, 2016 at 02:38:43PM -0500, Dasaratharaman Chandramouli wrote:
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
>  		pr_err("Couldn't find index for default PKey\n");
>  
>  	memset(&ah_attr, 0, sizeof ah_attr);
> -	ah_attr.dlid     = port_attr.sm_lid;
> +	ah_attr.dlid     = (u16)port_attr.sm_lid;

Why are we dropping bits here?

> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
> @@ -514,7 +514,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
>  	if (ret)
>  		goto err_query_port;
>  
> -	sport->sm_lid = port_attr.sm_lid;
> +	sport->sm_lid = (u16)port_attr.sm_lid;
>  	sport->lid = port_attr.lid;

And here..

> +#if !defined(OPA_ADDR_H)
> +#define OPA_ADDR_H

I don't think we need a header file for this.

> +#define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
> +				 ? 0 : x)

static inline function please.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Serguei Sagalovitch @ 2016-11-22 20:35 UTC (permalink / raw)
  To: Daniel Vetter, Dan Williams
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Dave Hansen, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAKMK7uGoXAYoazyGLbGU7svVD10WmaBtpko8BpHeNpRhST8F7g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>



On 2016-11-22 03:10 PM, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>>> I personally like "device-DAX" idea but my concerns are:
>>>
>>> -  How well it will co-exists with the  DRM infrastructure / implementations
>>>     in part dealing with CPU pointers?
>> Inside the kernel a device-DAX range is "just memory" in the sense
>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>> not migratable. To be honest I do not know how well that co-exists
>> with drm infrastructure.
>>
>>> -  How well we will be able to handle case when we need to "move"/"evict"
>>>     memory/data to the new location so CPU pointer should point to the new
>>> physical location/address
>>>      (and may be not in PCI device memory at all)?
>> So, device-DAX deliberately avoids support for in-kernel migration or
>> overcommit. Those cases are left to the core mm or drm. The device-dax
>> interface is for cases where all that is needed is a direct-mapping to
>> a statically-allocated physical-address range be it persistent memory
>> or some other special reserved memory range.
> For some of the fancy use-cases (e.g. to be comparable to what HMM can
> pull off) I think we want all the magic in core mm, i.e. migration and
> overcommit. At least that seems to be the very strong drive in all
> general-purpose gpu abstractions and implementations, where memory is
> allocated with malloc, and then mapped/moved into vram/gpu address
> space through some magic,
It is possible that there is other way around: memory is requested to be
allocated and should be kept in vram for  performance reason but due
to possible overcommit case we need at least temporally to "move" such
allocation to system memory.
>   but still visible on both the cpu and gpu
> side in some form. Special device to allocate memory, and not being
> able to migrate stuff around sound like misfeatures from that pov.
> -Daniel

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Dan Williams @ 2016-11-22 20:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Serguei Sagalovitch, Dave Hansen, linux-nvdimm@lists.01.org,
	linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
	Kuehling, Felix, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, Koenig, Christian, Sander, Ben,
	Suthikulpanit, Suravee, Deucher, Alexander, Blinzer, Paul,
	Linux-media@vger.kernel.org
In-Reply-To: <CAKMK7uGoXAYoazyGLbGU7svVD10WmaBtpko8BpHeNpRhST8F7g@mail.gmail.com>

On Tue, Nov 22, 2016 at 12:10 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>> <serguei.sagalovitch@amd.com> wrote:
>>> I personally like "device-DAX" idea but my concerns are:
>>>
>>> -  How well it will co-exists with the  DRM infrastructure / implementations
>>>    in part dealing with CPU pointers?
>>
>> Inside the kernel a device-DAX range is "just memory" in the sense
>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>> not migratable. To be honest I do not know how well that co-exists
>> with drm infrastructure.
>>
>>> -  How well we will be able to handle case when we need to "move"/"evict"
>>>    memory/data to the new location so CPU pointer should point to the new
>>> physical location/address
>>>     (and may be not in PCI device memory at all)?
>>
>> So, device-DAX deliberately avoids support for in-kernel migration or
>> overcommit. Those cases are left to the core mm or drm. The device-dax
>> interface is for cases where all that is needed is a direct-mapping to
>> a statically-allocated physical-address range be it persistent memory
>> or some other special reserved memory range.
>
> For some of the fancy use-cases (e.g. to be comparable to what HMM can
> pull off) I think we want all the magic in core mm, i.e. migration and
> overcommit. At least that seems to be the very strong drive in all
> general-purpose gpu abstractions and implementations, where memory is
> allocated with malloc, and then mapped/moved into vram/gpu address
> space through some magic, but still visible on both the cpu and gpu
> side in some form. Special device to allocate memory, and not being
> able to migrate stuff around sound like misfeatures from that pov.

Agreed. For general purpose P2P use cases where all you want is
direct-I/O to a memory range that happens to be on a PCIe device then
I think a special device fits the bill. For gpu P2P use cases that
already have migration/overcommit expectations then it is not a good
fit.

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Daniel Vetter @ 2016-11-22 20:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Hansen, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Serguei Sagalovitch, Blinzer, Paul,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
	Koenig, Christian,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAPcyv4htu4gayz_Dpe0pnfLN4v_Kcy-fTx3B-HEfadCHvzJnhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
>> I personally like "device-DAX" idea but my concerns are:
>>
>> -  How well it will co-exists with the  DRM infrastructure / implementations
>>    in part dealing with CPU pointers?
>
> Inside the kernel a device-DAX range is "just memory" in the sense
> that you can perform pfn_to_page() on it and issue I/O, but the vma is
> not migratable. To be honest I do not know how well that co-exists
> with drm infrastructure.
>
>> -  How well we will be able to handle case when we need to "move"/"evict"
>>    memory/data to the new location so CPU pointer should point to the new
>> physical location/address
>>     (and may be not in PCI device memory at all)?
>
> So, device-DAX deliberately avoids support for in-kernel migration or
> overcommit. Those cases are left to the core mm or drm. The device-dax
> interface is for cases where all that is needed is a direct-mapping to
> a statically-allocated physical-address range be it persistent memory
> or some other special reserved memory range.

For some of the fancy use-cases (e.g. to be comparable to what HMM can
pull off) I think we want all the magic in core mm, i.e. migration and
overcommit. At least that seems to be the very strong drive in all
general-purpose gpu abstractions and implementations, where memory is
allocated with malloc, and then mapped/moved into vram/gpu address
space through some magic, but still visible on both the cpu and gpu
side in some form. Special device to allocate memory, and not being
able to migrate stuff around sound like misfeatures from that pov.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Dan Williams @ 2016-11-22 20:01 UTC (permalink / raw)
  To: Serguei Sagalovitch
  Cc: Deucher, Alexander, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
	Linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-pci@vger.kernel.org, Bridgman, John, Kuehling, Felix,
	Blinzer, Paul, Koenig, Christian, Suthikulpanit, Suravee,
	Sander, Ben, Dave Hansen
In-Reply-To: <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>

On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
<serguei.sagalovitch@amd.com> wrote:
> Dan,
>
> I personally like "device-DAX" idea but my concerns are:
>
> -  How well it will co-exists with the  DRM infrastructure / implementations
>    in part dealing with CPU pointers?

Inside the kernel a device-DAX range is "just memory" in the sense
that you can perform pfn_to_page() on it and issue I/O, but the vma is
not migratable. To be honest I do not know how well that co-exists
with drm infrastructure.

> -  How well we will be able to handle case when we need to "move"/"evict"
>    memory/data to the new location so CPU pointer should point to the new
> physical location/address
>     (and may be not in PCI device memory at all)?

So, device-DAX deliberately avoids support for in-kernel migration or
overcommit. Those cases are left to the core mm or drm. The device-dax
interface is for cases where all that is needed is a direct-mapping to
a statically-allocated physical-address range be it persistent memory
or some other special reserved memory range.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox