Review Request

All of lore.kernel.org
 help / color / mirror / Atom feed

* Review Request
@ 2020-06-08 14:46 Divya Indi
  2020-06-08 14:46 ` [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg Divya Indi
  2020-06-09  7:03 ` Review Request Leon Romanovsky
  0 siblings, 2 replies; 30+ messages in thread
From: Divya Indi @ 2020-06-08 14:46 UTC (permalink / raw)
  To: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan
  Cc: Gerd Rausch, Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu,
	Doug Ledford

[PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

Hi,

Please review the patch that follows. 

v3 addresses the previously raised concerns. Changes include -

1. To resolve the race where the timer can kick in before request
has been sent out, we now add the request to the list after sending out
the request.

2. To handle the race where the response can come in before we got a 
chance to add the req to the list, sending and adding the request to
request list is done under spinlock - request_lock.

3. To make sure there is no blocking op/delay while holding the spinlock,
using GFP_NOWAIT for memory allocation.

Thanks Jason for providing your valuable feedback. 

Let me know if you have any suggestions or concerns.

Thanks,
Divya

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-08 14:46 Review Request Divya Indi
@ 2020-06-08 14:46 ` Divya Indi
  2020-06-09  7:00   ` Leon Romanovsky
  2020-06-09  7:03 ` Review Request Leon Romanovsky
  1 sibling, 1 reply; 30+ messages in thread
From: Divya Indi @ 2020-06-08 14:46 UTC (permalink / raw)
  To: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan
  Cc: Gerd Rausch, Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu,
	Doug Ledford, Divya Indi

Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
-
1. Adds the query to the request list before ib_nl_snd_msg.
2. Removes ib_nl_send_msg from within the spinlock which also makes it
possible to allocate memory with GFP_KERNEL.

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query causing a use after free situation
while accessing the query in ib_nl_send_msg.

Call Trace for the above race:

[<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
[<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
[<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
[<ffffffff810a02f9>] process_one_work+0x169/0x4a0
[<ffffffff810a0b2b>] worker_thread+0x5b/0x560
[<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
[<ffffffff810a68fb>] kthread+0xcb/0xf0
[<ffffffff816ec49a>] ? __schedule+0x24a/0x810
[<ffffffff816ec49a>] ? __schedule+0x24a/0x810
[<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
[<ffffffff816f25a7>] ret_from_fork+0x47/0x90
[<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
....
RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]

To resolve the above issue -
1. Add the req to the request list only after the request has been sent out.
2. To handle the race where response comes in before adding request to
the request list, send(rdma_nl_multicast) and add to list while holding the
spinlock - request_lock.
3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
a spinlock. In case of memory allocation failure, request will go out to SA.

Signed-off-by: Divya Indi <divya.indi@oracle.com>
Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending")
---
 drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 74e0058..042c99b 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	void *data;
 	struct ib_sa_mad *mad;
 	int len;
+	unsigned long flags;
+	unsigned long delay;
+	int ret;
 
 	mad = query->mad_buf->mad;
 	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
@@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	/* Repair the nlmsg header length */
 	nlmsg_end(skb, nlh);
 
-	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
+	if (!ret) {
+		/* Put the request on the list.*/
+		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+		query->timeout = delay + jiffies;
+		list_add_tail(&query->list, &ib_nl_request_list);
+		/* Start the timeout if this is the only request */
+		if (ib_nl_request_list.next == &query->list)
+			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return ret;
 }
 
 static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 {
-	unsigned long flags;
-	unsigned long delay;
 	int ret;
 
 	INIT_LIST_HEAD(&query->list);
 	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
 
-	/* Put the request on the list first.*/
-	spin_lock_irqsave(&ib_nl_request_lock, flags);
-	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
-	query->timeout = delay + jiffies;
-	list_add_tail(&query->list, &ib_nl_request_list);
-	/* Start the timeout if this is the only request */
-	if (ib_nl_request_list.next == &query->list)
-		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
-	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
-
 	ret = ib_nl_send_msg(query, gfp_mask);
 	if (ret) {
 		ret = -EIO;
-		/* Remove the request */
-		spin_lock_irqsave(&ib_nl_request_lock, flags);
-		list_del(&query->list);
-		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
 	}
 
 	return ret;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-08 14:46 ` [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg Divya Indi
@ 2020-06-09  7:00   ` Leon Romanovsky
  2020-06-09 14:45     ` Divya Indi
  0 siblings, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2020-06-09  7:00 UTC (permalink / raw)
  To: Divya Indi
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
> -
> 1. Adds the query to the request list before ib_nl_snd_msg.
> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
> possible to allocate memory with GFP_KERNEL.
>
> However, if there is a delay in sending out the request (For
> eg: Delay due to low memory situation) the timer to handle request timeout
> might kick in before the request is sent out to ibacm via netlink.
> ib_nl_request_timeout may release the query causing a use after free situation
> while accessing the query in ib_nl_send_msg.
>
> Call Trace for the above race:
>
> [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
> [<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
> [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
> [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
> [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
> [rds_rdma]
> [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
> [rds_rdma]
> [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
> [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
> [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
> [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
> [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
> [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
> [<ffffffff810a68fb>] kthread+0xcb/0xf0
> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> ....
> RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]
>
> To resolve the above issue -
> 1. Add the req to the request list only after the request has been sent out.
> 2. To handle the race where response comes in before adding request to
> the request list, send(rdma_nl_multicast) and add to list while holding the
> spinlock - request_lock.
> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
> a spinlock. In case of memory allocation failure, request will go out to SA.
>
> Signed-off-by: Divya Indi <divya.indi@oracle.com>
> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
> before sending")

Author SOB should be after "Fixes" line.

> ---
>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
>  1 file changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
> index 74e0058..042c99b 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>  	void *data;
>  	struct ib_sa_mad *mad;
>  	int len;
> +	unsigned long flags;
> +	unsigned long delay;
> +	int ret;
>
>  	mad = query->mad_buf->mad;
>  	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>  	/* Repair the nlmsg header length */
>  	nlmsg_end(skb, nlh);
>
> -	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
> +	spin_lock_irqsave(&ib_nl_request_lock, flags);
> +	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);

It is hard to be convinced that this is correct solution. The mix of
gfp_flags and GFP_NOWAIT at the same time and usage of
ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
makes this code unreadable/non-maintainable.

> +	if (!ret) {

Please use kernel coding style.

if (ret) {
  spin_unlock_irqrestore(&ib_nl_request_lock, flags);
  return ret;
  }

 ....

> +		/* Put the request on the list.*/
> +		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> +		query->timeout = delay + jiffies;
> +		list_add_tail(&query->list, &ib_nl_request_list);
> +		/* Start the timeout if this is the only request */
> +		if (ib_nl_request_list.next == &query->list)
> +			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> +	}
> +	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> +
> +	return ret;
>  }
>
>  static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
>  {
> -	unsigned long flags;
> -	unsigned long delay;
>  	int ret;
>
>  	INIT_LIST_HEAD(&query->list);
>  	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
>
> -	/* Put the request on the list first.*/
> -	spin_lock_irqsave(&ib_nl_request_lock, flags);
> -	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> -	query->timeout = delay + jiffies;
> -	list_add_tail(&query->list, &ib_nl_request_list);
> -	/* Start the timeout if this is the only request */
> -	if (ib_nl_request_list.next == &query->list)
> -		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> -	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> -
>  	ret = ib_nl_send_msg(query, gfp_mask);
>  	if (ret) {
>  		ret = -EIO;
> -		/* Remove the request */
> -		spin_lock_irqsave(&ib_nl_request_lock, flags);
> -		list_del(&query->list);
> -		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>  	}

Brackets should be removed too.
>
>  	return ret;
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-09  7:00   ` Leon Romanovsky
@ 2020-06-09 14:45     ` Divya Indi
  2020-06-14  6:41       ` Leon Romanovsky
  0 siblings, 1 reply; 30+ messages in thread
From: Divya Indi @ 2020-06-09 14:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

Hi Leon,

Thanks for taking the time to review.

Please find my comments inline -

On 6/9/20 12:00 AM, Leon Romanovsky wrote:
> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
>> -
>> 1. Adds the query to the request list before ib_nl_snd_msg.
>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
>> possible to allocate memory with GFP_KERNEL.
>>
>> However, if there is a delay in sending out the request (For
>> eg: Delay due to low memory situation) the timer to handle request timeout
>> might kick in before the request is sent out to ibacm via netlink.
>> ib_nl_request_timeout may release the query causing a use after free situation
>> while accessing the query in ib_nl_send_msg.
>>
>> Call Trace for the above race:
>>
>> [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
>> [<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>> [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>> [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>> [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>> [rds_rdma]
>> [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>> [rds_rdma]
>> [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>> [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
>> [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
>> [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
>> [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
>> [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
>> [<ffffffff810a68fb>] kthread+0xcb/0xf0
>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
>> [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
>> ....
>> RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]
>>
>> To resolve the above issue -
>> 1. Add the req to the request list only after the request has been sent out.
>> 2. To handle the race where response comes in before adding request to
>> the request list, send(rdma_nl_multicast) and add to list while holding the
>> spinlock - request_lock.
>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
>> a spinlock. In case of memory allocation failure, request will go out to SA.
>>
>> Signed-off-by: Divya Indi <divya.indi@oracle.com>
>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>> before sending")
> Author SOB should be after "Fixes" line.

My bad. Noted.

>
>> ---
>>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
>> index 74e0058..042c99b 100644
>> --- a/drivers/infiniband/core/sa_query.c
>> +++ b/drivers/infiniband/core/sa_query.c
>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>>  	void *data;
>>  	struct ib_sa_mad *mad;
>>  	int len;
>> +	unsigned long flags;
>> +	unsigned long delay;
>> +	int ret;
>>
>>  	mad = query->mad_buf->mad;
>>  	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>>  	/* Repair the nlmsg header length */
>>  	nlmsg_end(skb, nlh);
>>
>> -	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
>> +	spin_lock_irqsave(&ib_nl_request_lock, flags);
>> +	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
> It is hard to be convinced that this is correct solution. The mix of
> gfp_flags and GFP_NOWAIT at the same time and usage of
> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
> makes this code unreadable/non-maintainable.

Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock.

ie we had -

1. Get spinlock - ib_nl_request_lock 
2. ib_nl_send_msg
	2.a) rdma_nl_multicast
3. Add request to the req list
4. Arm the timer if needed.
5. Release spinlock

However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL.
hence, was moved out of the spinlock. In addition, req was now being 
added prior to ib_nl_send_msg [To handle the race where response can
come in before we get a chance to add the request back to the list].

This introduced another race resulting in use-after-free.[Described in the commit.]

To resolve this, sending out the request and adding it to list need to 
happen while holding the request_lock.  
To ensure minimum allocations while holding the lock, instead of having
the entire ib_nl_send_msg under the lock, we only have rdma_nl_multicast
under this spinlock.

However, do you think it would be a good idea to split ib_nl_send_msg
into 2 functions -
1. Prepare the req/query [Outside the spinlock]
2. Sending the req - rdma_nl_multicast [while holding spinlock]

Would this be more intuitive? 

>> +	if (!ret) {
> Please use kernel coding style.
>
> if (ret) {
>   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>   return ret;
>   }
>
>  ....

Noted. Will make this change.

>
>> +		/* Put the request on the list.*/
>> +		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
>> +		query->timeout = delay + jiffies;
>> +		list_add_tail(&query->list, &ib_nl_request_list);
>> +		/* Start the timeout if this is the only request */
>> +		if (ib_nl_request_list.next == &query->list)
>> +			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
>> +	}
>> +	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>> +
>> +	return ret;
>>  }
>>
>>  static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
>>  {
>> -	unsigned long flags;
>> -	unsigned long delay;
>>  	int ret;
>>
>>  	INIT_LIST_HEAD(&query->list);
>>  	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
>>
>> -	/* Put the request on the list first.*/
>> -	spin_lock_irqsave(&ib_nl_request_lock, flags);
>> -	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
>> -	query->timeout = delay + jiffies;
>> -	list_add_tail(&query->list, &ib_nl_request_list);
>> -	/* Start the timeout if this is the only request */
>> -	if (ib_nl_request_list.next == &query->list)
>> -		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
>> -	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>> -
>>  	ret = ib_nl_send_msg(query, gfp_mask);
>>  	if (ret) {
>>  		ret = -EIO;
>> -		/* Remove the request */
>> -		spin_lock_irqsave(&ib_nl_request_lock, flags);
>> -		list_del(&query->list);
>> -		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>  	}
> Brackets should be removed too.
Noted.
>>  	return ret;
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-09 14:45     ` Divya Indi
@ 2020-06-14  6:41       ` Leon Romanovsky
  2020-06-16 17:56         ` Divya Indi
  0 siblings, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2020-06-14  6:41 UTC (permalink / raw)
  To: Divya Indi
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Tue, Jun 09, 2020 at 07:45:21AM -0700, Divya Indi wrote:
> Hi Leon,
>
> Thanks for taking the time to review.
>
> Please find my comments inline -
>
> On 6/9/20 12:00 AM, Leon Romanovsky wrote:
> > On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
> >> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
> >> -
> >> 1. Adds the query to the request list before ib_nl_snd_msg.
> >> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
> >> possible to allocate memory with GFP_KERNEL.
> >>
> >> However, if there is a delay in sending out the request (For
> >> eg: Delay due to low memory situation) the timer to handle request timeout
> >> might kick in before the request is sent out to ibacm via netlink.
> >> ib_nl_request_timeout may release the query causing a use after free situation
> >> while accessing the query in ib_nl_send_msg.
> >>
> >> Call Trace for the above race:
> >>
> >> [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
> >> [<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
> >> [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
> >> [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
> >> [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
> >> [rds_rdma]
> >> [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
> >> [rds_rdma]
> >> [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
> >> [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
> >> [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
> >> [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
> >> [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
> >> [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
> >> [<ffffffff810a68fb>] kthread+0xcb/0xf0
> >> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> >> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> >> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> >> [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
> >> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> >> ....
> >> RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]
> >>
> >> To resolve the above issue -
> >> 1. Add the req to the request list only after the request has been sent out.
> >> 2. To handle the race where response comes in before adding request to
> >> the request list, send(rdma_nl_multicast) and add to list while holding the
> >> spinlock - request_lock.
> >> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
> >> a spinlock. In case of memory allocation failure, request will go out to SA.
> >>
> >> Signed-off-by: Divya Indi <divya.indi@oracle.com>
> >> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
> >> before sending")
> > Author SOB should be after "Fixes" line.
>
> My bad. Noted.
>
> >
> >> ---
> >>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
> >>  1 file changed, 17 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
> >> index 74e0058..042c99b 100644
> >> --- a/drivers/infiniband/core/sa_query.c
> >> +++ b/drivers/infiniband/core/sa_query.c
> >> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
> >>  	void *data;
> >>  	struct ib_sa_mad *mad;
> >>  	int len;
> >> +	unsigned long flags;
> >> +	unsigned long delay;
> >> +	int ret;
> >>
> >>  	mad = query->mad_buf->mad;
> >>  	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
> >> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
> >>  	/* Repair the nlmsg header length */
> >>  	nlmsg_end(skb, nlh);
> >>
> >> -	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
> >> +	spin_lock_irqsave(&ib_nl_request_lock, flags);
> >> +	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
> > It is hard to be convinced that this is correct solution. The mix of
> > gfp_flags and GFP_NOWAIT at the same time and usage of
> > ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
> > makes this code unreadable/non-maintainable.
>
> Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
> before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock.
>
> ie we had -
>
> 1. Get spinlock - ib_nl_request_lock
> 2. ib_nl_send_msg
> 	2.a) rdma_nl_multicast
> 3. Add request to the req list
> 4. Arm the timer if needed.
> 5. Release spinlock
>
> However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL.
> hence, was moved out of the spinlock. In addition, req was now being
> added prior to ib_nl_send_msg [To handle the race where response can
> come in before we get a chance to add the request back to the list].
>
> This introduced another race resulting in use-after-free.[Described in the commit.]
>
> To resolve this, sending out the request and adding it to list need to
> happen while holding the request_lock.
> To ensure minimum allocations while holding the lock, instead of having
> the entire ib_nl_send_msg under the lock, we only have rdma_nl_multicast
> under this spinlock.
>
> However, do you think it would be a good idea to split ib_nl_send_msg
> into 2 functions -
> 1. Prepare the req/query [Outside the spinlock]
> 2. Sending the req - rdma_nl_multicast [while holding spinlock]
>
> Would this be more intuitive?

While it is always good idea to minimize the locked period. It still
doesn't answer concern about mixing gfp_flags and direct GFP_NOWAIT.
For example if user provides GFP_ATOMIC, the GFP_NOWAIT allocation will
cause a trouble because latter is more lax than first one.

Thanks

>
> >> +	if (!ret) {
> > Please use kernel coding style.
> >
> > if (ret) {
> >   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >   return ret;
> >   }
> >
> >  ....
>
> Noted. Will make this change.
>
> >
> >> +		/* Put the request on the list.*/
> >> +		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> >> +		query->timeout = delay + jiffies;
> >> +		list_add_tail(&query->list, &ib_nl_request_list);
> >> +		/* Start the timeout if this is the only request */
> >> +		if (ib_nl_request_list.next == &query->list)
> >> +			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> >> +	}
> >> +	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >> +
> >> +	return ret;
> >>  }
> >>
> >>  static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
> >>  {
> >> -	unsigned long flags;
> >> -	unsigned long delay;
> >>  	int ret;
> >>
> >>  	INIT_LIST_HEAD(&query->list);
> >>  	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
> >>
> >> -	/* Put the request on the list first.*/
> >> -	spin_lock_irqsave(&ib_nl_request_lock, flags);
> >> -	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> >> -	query->timeout = delay + jiffies;
> >> -	list_add_tail(&query->list, &ib_nl_request_list);
> >> -	/* Start the timeout if this is the only request */
> >> -	if (ib_nl_request_list.next == &query->list)
> >> -		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> >> -	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >> -
> >>  	ret = ib_nl_send_msg(query, gfp_mask);
> >>  	if (ret) {
> >>  		ret = -EIO;
> >> -		/* Remove the request */
> >> -		spin_lock_irqsave(&ib_nl_request_lock, flags);
> >> -		list_del(&query->list);
> >> -		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >>  	}
> > Brackets should be removed too.
> Noted.
> >>  	return ret;
> >> --
> >> 1.8.3.1
> >>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-14  6:41       ` Leon Romanovsky
@ 2020-06-16 17:56         ` Divya Indi
  2020-06-17  5:17           ` Leon Romanovsky
  2020-06-17 18:24           ` Jason Gunthorpe
  0 siblings, 2 replies; 30+ messages in thread
From: Divya Indi @ 2020-06-16 17:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

Hi Leon,

Please find my comments inline -

On 6/13/20 11:41 PM, Leon Romanovsky wrote:
> On Tue, Jun 09, 2020 at 07:45:21AM -0700, Divya Indi wrote:
>> Hi Leon,
>>
>> Thanks for taking the time to review.
>>
>> Please find my comments inline -
>>
>> On 6/9/20 12:00 AM, Leon Romanovsky wrote:
>>> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
>>>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
>>>> -
>>>> 1. Adds the query to the request list before ib_nl_snd_msg.
>>>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
>>>> possible to allocate memory with GFP_KERNEL.
>>>>
>>>> However, if there is a delay in sending out the request (For
>>>> eg: Delay due to low memory situation) the timer to handle request timeout
>>>> might kick in before the request is sent out to ibacm via netlink.
>>>> ib_nl_request_timeout may release the query causing a use after free situation
>>>> while accessing the query in ib_nl_send_msg.
>>>>
>>>> Call Trace for the above race:
>>>>
>>>> [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
>>>> [<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>>>> [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>>>> [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>>>> [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>>>> [rds_rdma]
>>>> [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>>>> [rds_rdma]
>>>> [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>>>> [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
>>>> [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
>>>> [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
>>>> [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
>>>> [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
>>>> [<ffffffff810a68fb>] kthread+0xcb/0xf0
>>>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
>>>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
>>>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
>>>> [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
>>>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
>>>> ....
>>>> RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]
>>>>
>>>> To resolve the above issue -
>>>> 1. Add the req to the request list only after the request has been sent out.
>>>> 2. To handle the race where response comes in before adding request to
>>>> the request list, send(rdma_nl_multicast) and add to list while holding the
>>>> spinlock - request_lock.
>>>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
>>>> a spinlock. In case of memory allocation failure, request will go out to SA.
>>>>
>>>> Signed-off-by: Divya Indi <divya.indi@oracle.com>
>>>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>>>> before sending")
>>> Author SOB should be after "Fixes" line.
>> My bad. Noted.
>>
>>>> ---
>>>>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
>>>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
>>>> index 74e0058..042c99b 100644
>>>> --- a/drivers/infiniband/core/sa_query.c
>>>> +++ b/drivers/infiniband/core/sa_query.c
>>>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>>>>  	void *data;
>>>>  	struct ib_sa_mad *mad;
>>>>  	int len;
>>>> +	unsigned long flags;
>>>> +	unsigned long delay;
>>>> +	int ret;
>>>>
>>>>  	mad = query->mad_buf->mad;
>>>>  	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
>>>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
>>>>  	/* Repair the nlmsg header length */
>>>>  	nlmsg_end(skb, nlh);
>>>>
>>>> -	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
>>>> +	spin_lock_irqsave(&ib_nl_request_lock, flags);
>>>> +	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
>>> It is hard to be convinced that this is correct solution. The mix of
>>> gfp_flags and GFP_NOWAIT at the same time and usage of
>>> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
>>> makes this code unreadable/non-maintainable.
>> Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>> before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock.
>>
>> ie we had -
>>
>> 1. Get spinlock - ib_nl_request_lock
>> 2. ib_nl_send_msg
>> 	2.a) rdma_nl_multicast
>> 3. Add request to the req list
>> 4. Arm the timer if needed.
>> 5. Release spinlock
>>
>> However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL.
>> hence, was moved out of the spinlock. In addition, req was now being
>> added prior to ib_nl_send_msg [To handle the race where response can
>> come in before we get a chance to add the request back to the list].
>>
>> This introduced another race resulting in use-after-free.[Described in the commit.]
>>
>> To resolve this, sending out the request and adding it to list need to
>> happen while holding the request_lock.
>> To ensure minimum allocations while holding the lock, instead of having
>> the entire ib_nl_send_msg under the lock, we only have rdma_nl_multicast
>> under this spinlock.
>>
>> However, do you think it would be a good idea to split ib_nl_send_msg
>> into 2 functions -
>> 1. Prepare the req/query [Outside the spinlock]
>> 2. Sending the req - rdma_nl_multicast [while holding spinlock]
>>
>> Would this be more intuitive?
> While it is always good idea to minimize the locked period. It still
> doesn't answer concern about mixing gfp_flags and direct GFP_NOWAIT.
> For example if user provides GFP_ATOMIC, the GFP_NOWAIT allocation will
> cause a trouble because latter is more lax than first one.

Makes sense, and we do have callers passing GFP_ATOMIC with gfp_mask.

However, in this case when we fail to send the request to ibacm,
we then fallback to sending it to the SA with gfp_mask. So, the
request will eventually go out with GFP_ATOMIC to SA. From the
caller perspective the request will not fail due to memory pressure.

-------
send_mad(...gfp_mask)
	- send to ibacm with GFP_NOWAIT
	- If fails, send to SA with gfp_mask
-------

So, using GFP_NOWAIT may not cause trouble here. 

The other option might be to use GFP_NOWAIT conditionally ie
(only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else
use GFP_ATOMIC). Eventual goal being to not have a blocking memory allocation.

Your thoughts? 

Really appreciate your feedback. Thanks!


Regards,
Divya

>
> Thanks
>
>>>> +	if (!ret) {
>>> Please use kernel coding style.
>>>
>>> if (ret) {
>>>   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>>   return ret;
>>>   }
>>>
>>>  ....
>> Noted. Will make this change.
>>
>>>> +		/* Put the request on the list.*/
>>>> +		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
>>>> +		query->timeout = delay + jiffies;
>>>> +		list_add_tail(&query->list, &ib_nl_request_list);
>>>> +		/* Start the timeout if this is the only request */
>>>> +		if (ib_nl_request_list.next == &query->list)
>>>> +			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
>>>> +	}
>>>> +	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>>> +
>>>> +	return ret;
>>>>  }
>>>>
>>>>  static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
>>>>  {
>>>> -	unsigned long flags;
>>>> -	unsigned long delay;
>>>>  	int ret;
>>>>
>>>>  	INIT_LIST_HEAD(&query->list);
>>>>  	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
>>>>
>>>> -	/* Put the request on the list first.*/
>>>> -	spin_lock_irqsave(&ib_nl_request_lock, flags);
>>>> -	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
>>>> -	query->timeout = delay + jiffies;
>>>> -	list_add_tail(&query->list, &ib_nl_request_list);
>>>> -	/* Start the timeout if this is the only request */
>>>> -	if (ib_nl_request_list.next == &query->list)
>>>> -		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
>>>> -	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>>> -
>>>>  	ret = ib_nl_send_msg(query, gfp_mask);
>>>>  	if (ret) {
>>>>  		ret = -EIO;
>>>> -		/* Remove the request */
>>>> -		spin_lock_irqsave(&ib_nl_request_lock, flags);
>>>> -		list_del(&query->list);
>>>> -		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>>>  	}
>>> Brackets should be removed too.
>> Noted.
>>>>  	return ret;
>>>> --
>>>> 1.8.3.1
>>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-16 17:56         ` Divya Indi
@ 2020-06-17  5:17           ` Leon Romanovsky
  2020-06-17 18:23             ` Jason Gunthorpe
  2020-06-17 18:24           ` Jason Gunthorpe
  1 sibling, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2020-06-17  5:17 UTC (permalink / raw)
  To: Divya Indi
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Tue, Jun 16, 2020 at 10:56:53AM -0700, Divya Indi wrote:
> Hi Leon,
>
> Please find my comments inline -
>
> On 6/13/20 11:41 PM, Leon Romanovsky wrote:
> > On Tue, Jun 09, 2020 at 07:45:21AM -0700, Divya Indi wrote:
> >> Hi Leon,
> >>
> >> Thanks for taking the time to review.
> >>
> >> Please find my comments inline -
> >>
> >> On 6/9/20 12:00 AM, Leon Romanovsky wrote:
> >>> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
> >>>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")'
> >>>> -
> >>>> 1. Adds the query to the request list before ib_nl_snd_msg.
> >>>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
> >>>> possible to allocate memory with GFP_KERNEL.
> >>>>
> >>>> However, if there is a delay in sending out the request (For
> >>>> eg: Delay due to low memory situation) the timer to handle request timeout
> >>>> might kick in before the request is sent out to ibacm via netlink.
> >>>> ib_nl_request_timeout may release the query causing a use after free situation
> >>>> while accessing the query in ib_nl_send_msg.
> >>>>
> >>>> Call Trace for the above race:
> >>>>
> >>>> [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
> >>>> [<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
> >>>> [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
> >>>> [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
> >>>> [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
> >>>> [rds_rdma]
> >>>> [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
> >>>> [rds_rdma]
> >>>> [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
> >>>> [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
> >>>> [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
> >>>> [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
> >>>> [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
> >>>> [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
> >>>> [<ffffffff810a68fb>] kthread+0xcb/0xf0
> >>>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> >>>> [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
> >>>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> >>>> [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
> >>>> [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
> >>>> ....
> >>>> RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa]
> >>>>
> >>>> To resolve the above issue -
> >>>> 1. Add the req to the request list only after the request has been sent out.
> >>>> 2. To handle the race where response comes in before adding request to
> >>>> the request list, send(rdma_nl_multicast) and add to list while holding the
> >>>> spinlock - request_lock.
> >>>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
> >>>> a spinlock. In case of memory allocation failure, request will go out to SA.
> >>>>
> >>>> Signed-off-by: Divya Indi <divya.indi@oracle.com>
> >>>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
> >>>> before sending")
> >>> Author SOB should be after "Fixes" line.
> >> My bad. Noted.
> >>
> >>>> ---
> >>>>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
> >>>>  1 file changed, 17 insertions(+), 17 deletions(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
> >>>> index 74e0058..042c99b 100644
> >>>> --- a/drivers/infiniband/core/sa_query.c
> >>>> +++ b/drivers/infiniband/core/sa_query.c
> >>>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
> >>>>  	void *data;
> >>>>  	struct ib_sa_mad *mad;
> >>>>  	int len;
> >>>> +	unsigned long flags;
> >>>> +	unsigned long delay;
> >>>> +	int ret;
> >>>>
> >>>>  	mad = query->mad_buf->mad;
> >>>>  	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
> >>>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
> >>>>  	/* Repair the nlmsg header length */
> >>>>  	nlmsg_end(skb, nlh);
> >>>>
> >>>> -	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
> >>>> +	spin_lock_irqsave(&ib_nl_request_lock, flags);
> >>>> +	ret =  rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
> >>> It is hard to be convinced that this is correct solution. The mix of
> >>> gfp_flags and GFP_NOWAIT at the same time and usage of
> >>> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
> >>> makes this code unreadable/non-maintainable.
> >> Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
> >> before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock.
> >>
> >> ie we had -
> >>
> >> 1. Get spinlock - ib_nl_request_lock
> >> 2. ib_nl_send_msg
> >> 	2.a) rdma_nl_multicast
> >> 3. Add request to the req list
> >> 4. Arm the timer if needed.
> >> 5. Release spinlock
> >>
> >> However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL.
> >> hence, was moved out of the spinlock. In addition, req was now being
> >> added prior to ib_nl_send_msg [To handle the race where response can
> >> come in before we get a chance to add the request back to the list].
> >>
> >> This introduced another race resulting in use-after-free.[Described in the commit.]
> >>
> >> To resolve this, sending out the request and adding it to list need to
> >> happen while holding the request_lock.
> >> To ensure minimum allocations while holding the lock, instead of having
> >> the entire ib_nl_send_msg under the lock, we only have rdma_nl_multicast
> >> under this spinlock.
> >>
> >> However, do you think it would be a good idea to split ib_nl_send_msg
> >> into 2 functions -
> >> 1. Prepare the req/query [Outside the spinlock]
> >> 2. Sending the req - rdma_nl_multicast [while holding spinlock]
> >>
> >> Would this be more intuitive?
> > While it is always good idea to minimize the locked period. It still
> > doesn't answer concern about mixing gfp_flags and direct GFP_NOWAIT.
> > For example if user provides GFP_ATOMIC, the GFP_NOWAIT allocation will
> > cause a trouble because latter is more lax than first one.
>
> Makes sense, and we do have callers passing GFP_ATOMIC with gfp_mask.
>
> However, in this case when we fail to send the request to ibacm,
> we then fallback to sending it to the SA with gfp_mask. So, the
> request will eventually go out with GFP_ATOMIC to SA. From the
> caller perspective the request will not fail due to memory pressure.
>
> -------
> send_mad(...gfp_mask)
> 	- send to ibacm with GFP_NOWAIT
> 	- If fails, send to SA with gfp_mask
> -------
>
> So, using GFP_NOWAIT may not cause trouble here.
>
> The other option might be to use GFP_NOWAIT conditionally ie
> (only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else
> use GFP_ATOMIC). Eventual goal being to not have a blocking memory allocation.
>
> Your thoughts?

My thoughts that everything here hints me that state machine and
locking are implemented wrongly. In ideal world, the expectation
is that REQ message will have a state in it (PREPARED, SENT, ACK
e.t.c.) and list manipulations are done accordingly with proper
locks, while rdma_nl_multicast() is done outside of the locks.

I don't know if it is possible to fix.

>
> Really appreciate your feedback. Thanks!
>
>
> Regards,
> Divya
>
> >
> > Thanks
> >
> >>>> +	if (!ret) {
> >>> Please use kernel coding style.
> >>>
> >>> if (ret) {
> >>>   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >>>   return ret;
> >>>   }
> >>>
> >>>  ....
> >> Noted. Will make this change.
> >>
> >>>> +		/* Put the request on the list.*/
> >>>> +		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> >>>> +		query->timeout = delay + jiffies;
> >>>> +		list_add_tail(&query->list, &ib_nl_request_list);
> >>>> +		/* Start the timeout if this is the only request */
> >>>> +		if (ib_nl_request_list.next == &query->list)
> >>>> +			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> >>>> +	}
> >>>> +	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >>>> +
> >>>> +	return ret;
> >>>>  }
> >>>>
> >>>>  static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
> >>>>  {
> >>>> -	unsigned long flags;
> >>>> -	unsigned long delay;
> >>>>  	int ret;
> >>>>
> >>>>  	INIT_LIST_HEAD(&query->list);
> >>>>  	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
> >>>>
> >>>> -	/* Put the request on the list first.*/
> >>>> -	spin_lock_irqsave(&ib_nl_request_lock, flags);
> >>>> -	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> >>>> -	query->timeout = delay + jiffies;
> >>>> -	list_add_tail(&query->list, &ib_nl_request_list);
> >>>> -	/* Start the timeout if this is the only request */
> >>>> -	if (ib_nl_request_list.next == &query->list)
> >>>> -		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
> >>>> -	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >>>> -
> >>>>  	ret = ib_nl_send_msg(query, gfp_mask);
> >>>>  	if (ret) {
> >>>>  		ret = -EIO;
> >>>> -		/* Remove the request */
> >>>> -		spin_lock_irqsave(&ib_nl_request_lock, flags);
> >>>> -		list_del(&query->list);
> >>>> -		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> >>>>  	}
> >>> Brackets should be removed too.
> >> Noted.
> >>>>  	return ret;
> >>>> --
> >>>> 1.8.3.1
> >>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-17  5:17           ` Leon Romanovsky
@ 2020-06-17 18:23             ` Jason Gunthorpe
  2020-06-21  7:12               ` Leon Romanovsky
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2020-06-17 18:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Divya Indi, linux-kernel, linux-rdma, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Wed, Jun 17, 2020 at 08:17:39AM +0300, Leon Romanovsky wrote:
> 
> My thoughts that everything here hints me that state machine and
> locking are implemented wrongly. In ideal world, the expectation
> is that REQ message will have a state in it (PREPARED, SENT, ACK
> e.t.c.) and list manipulations are done accordingly with proper
> locks, while rdma_nl_multicast() is done outside of the locks.

It can't be done outside the lock without creating races - once
rdma_nl_multicast happens it is possible for the other leg of the
operation to begin processing. 

The list must be updated before this happens.

What is missing here is refcounting - the lifetime model of this data
is too implicit, but it is not worth adding I think

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-17 18:23             ` Jason Gunthorpe
@ 2020-06-21  7:12               ` Leon Romanovsky
  0 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2020-06-21  7:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Divya Indi, linux-kernel, linux-rdma, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Wed, Jun 17, 2020 at 03:23:00PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 17, 2020 at 08:17:39AM +0300, Leon Romanovsky wrote:
> >
> > My thoughts that everything here hints me that state machine and
> > locking are implemented wrongly. In ideal world, the expectation
> > is that REQ message will have a state in it (PREPARED, SENT, ACK
> > e.t.c.) and list manipulations are done accordingly with proper
> > locks, while rdma_nl_multicast() is done outside of the locks.
>
> It can't be done outside the lock without creating races - once
> rdma_nl_multicast happens it is possible for the other leg of the
> operation to begin processing.

It means that the state machine is wrong, not complete.

>
> The list must be updated before this happens.
>
> What is missing here is refcounting - the lifetime model of this data
> is too implicit, but it is not worth adding I think

I have same feeling for now, but it will flip if new fixes be in this area.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-16 17:56         ` Divya Indi
  2020-06-17  5:17           ` Leon Romanovsky
@ 2020-06-17 18:24           ` Jason Gunthorpe
  2020-06-20  0:43             ` Divya Indi
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2020-06-17 18:24 UTC (permalink / raw)
  To: Divya Indi
  Cc: Leon Romanovsky, linux-kernel, linux-rdma, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Tue, Jun 16, 2020 at 10:56:53AM -0700, Divya Indi wrote:
> The other option might be to use GFP_NOWAIT conditionally ie
> (only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else
> use GFP_ATOMIC). Eventual goal being to not have a blocking memory allocation.

This is probably safest for now, unless you can audit all callers and
see if they can switch to GFP_NOWAIT as well

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
  2020-06-17 18:24           ` Jason Gunthorpe
@ 2020-06-20  0:43             ` Divya Indi
  0 siblings, 0 replies; 30+ messages in thread
From: Divya Indi @ 2020-06-20  0:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-kernel, linux-rdma, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

Hi Jason,

Thanks for taking the time to review!

On 6/17/20 11:24 AM, Jason Gunthorpe wrote:
> On Tue, Jun 16, 2020 at 10:56:53AM -0700, Divya Indi wrote:
>> The other option might be to use GFP_NOWAIT conditionally ie
>> (only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else
>> use GFP_ATOMIC). Eventual goal being to not have a blocking memory allocation.
> This is probably safest for now, unless you can audit all callers and
> see if they can switch to GFP_NOWAIT as well

At present the callers with GFP_ATOMIC appear to be ipoib. Might not be
feasible to change them all to GFP_NOWAIT. 

Will incorporate the review comments and send out v3 early next week.

Thanks,
Divya

>
> Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2020-06-08 14:46 Review Request Divya Indi
  2020-06-08 14:46 ` [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg Divya Indi
@ 2020-06-09  7:03 ` Leon Romanovsky
  2020-06-09 15:44   ` Divya Indi
  1 sibling, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2020-06-09  7:03 UTC (permalink / raw)
  To: Divya Indi
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

On Mon, Jun 08, 2020 at 07:46:15AM -0700, Divya Indi wrote:
> [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
>
> Hi,
>
> Please review the patch that follows.

Please read Documentation/process/submitting-patches.rst
14) The canonical patch format

You don't need an extra email "Review request" and Changelog should be
put inside the patch itself under "---" marker.

Thanks

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2020-06-09  7:03 ` Review Request Leon Romanovsky
@ 2020-06-09 15:44   ` Divya Indi
  0 siblings, 0 replies; 30+ messages in thread
From: Divya Indi @ 2020-06-09 15:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan, Gerd Rausch,
	Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu, Doug Ledford

Thanks Leon, Noted!

On 6/9/20 12:03 AM, Leon Romanovsky wrote:
> On Mon, Jun 08, 2020 at 07:46:15AM -0700, Divya Indi wrote:
>> [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
>>
>> Hi,
>>
>> Please review the patch that follows.
> Please read Documentation/process/submitting-patches.rst
> 14) The canonical patch format
>
> You don't need an extra email "Review request" and Changelog should be
> put inside the patch itself under "---" marker.
>
> Thanks

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Review Request
@ 2020-05-14 15:11 Divya Indi
  0 siblings, 0 replies; 30+ messages in thread
From: Divya Indi @ 2020-05-14 15:11 UTC (permalink / raw)
  To: linux-kernel, linux-rdma, Jason Gunthorpe, Kaike Wan
  Cc: Gerd Rausch, Håkon Bugge, Srinivas Eeda, Rama Nichanamatlu,
	Doug Ledford

[PATCH v2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

Hi,

This is the v2 of the patch that addresses the comments received for v1 -

- Using atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
- Rewording and adding comments.


Thanks,
Divya

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Review Request
@ 2015-09-16 18:09 Adam C. Emerson
  2015-09-16 18:31 ` Gregory Farnum
  0 siblings, 1 reply; 30+ messages in thread
From: Adam C. Emerson @ 2015-09-16 18:09 UTC (permalink / raw)
  To: Ceph Developers

Creators of the Storage Squid,

If you're interested in less use of the allocator, you are interested in
Context* elimination. If so, please review the top two commits on the
wip-decontextualization branch of:

https://github.com/cohortfsllc/cohortfs-ceph

They provide an enhanced version of std::function that allows you to
specify how much space gets preallocated for function objects or disable
allocation entirely.

Also, I have two pull requests, wip-cxx11time and wip-cxx11concurrency
that pass check on which I would appreciate any feedback.

Thank you, Comrades!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2015-09-16 18:09 Adam C. Emerson
@ 2015-09-16 18:31 ` Gregory Farnum
  2015-09-16 19:06   ` Adam C. Emerson
  0 siblings, 1 reply; 30+ messages in thread
From: Gregory Farnum @ 2015-09-16 18:31 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Ceph Developers

On Wed, Sep 16, 2015 at 11:09 AM, Adam C. Emerson <aemerson@redhat.com> wrote:
> Creators of the Storage Squid,
>
> If you're interested in less use of the allocator, you are interested in
> Context* elimination. If so, please review the top two commits on the
> wip-decontextualization branch of:
>
> https://github.com/cohortfsllc/cohortfs-ceph
>
> They provide an enhanced version of std::function that allows you to
> specify how much space gets preallocated for function objects or disable
> allocation entirely.

Can you provide a little background (links or text) for those of us
who aren't up on C++11x/14x? I looked at them briefly but having only
the vaguest idea about some of them quickly got lost. :)
-Greg

>
> Also, I have two pull requests, wip-cxx11time and wip-cxx11concurrency
> that pass check on which I would appreciate any feedback.
>
> Thank you, Comrades!
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2015-09-16 18:31 ` Gregory Farnum
@ 2015-09-16 19:06   ` Adam C. Emerson
  2015-09-16 22:15     ` John Spray
  0 siblings, 1 reply; 30+ messages in thread
From: Adam C. Emerson @ 2015-09-16 19:06 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Developers

On 09/16/2015 02:31 PM, Gregory Farnum wrote:
> Can you provide a little background (links or text) for those of us
> who aren't up on C++11x/14x? I looked at them briefly but having only
> the vaguest idea about some of them quickly got lost. :)

Surely, sir.

So, in summary, Context* is the means we have been using to handle
callbacks. It has two big problems:

(1) Every Context is allocated at the point of use and freed when it's
called. So by their nature they make heavy use of the heap. (If you
overload complete there's a few exceptions, but by and large Context is
very heapy.)

(2) Context accepts an int. That's it. You can get around this by
storing other things in it and passing pointers to it and so forth. But
it would be nice to have more variety of type in our callbacks.

C++11 (and following) have a whole pile of things that can be used as
functions. There are function objects (objects with an operator()
defined on them), lambdas with variable capture, function member
pointers, reference wrappers pointing to function objects and the like.
This gives you a good bit of flexibility, allows things like variable
capture, let's you allocate one object once and just pass references to
it, and so forth.

Now, these objects all have different sizes and different types. So you
can't just shove them in an object naively. Because a class one of whose
members is a function pointer is going to be a different type than a
class holding a function object. (Which itself is different to a type
holding a reference to a function object.)

std::function exists as a solution to this problem. It provides a
uniform type that can hold any object satisfying the requirement of
being callable with given argument types and a given return value. So,
for example, if you have some 'stat'-like call, you could specify it as
taking a function taking an error code, a size, and a date. Anything
that can be called with such would be accepted, anything that can't
(wrong argument types, say) would be rejected at compile time. And it
could be uniformly stored in a list of Operations.

The downside is that if the thing you provide is too big, std::function
will allocate. 'too big' depends on your library vendor and there's no
way to find it out what. Thus the purpose of the ceph::function class.
If we have a pretty good idea how large most of the callback function
objects we expect to hold are, we can tell it to statically allocate
that much space. This gives us a tool to get allocations out of our fast
path. (For example, if we preallocate a bunch of classes with a
ceph::function that preallocates enough space to hold likely callbacks,
we can just pick them off and have no allocations, in the usual case.)

If we know /everything/ we'll ever get, we can disable allocations
entirely. This is mostly so we can catch situations where we're trying
to shove something unexpectedly huge somewhere it ought not go. An
internal sanity check.

Now, ceph::function, like std::function, is still an abstraction with a
virtual call in it and because it copies things around it reduces
opportunities for inlining. Thus, if you aren't storing a callback on an
object that's supposed to be in a queue, you shouldn't use it. You
should do something like:

template<typename Fun>
void do_stuff(Fun&& f) {
  ...;
  f(some, values);
};

This allows inlining, and (based on my experiments trying multiple
implementations and looking at the generated assembly) if f is called in
a loop, the code is just as good as if you'd open-coded the loop and
written the body of 'f' in it. Functions like this can have the
potential to use closures (lambda expression) effectively for free.

So, the summary is:

Context* is very heapy. We have less heapy alternatives. This implements
a foundation for one of them. We also get a bit more flexibility.

Thank you.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2015-09-16 19:06   ` Adam C. Emerson
@ 2015-09-16 22:15     ` John Spray
  2015-09-17 16:19       ` Adam C. Emerson
  0 siblings, 1 reply; 30+ messages in thread
From: John Spray @ 2015-09-16 22:15 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Gregory Farnum, Ceph Developers

This is neat, I had been hankering to lambda-ize various things, but
hadn't worked through what the allocation would looked like in
practice.

Do you know if there's a reason the standard is defined so as to not
let us override the reserved size of a std::function?  Are we taking
any kind of hit by doing this?

John


P.S. I have just gone on Amazon to order a C++11 book because I need
to start using more than just lambda, auto, and pretty for loops ;-)

On Wed, Sep 16, 2015 at 8:06 PM, Adam C. Emerson <aemerson@redhat.com> wrote:
> On 09/16/2015 02:31 PM, Gregory Farnum wrote:
>> Can you provide a little background (links or text) for those of us
>> who aren't up on C++11x/14x? I looked at them briefly but having only
>> the vaguest idea about some of them quickly got lost. :)
>
> Surely, sir.
>
> So, in summary, Context* is the means we have been using to handle
> callbacks. It has two big problems:
>
> (1) Every Context is allocated at the point of use and freed when it's
> called. So by their nature they make heavy use of the heap. (If you
> overload complete there's a few exceptions, but by and large Context is
> very heapy.)
>
> (2) Context accepts an int. That's it. You can get around this by
> storing other things in it and passing pointers to it and so forth. But
> it would be nice to have more variety of type in our callbacks.
>
> C++11 (and following) have a whole pile of things that can be used as
> functions. There are function objects (objects with an operator()
> defined on them), lambdas with variable capture, function member
> pointers, reference wrappers pointing to function objects and the like.
> This gives you a good bit of flexibility, allows things like variable
> capture, let's you allocate one object once and just pass references to
> it, and so forth.
>
> Now, these objects all have different sizes and different types. So you
> can't just shove them in an object naively. Because a class one of whose
> members is a function pointer is going to be a different type than a
> class holding a function object. (Which itself is different to a type
> holding a reference to a function object.)
>
> std::function exists as a solution to this problem. It provides a
> uniform type that can hold any object satisfying the requirement of
> being callable with given argument types and a given return value. So,
> for example, if you have some 'stat'-like call, you could specify it as
> taking a function taking an error code, a size, and a date. Anything
> that can be called with such would be accepted, anything that can't
> (wrong argument types, say) would be rejected at compile time. And it
> could be uniformly stored in a list of Operations.
>
> The downside is that if the thing you provide is too big, std::function
> will allocate. 'too big' depends on your library vendor and there's no
> way to find it out what. Thus the purpose of the ceph::function class.
> If we have a pretty good idea how large most of the callback function
> objects we expect to hold are, we can tell it to statically allocate
> that much space. This gives us a tool to get allocations out of our fast
> path. (For example, if we preallocate a bunch of classes with a
> ceph::function that preallocates enough space to hold likely callbacks,
> we can just pick them off and have no allocations, in the usual case.)
>
> If we know /everything/ we'll ever get, we can disable allocations
> entirely. This is mostly so we can catch situations where we're trying
> to shove something unexpectedly huge somewhere it ought not go. An
> internal sanity check.
>
> Now, ceph::function, like std::function, is still an abstraction with a
> virtual call in it and because it copies things around it reduces
> opportunities for inlining. Thus, if you aren't storing a callback on an
> object that's supposed to be in a queue, you shouldn't use it. You
> should do something like:
>
> template<typename Fun>
> void do_stuff(Fun&& f) {
>   ...;
>   f(some, values);
> };
>
> This allows inlining, and (based on my experiments trying multiple
> implementations and looking at the generated assembly) if f is called in
> a loop, the code is just as good as if you'd open-coded the loop and
> written the body of 'f' in it. Functions like this can have the
> potential to use closures (lambda expression) effectively for free.
>
> So, the summary is:
>
> Context* is very heapy. We have less heapy alternatives. This implements
> a foundation for one of them. We also get a bit more flexibility.
>
> Thank you.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Review Request
  2015-09-16 22:15     ` John Spray
@ 2015-09-17 16:19       ` Adam C. Emerson
  0 siblings, 0 replies; 30+ messages in thread
From: Adam C. Emerson @ 2015-09-17 16:19 UTC (permalink / raw)
  To: John Spray; +Cc: Gregory Farnum, Ceph Developers

On 09/16/2015 06:15 PM, John Spray wrote:
> This is neat, I had been hankering to lambda-ize various things, but
> hadn't worked through what the allocation would looked like in
> practice.
> 
> Do you know if there's a reason the standard is defined so as to not
> let us override the reserved size of a std::function?  Are we taking
> any kind of hit by doing this?

There are tradeoffs, but I do not believe they burden us.

Small Function Optimization is not actually required by the standard.
It's just that all implementations of std::function I'm familiar with
implement it to at least some degree. In the GNU implementation, the
structure always has space for a pointer to point to heap allocated
stuff. If your 'function' is just a pointer (or something as big as a
pointer) it stores that. Since there aren't really any tradeoffs with
providing at least /some/ form of Small Function Optimization, there's
no reason for an implementation not to write it in. (This is in contrast
to Short String Operation, where it can clash with Copy-on-Write.)

So, the main reason the Standard doesn't allow one to specify the amount
of space reserved for Small Functions is that it doesn't talk about it.

There is a trade-off, however.

  ceph::function<void(int), 4 * sizeof(void*)>

Is a different type than

  ceph::function<void(int), 3 * sizeof(void*)>

This is good, in that it means that in our heap_allocations::forbid
case, we can reject, at compile time anything that the type system can't
guarantee will fit in our reserved space.

The cost is that they differ by type. A reference to one type won't
reference an object of the other type. This means that you'll need to
use a template and universal reference, like:

  template<typename F>
  void do_stuff(F&& f) {
    auto o = new Thing(std::forward<F>(f));
  }

And if you want to overload do_stuff, you'll need to use enable_if to
disambiguate the function from the other stuff. One day we shall have
Concepts. And if they don't give us too broken down and wimpified a
version it should make these problems go away. But for now we need
enable_if. There's a good chance that passing functions by universal
reference and forward right until the point they get stored somewhere is
going to be slightly more efficient at runtime than having the function
take a std::function&/ceph::function& parameter, depending on what the
compiler does (I haven't looked at the assembler for this case but I
suspect it would call the std::function constructor at the point of
function call and the move constructor at the point of storing it into
the data structure), so you might want to do this anyway.

It also means that you can't store pointer/reference to functions with a
different reserved size in the same container. In practice this doesn't
really come up, since if I'm a subsystem with a bunch of OnComplete
functions that I've stored and I want to gather up a list of references
to them I already control what I've stored. Or if you really need to you
can combine a ceph::function with a reserved size of 1 * sizeof(void*)
with std::ref to get a 'universal function reference' at the cost of a
double indirection and two virtual function calls where one would
normally be required.

There is also a potential runtime cost in the move constructor. If our
ceph::function heap allocates its storage and we then use swap, move
assignment, or move construction, we just end up copying a pointer at
each move. If everything fits in reserved space, all these pointer
copies become fairly substantial memcpies. Since Ceph normally stores a
callback function in the structure describing an operation, leaves it
there, and then calls it, this shouldn't hurt us too much. (And is one
of the reasons I recommend using a universal reference and std::forward
above rather than accepting a ceph::function reference and then
std::moving it into its final home.) For cases where we do want to
'move' them around a lot, we can use std::ref and other tricks to turn
the memcpies back into pointer copies.

Thank you!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* review request
@ 2013-11-26 18:28 Damian, Alexandru
  2013-12-05 15:27 ` Paul Eggleton
  0 siblings, 1 reply; 30+ messages in thread
From: Damian, Alexandru @ 2013-11-26 18:28 UTC (permalink / raw)
  To: toaster@yoctoproject.org, Paul Eggleton

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

Hello,

I have a set of patches for review on:

http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/log/?h=toaster/master

The last one, edb12586c27ccddb054737ee6e081f663e2c60da, is pretty invasive
as it
changes the package storage model, from different tables for target and
build packages,
to a single table.

Can you please review and comment on this patchset? and let me know
whenever it's ok to submit upstream ?

Cheers,
Alex

-- 
Alex Damian
Yocto Project
SSG / OTC

[-- Attachment #2: Type: text/html, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-11-26 18:28 review request Damian, Alexandru
@ 2013-12-05 15:27 ` Paul Eggleton
  2013-12-05 18:18   ` Damian, Alexandru
                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Paul Eggleton @ 2013-12-05 15:27 UTC (permalink / raw)
  To: Damian, Alexandru; +Cc: toaster@yoctoproject.org

Hi Alex,

On Tuesday 26 November 2013 18:28:21 Damian, Alexandru wrote:
> I have a set of patches for review on:
> 
> http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/log/?h=toaster/master

Sorry for the delay (although we've had some changes in the mean time with the 
pkgdata changes). Reviewing the latest branch, most of the changes look OK, 
however I did notice the following:

> bitbake: toasterui: record recipe licensing information

If do_populate_lic doesn't execute in this build we won't have the license file 
with this method, right? Is that going to be a problem?

> bitbake: toaster: update to Django 1.5.4

I think if we do do this we need to provide people with instructions on how to 
use virtualenv to install 1.5.4 on their system. I'm also a bit concerned 
about us requiring one specific minor release because a number of distros are 
using 1.5 versions other than 1.5.4 - there shouldn't be any incompatible 
changes introduced in minor releases, so we ought to be able to work with any 
1.5 release >= 1.5.4.

> bitbake: toaster: enable debug mode

You don't say *why* you're enabling this for all users - we don't normally 
turn on BBDEBUG by default. Why is this needed?

> bitbake: toaster: save extra data related to executed tasks
>
>+import pprint

This doesn't seem to be needed.

> + 'workdir' : d.getVar("WORKDIR", True),

We can't do this. Apart from a few bits of old code that really shouldn't, 
Bitbake knows nothing about WORKDIR, and that's intentional. If you absolutely 
need this, you'll need to collect it another way. I'm not even sure it makes 
sense when we're talking about a remote context, which we are going to be in 
soon; perhaps we should review how we handle this at the design level.

> + 'filename' : d.getVarFlag(t, "filename"),
> + 'lineno' : d.getVarFlag(t, "lineno"),

I've run this past Richard, and he thinks that we should try partially 
enabling the variable tracking just for functions - sorry for going back and 
forth on this. What we'd need to do is check the performance impact of doing 
so - would you be able to look into that?

Cheers,
Paul

-- 

Paul Eggleton
Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-05 15:27 ` Paul Eggleton
@ 2013-12-05 18:18   ` Damian, Alexandru
  2013-12-06 11:44     ` Paul Eggleton
  2013-12-06 10:57   ` Barros Pena, Belen
  2013-12-09 14:03   ` Damian, Alexandru
  2 siblings, 1 reply; 30+ messages in thread
From: Damian, Alexandru @ 2013-12-05 18:18 UTC (permalink / raw)
  To: Paul Eggleton; +Cc: toaster@yoctoproject.org

[-- Attachment #1: Type: text/plain, Size: 3854 bytes --]

Hi,

Thanks for reviewing this. Comments below, also for Belen.

Alex

On Thu, Dec 5, 2013 at 3:27 PM, Paul Eggleton <paul.eggleton@linux.intel.com
> wrote:

> Hi Alex,
>
> On Tuesday 26 November 2013 18:28:21 Damian, Alexandru wrote:
> > I have a set of patches for review on:
> >
> >
> http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/log/?h=toaster/master
>
> Sorry for the delay (although we've had some changes in the mean time with
> the
> pkgdata changes). Reviewing the latest branch, most of the changes look OK,
> however I did notice the following:
>
> > bitbake: toasterui: record recipe licensing information
>
> If do_populate_lic doesn't execute in this build we won't have the license
> file
> with this method, right? Is that going to be a problem?
>

Yep, license info in the required form is dependent on executing
do_populate_lic.
Belen - we need to either return to recipe-based info (relative paths) or
live with missing info for Prebuilt tasks.
Which way to go ?

> bitbake: toaster: update to Django 1.5.4
>
> I think if we do do this we need to provide people with instructions on
> how to
> use virtualenv to install 1.5.4 on their system. I'm also a bit concerned
> about us requiring one specific minor release because a number of distros
> are
> using 1.5 versions other than 1.5.4 - there shouldn't be any incompatible
> changes introduced in minor releases, so we ought to be able to work with
> any
> 1.5 release >= 1.5.4.
>

Installation instructions are included in the patch.  Since we don't have
the capability to test on multiple Django
versions, I think the requirement to use a specific version is acceptable.

Alternatively, we could just print an warning if a supposedly compatible
but not identical Django version is used,
instead of refusing to start. What do you think ?



> > bitbake: toaster: enable debug mode
>
> You don't say *why* you're enabling this for all users - we don't normally
> turn on BBDEBUG by default. Why is this needed?
>

Debug mode is enabled only if one starts toaster with "source toaster start
debug" command line, i.e. not for all
runs, just for debug runs. Since bitbake supports this, seems that toaster
support is needed.


> > bitbake: toaster: save extra data related to executed tasks
> >
> >+import pprint
>
> This doesn't seem to be needed.
>
> > + 'workdir' : d.getVar("WORKDIR", True),
>
> We can't do this. Apart from a few bits of old code that really shouldn't,
> Bitbake knows nothing about WORKDIR, and that's intentional. If you
> absolutely
> need this, you'll need to collect it another way. I'm not even sure it
> makes
> sense when we're talking about a remote context, which we are going to be
> in
> soon; perhaps we should review how we handle this at the design level.
>
> > + 'filename' : d.getVarFlag(t, "filename"),
> > + 'lineno' : d.getVarFlag(t, "lineno"),
>
> I've run this past Richard, and he thinks that we should try partially
> enabling the variable tracking just for functions - sorry for going back
> and
> forth on this. What we'd need to do is check the performance impact of
> doing
> so - would you be able to look into that?
>

Actually, now I think that using variable history is not a good idea - for
AST-parsed
functions, the variable history points to the ast.py file :(.

This patch needs more rework, as I discovered other corner cases, where the
function name is inherited at runtime, e.g. autotools_do_configure is
called instead of do_configure,
but I still think that varflags is the right way to track the function
definition.

What problems do you see with that ?


>
> Cheers,
> Paul
>
> --
>
> Paul Eggleton
> Intel Open Source Technology Centre
>



-- 
Alex Damian
Yocto Project
SSG / OTC

[-- Attachment #2: Type: text/html, Size: 5420 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-05 18:18   ` Damian, Alexandru
@ 2013-12-06 11:44     ` Paul Eggleton
  2013-12-06 13:45       ` Barros Pena, Belen
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Eggleton @ 2013-12-06 11:44 UTC (permalink / raw)
  To: Damian, Alexandru; +Cc: toaster@yoctoproject.org

On Thursday 05 December 2013 18:18:16 Damian, Alexandru wrote:
> On Thu, Dec 5, 2013 at 3:27 PM, Paul Eggleton <paul.eggleton@linux.intel.com
> > If do_populate_lic doesn't execute in this build we won't have the license
> > file with this method, right? Is that going to be a problem?
> 
> Yep, license info in the required form is dependent on executing
> do_populate_lic.
> Belen - we need to either return to recipe-based info (relative paths) or
> live with missing info for Prebuilt tasks.
> Which way to go ?

I'm not an expert in this area, but I'd assume the most important aspect of 
this is about what went into the image, so it might be similar to the rest of 
the individual package information.

> > bitbake: toaster: update to Django 1.5.4
> > 
> > I think if we do do this we need to provide people with instructions on
> > how to use virtualenv to install 1.5.4 on their system. I'm also a bit
> > concerned about us requiring one specific minor release because a number of
> > distros are using 1.5 versions other than 1.5.4 - there shouldn't be any
> > incompatible changes introduced in minor releases, so we ought to be able
> > to work with any 1.5 release >= 1.5.4.
> 
> Installation instructions are included in the patch.  Since we don't have
> the capability to test on multiple Django versions, I think the requirement
> to use a specific version is acceptable.

We handle differing installed versions in the rest of the build system; QA 
ensure that things keep on working. I don't see how this is different.

> Alternatively, we could just print an warning if a supposedly compatible
> but not identical Django version is used, instead of refusing to start. What
> do you think ?

I think the first question is where we expect this to be usable. I think the 
answer almost certainly is "almost every distro we currently support, where 
practical". From there, we'll know what we need to support in Toaster.

> > > bitbake: toaster: enable debug mode
> > 
> > You don't say *why* you're enabling this for all users - we don't normally
> > turn on BBDEBUG by default. Why is this needed?
> 
> Debug mode is enabled only if one starts toaster with "source toaster start
> debug" command line, i.e. not for all runs, just for debug runs. Since
> bitbake supports this, seems that toaster support is needed.

OK, understood.
 
> > > bitbake: toaster: save extra data related to executed tasks
> > >
> > >+import pprint
> > 
> > This doesn't seem to be needed.
> > 
> > > + 'workdir' : d.getVar("WORKDIR", True),
> > 
> > We can't do this. Apart from a few bits of old code that really shouldn't,
> > Bitbake knows nothing about WORKDIR, and that's intentional. If you
> > absolutely need this, you'll need to collect it another way. I'm not even
> > sure it makes sense when we're talking about a remote context, which we
> > are going to be in soon; perhaps we should review how we handle this at
> > the design level.
> > 
> > > + 'filename' : d.getVarFlag(t, "filename"),
> > > + 'lineno' : d.getVarFlag(t, "lineno"),
> > 
> > I've run this past Richard, and he thinks that we should try partially
> > enabling the variable tracking just for functions - sorry for going back
> > and forth on this. What we'd need to do is check the performance impact of
> > doing so - would you be able to look into that?
> 
> Actually, now I think that using variable history is not a good idea - for
> AST-parsed functions, the variable history points to the ast.py file :(.
> 
> This patch needs more rework, as I discovered other corner cases, where the
> function name is inherited at runtime, e.g. autotools_do_configure is
> called instead of do_configure, but I still think that varflags is the right
> way to track the function definition.
> 
> What problems do you see with that ?

It occurs to me that the trouble with our initial model of a single file / line 
is it isn't going to be able to effectively reflect how our task functions came 
to be - they're really just a special case of variables and can thus be 
prepended/appended in a number of different places to get the final function. 
This kind of suggests that the history is what we ought to be using.

Cheers,
Paul

-- 

Paul Eggleton
Intel Open Source Technology Centre


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-06 11:44     ` Paul Eggleton
@ 2013-12-06 13:45       ` Barros Pena, Belen
  2013-12-09 10:40         ` Damian, Alexandru
  0 siblings, 1 reply; 30+ messages in thread
From: Barros Pena, Belen @ 2013-12-06 13:45 UTC (permalink / raw)
  To: Paul Eggleton, Damian, Alexandru
  Cc: Flanagan, Elizabeth, toaster@yoctoproject.org



On 06/12/2013 11:44, "Paul Eggleton" <paul.eggleton@linux.intel.com> wrote:

>On Thursday 05 December 2013 18:18:16 Damian, Alexandru wrote:
>> On Thu, Dec 5, 2013 at 3:27 PM, Paul Eggleton
>><paul.eggleton@linux.intel.com
>> > If do_populate_lic doesn't execute in this build we won't have the
>>license
>> > file with this method, right? Is that going to be a problem?
>> 
>> Yep, license info in the required form is dependent on executing
>> do_populate_lic.
>> Belen - we need to either return to recipe-based info (relative paths)
>>or
>> live with missing info for Prebuilt tasks.
>> Which way to go ?
>
>I'm not an expert in this area, but I'd assume the most important aspect
>of 
>this is about what went into the image, so it might be similar to the
>rest of 
>the individual package information.

I spoke to Beth Flanagan about what kind of information is useful when it
comes to licensing, and Paul is right. It is useful to know which licenses
apply to a certain recipe, but when it comes to license files, you really
want to provide information at the package level. Sorry about this: I
should have spoken to Beth sooner.

So what we should be showing is the full path to the license files in
/build/tmp/deploy/licenses/


So, for busybox:

.../build/tmp/deploy/licenses/busybox/generic_bzip2.txt

.../build/tmp/deploy/licenses/busybox/generic_GPLv2.txt

.../build/tmp/deploy/licenses/busybox/LICENSE.txt


I can open enhancements to remove licensing_info from the recipes table
and add a license_files field to the package information. Let me know if
that would be ok. We should probably also provide the path to the license
manifest as part of the image information.


>
>> > bitbake: toaster: update to Django 1.5.4
>> > 
>> > I think if we do do this we need to provide people with instructions
>>on
>> > how to use virtualenv to install 1.5.4 on their system. I'm also a bit
>> > concerned about us requiring one specific minor release because a
>>number of
>> > distros are using 1.5 versions other than 1.5.4 - there shouldn't be
>>any
>> > incompatible changes introduced in minor releases, so we ought to be
>>able
>> > to work with any 1.5 release >= 1.5.4.
>> 
>> Installation instructions are included in the patch.  Since we don't
>>have
>> the capability to test on multiple Django versions, I think the
>>requirement
>> to use a specific version is acceptable.
>
>We handle differing installed versions in the rest of the build system;
>QA 
>ensure that things keep on working. I don't see how this is different.
>
>> Alternatively, we could just print an warning if a supposedly compatible
>> but not identical Django version is used, instead of refusing to start.
>>What
>> do you think ?
>
>I think the first question is where we expect this to be usable. I think
>the 
>answer almost certainly is "almost every distro we currently support,
>where 
>practical". From there, we'll know what we need to support in Toaster.
>
>> > > bitbake: toaster: enable debug mode
>> > 
>> > You don't say *why* you're enabling this for all users - we don't
>>normally
>> > turn on BBDEBUG by default. Why is this needed?
>> 
>> Debug mode is enabled only if one starts toaster with "source toaster
>>start
>> debug" command line, i.e. not for all runs, just for debug runs. Since
>> bitbake supports this, seems that toaster support is needed.
>
>OK, understood.
> 
>> > > bitbake: toaster: save extra data related to executed tasks
>> > >
>> > >+import pprint
>> > 
>> > This doesn't seem to be needed.
>> > 
>> > > + 'workdir' : d.getVar("WORKDIR", True),
>> > 
>> > We can't do this. Apart from a few bits of old code that really
>>shouldn't,
>> > Bitbake knows nothing about WORKDIR, and that's intentional. If you
>> > absolutely need this, you'll need to collect it another way. I'm not
>>even
>> > sure it makes sense when we're talking about a remote context, which
>>we
>> > are going to be in soon; perhaps we should review how we handle this
>>at
>> > the design level.
>> > 
>> > > + 'filename' : d.getVarFlag(t, "filename"),
>> > > + 'lineno' : d.getVarFlag(t, "lineno"),
>> > 
>> > I've run this past Richard, and he thinks that we should try partially
>> > enabling the variable tracking just for functions - sorry for going
>>back
>> > and forth on this. What we'd need to do is check the performance
>>impact of
>> > doing so - would you be able to look into that?
>> 
>> Actually, now I think that using variable history is not a good idea -
>>for
>> AST-parsed functions, the variable history points to the ast.py file :(.
>> 
>> This patch needs more rework, as I discovered other corner cases, where
>>the
>> function name is inherited at runtime, e.g. autotools_do_configure is
>> called instead of do_configure, but I still think that varflags is the
>>right
>> way to track the function definition.
>> 
>> What problems do you see with that ?
>
>It occurs to me that the trouble with our initial model of a single file
>/ line 
>is it isn't going to be able to effectively reflect how our task
>functions came 
>to be - they're really just a special case of variables and can thus be
>prepended/appended in a number of different places to get the final
>function. 
>This kind of suggests that the history is what we ought to be using.
>
>Cheers,
>Paul
>
>-- 
>
>Paul Eggleton
>Intel Open Source Technology Centre



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-06 13:45       ` Barros Pena, Belen
@ 2013-12-09 10:40         ` Damian, Alexandru
  2013-12-09 13:50           ` Barros Pena, Belen
  0 siblings, 1 reply; 30+ messages in thread
From: Damian, Alexandru @ 2013-12-09 10:40 UTC (permalink / raw)
  To: Barros Pena, Belen
  Cc: Paul Eggleton, Flanagan, Elizabeth, toaster@yoctoproject.org

[-- Attachment #1: Type: text/plain, Size: 6063 bytes --]

Ok, can you please open a bug for the changes to licensing info ?

I'm working on the other items under review.

Alex


On Fri, Dec 6, 2013 at 1:45 PM, Barros Pena, Belen <
belen.barros.pena@intel.com> wrote:

>
>
> On 06/12/2013 11:44, "Paul Eggleton" <paul.eggleton@linux.intel.com>
> wrote:
>
> >On Thursday 05 December 2013 18:18:16 Damian, Alexandru wrote:
> >> On Thu, Dec 5, 2013 at 3:27 PM, Paul Eggleton
> >><paul.eggleton@linux.intel.com
> >> > If do_populate_lic doesn't execute in this build we won't have the
> >>license
> >> > file with this method, right? Is that going to be a problem?
> >>
> >> Yep, license info in the required form is dependent on executing
> >> do_populate_lic.
> >> Belen - we need to either return to recipe-based info (relative paths)
> >>or
> >> live with missing info for Prebuilt tasks.
> >> Which way to go ?
> >
> >I'm not an expert in this area, but I'd assume the most important aspect
> >of
> >this is about what went into the image, so it might be similar to the
> >rest of
> >the individual package information.
>
> I spoke to Beth Flanagan about what kind of information is useful when it
> comes to licensing, and Paul is right. It is useful to know which licenses
> apply to a certain recipe, but when it comes to license files, you really
> want to provide information at the package level. Sorry about this: I
> should have spoken to Beth sooner.
>
> So what we should be showing is the full path to the license files in
> /build/tmp/deploy/licenses/
>
>
> So, for busybox:
>
> .../build/tmp/deploy/licenses/busybox/generic_bzip2.txt
>
> .../build/tmp/deploy/licenses/busybox/generic_GPLv2.txt
>
> .../build/tmp/deploy/licenses/busybox/LICENSE.txt
>
>
> I can open enhancements to remove licensing_info from the recipes table
> and add a license_files field to the package information. Let me know if
> that would be ok. We should probably also provide the path to the license
> manifest as part of the image information.
>
>
> >
> >> > bitbake: toaster: update to Django 1.5.4
> >> >
> >> > I think if we do do this we need to provide people with instructions
> >>on
> >> > how to use virtualenv to install 1.5.4 on their system. I'm also a bit
> >> > concerned about us requiring one specific minor release because a
> >>number of
> >> > distros are using 1.5 versions other than 1.5.4 - there shouldn't be
> >>any
> >> > incompatible changes introduced in minor releases, so we ought to be
> >>able
> >> > to work with any 1.5 release >= 1.5.4.
> >>
> >> Installation instructions are included in the patch.  Since we don't
> >>have
> >> the capability to test on multiple Django versions, I think the
> >>requirement
> >> to use a specific version is acceptable.
> >
> >We handle differing installed versions in the rest of the build system;
> >QA
> >ensure that things keep on working. I don't see how this is different.
> >
> >> Alternatively, we could just print an warning if a supposedly compatible
> >> but not identical Django version is used, instead of refusing to start.
> >>What
> >> do you think ?
> >
> >I think the first question is where we expect this to be usable. I think
> >the
> >answer almost certainly is "almost every distro we currently support,
> >where
> >practical". From there, we'll know what we need to support in Toaster.
> >
> >> > > bitbake: toaster: enable debug mode
> >> >
> >> > You don't say *why* you're enabling this for all users - we don't
> >>normally
> >> > turn on BBDEBUG by default. Why is this needed?
> >>
> >> Debug mode is enabled only if one starts toaster with "source toaster
> >>start
> >> debug" command line, i.e. not for all runs, just for debug runs. Since
> >> bitbake supports this, seems that toaster support is needed.
> >
> >OK, understood.
> >
> >> > > bitbake: toaster: save extra data related to executed tasks
> >> > >
> >> > >+import pprint
> >> >
> >> > This doesn't seem to be needed.
> >> >
> >> > > + 'workdir' : d.getVar("WORKDIR", True),
> >> >
> >> > We can't do this. Apart from a few bits of old code that really
> >>shouldn't,
> >> > Bitbake knows nothing about WORKDIR, and that's intentional. If you
> >> > absolutely need this, you'll need to collect it another way. I'm not
> >>even
> >> > sure it makes sense when we're talking about a remote context, which
> >>we
> >> > are going to be in soon; perhaps we should review how we handle this
> >>at
> >> > the design level.
> >> >
> >> > > + 'filename' : d.getVarFlag(t, "filename"),
> >> > > + 'lineno' : d.getVarFlag(t, "lineno"),
> >> >
> >> > I've run this past Richard, and he thinks that we should try partially
> >> > enabling the variable tracking just for functions - sorry for going
> >>back
> >> > and forth on this. What we'd need to do is check the performance
> >>impact of
> >> > doing so - would you be able to look into that?
> >>
> >> Actually, now I think that using variable history is not a good idea -
> >>for
> >> AST-parsed functions, the variable history points to the ast.py file :(.
> >>
> >> This patch needs more rework, as I discovered other corner cases, where
> >>the
> >> function name is inherited at runtime, e.g. autotools_do_configure is
> >> called instead of do_configure, but I still think that varflags is the
> >>right
> >> way to track the function definition.
> >>
> >> What problems do you see with that ?
> >
> >It occurs to me that the trouble with our initial model of a single file
> >/ line
> >is it isn't going to be able to effectively reflect how our task
> >functions came
> >to be - they're really just a special case of variables and can thus be
> >prepended/appended in a number of different places to get the final
> >function.
> >This kind of suggests that the history is what we ought to be using.
> >
> >Cheers,
> >Paul
> >
> >--
> >
> >Paul Eggleton
> >Intel Open Source Technology Centre
>
>


-- 
Alex Damian
Yocto Project
SSG / OTC

[-- Attachment #2: Type: text/html, Size: 7911 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-09 10:40         ` Damian, Alexandru
@ 2013-12-09 13:50           ` Barros Pena, Belen
  0 siblings, 0 replies; 30+ messages in thread
From: Barros Pena, Belen @ 2013-12-09 13:50 UTC (permalink / raw)
  To: Damian, Alexandru
  Cc: Paul Eggleton, Flanagan, Elizabeth, toaster@yoctoproject.org

Done

On 09/12/2013 10:40, "Damian, Alexandru" <alexandru.damian@intel.com>
wrote:

>I can open enhancements to remove licensing_info from the recipes table

https://bugzilla.yoctoproject.org/show_bug.cgi?id=5647

>and add a license_files field to the package information.

https://bugzilla.yoctoproject.org/show_bug.cgi?id=5648

>Let me know if
>that would be ok. We should probably also provide the path to the license
>manifest as part of the image information.

https://bugzilla.yoctoproject.org/show_bug.cgi?id=5649

>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-05 15:27 ` Paul Eggleton
  2013-12-05 18:18   ` Damian, Alexandru
@ 2013-12-06 10:57   ` Barros Pena, Belen
  2013-12-09 14:03   ` Damian, Alexandru
  2 siblings, 0 replies; 30+ messages in thread
From: Barros Pena, Belen @ 2013-12-06 10:57 UTC (permalink / raw)
  To: Paul Eggleton, Damian, Alexandru; +Cc: toaster@yoctoproject.org

On 05/12/2013 15:27, "Paul Eggleton" <paul.eggleton@linux.intel.com> wrote:

>Hi Alex,
>
>On Tuesday 26 November 2013 18:28:21 Damian, Alexandru wrote:
>> I have a set of patches for review on:
>> 
>> 
>>http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/log/?h=toaster/mas
>>ter
>
>Sorry for the delay (although we've had some changes in the mean time
>with the 
>pkgdata changes). Reviewing the latest branch, most of the changes look
>OK, 
>however I did notice the following:
>
>> bitbake: toasterui: record recipe licensing information
>
>If do_populate_lic doesn't execute in this build we won't have the
>license file 
>with this method, right? Is that going to be a problem?

That would explain this bug:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=5629

I am inclined to think it is a problem.

>
>> bitbake: toaster: update to Django 1.5.4
>
>I think if we do do this we need to provide people with instructions on
>how to 
>use virtualenv to install 1.5.4 on their system. I'm also a bit concerned
>about us requiring one specific minor release because a number of distros
>are 
>using 1.5 versions other than 1.5.4 - there shouldn't be any incompatible
>changes introduced in minor releases, so we ought to be able to work with
>any 
>1.5 release >= 1.5.4.
>
>> bitbake: toaster: enable debug mode
>
>You don't say *why* you're enabling this for all users - we don't
>normally 
>turn on BBDEBUG by default. Why is this needed?
>
>> bitbake: toaster: save extra data related to executed tasks
>>
>>+import pprint
>
>This doesn't seem to be needed.
>
>> + 'workdir' : d.getVar("WORKDIR", True),
>
>We can't do this. Apart from a few bits of old code that really
>shouldn't, 
>Bitbake knows nothing about WORKDIR, and that's intentional. If you
>absolutely 
>need this, you'll need to collect it another way. I'm not even sure it
>makes 
>sense when we're talking about a remote context, which we are going to be
>in 
>soon; perhaps we should review how we handle this at the design level.

Well, I guess we either show it or not. The reason why I thought this
piece of information would be useful is that the work directory contains a
whole bunch of files people might want to look into (package source, etc).
It is handy to be told where they are instead of having to look for them.
It also opens the possibility of remote access to those files, if we ever
want to provide that.

But I could be wrong, of course.

>
>> + 'filename' : d.getVarFlag(t, "filename"),
>> + 'lineno' : d.getVarFlag(t, "lineno"),
>
>I've run this past Richard, and he thinks that we should try partially
>enabling the variable tracking just for functions - sorry for going back
>and 
>forth on this. What we'd need to do is check the performance impact of
>doing 
>so - would you be able to look into that?
>
>Cheers,
>Paul
>
>-- 
>
>Paul Eggleton
>Intel Open Source Technology Centre
>_______________________________________________
>toaster mailing list
>toaster@yoctoproject.org
>https://lists.yoctoproject.org/listinfo/toaster



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-12-05 15:27 ` Paul Eggleton
  2013-12-05 18:18   ` Damian, Alexandru
  2013-12-06 10:57   ` Barros Pena, Belen
@ 2013-12-09 14:03   ` Damian, Alexandru
  2 siblings, 0 replies; 30+ messages in thread
From: Damian, Alexandru @ 2013-12-09 14:03 UTC (permalink / raw)
  To: Paul Eggleton; +Cc: toaster@yoctoproject.org

[-- Attachment #1: Type: text/plain, Size: 2753 bytes --]

Hi Paul,

I've pushed a new toaster-master with either the review changes applied, or
the changes entirely taken out if they were too troublesome (workdir,
licensing_info changes).

Can you please check it and let me know if ok ?

Cheers,
Alex


On Thu, Dec 5, 2013 at 3:27 PM, Paul Eggleton <paul.eggleton@linux.intel.com
> wrote:

> Hi Alex,
>
> On Tuesday 26 November 2013 18:28:21 Damian, Alexandru wrote:
> > I have a set of patches for review on:
> >
> >
> http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/log/?h=toaster/master
>
> Sorry for the delay (although we've had some changes in the mean time with
> the
> pkgdata changes). Reviewing the latest branch, most of the changes look OK,
> however I did notice the following:
>
> > bitbake: toasterui: record recipe licensing information
>
> If do_populate_lic doesn't execute in this build we won't have the license
> file
> with this method, right? Is that going to be a problem?
>
> > bitbake: toaster: update to Django 1.5.4
>
> I think if we do do this we need to provide people with instructions on
> how to
> use virtualenv to install 1.5.4 on their system. I'm also a bit concerned
> about us requiring one specific minor release because a number of distros
> are
> using 1.5 versions other than 1.5.4 - there shouldn't be any incompatible
> changes introduced in minor releases, so we ought to be able to work with
> any
> 1.5 release >= 1.5.4.
>
> > bitbake: toaster: enable debug mode
>
> You don't say *why* you're enabling this for all users - we don't normally
> turn on BBDEBUG by default. Why is this needed?
>
> > bitbake: toaster: save extra data related to executed tasks
> >
> >+import pprint
>
> This doesn't seem to be needed.
>
> > + 'workdir' : d.getVar("WORKDIR", True),
>
> We can't do this. Apart from a few bits of old code that really shouldn't,
> Bitbake knows nothing about WORKDIR, and that's intentional. If you
> absolutely
> need this, you'll need to collect it another way. I'm not even sure it
> makes
> sense when we're talking about a remote context, which we are going to be
> in
> soon; perhaps we should review how we handle this at the design level.
>
> > + 'filename' : d.getVarFlag(t, "filename"),
> > + 'lineno' : d.getVarFlag(t, "lineno"),
>
> I've run this past Richard, and he thinks that we should try partially
> enabling the variable tracking just for functions - sorry for going back
> and
> forth on this. What we'd need to do is check the performance impact of
> doing
> so - would you be able to look into that?
>
> Cheers,
> Paul
>
> --
>
> Paul Eggleton
> Intel Open Source Technology Centre
>



-- 
Alex Damian
Yocto Project
SSG / OTC

[-- Attachment #2: Type: text/html, Size: 3588 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* review request
@ 2013-05-03 20:54 Elso Andras
  2013-05-03 20:58 ` Sage Weil
  0 siblings, 1 reply; 30+ messages in thread
From: Elso Andras @ 2013-05-03 20:54 UTC (permalink / raw)
  To: ceph-devel

Hi,

Before i send the pull request, please review this commits:
https://github.com/Elbandi/ceph/compare/master...wip-getlayout

Elbandi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: review request
  2013-05-03 20:54 Elso Andras
@ 2013-05-03 20:58 ` Sage Weil
  0 siblings, 0 replies; 30+ messages in thread
From: Sage Weil @ 2013-05-03 20:58 UTC (permalink / raw)
  To: Elso Andras; +Cc: ceph-devel

On Fri, 3 May 2013, Elso Andras wrote:
> Hi,
> 
> Before i send the pull request, please review this commits:
> https://github.com/Elbandi/ceph/compare/master...wip-getlayout

Nice!  A pull request is actually the easiest way to review as it will 
capture any comments we make on github for future reference.  Each 
subsequent revision can just be repushed to the same branch and github 
will (mostly) do the right thing.

Thanks!
sage

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-06-21  7:12 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-08 14:46 Review Request Divya Indi
2020-06-08 14:46 ` [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg Divya Indi
2020-06-09  7:00   ` Leon Romanovsky
2020-06-09 14:45     ` Divya Indi
2020-06-14  6:41       ` Leon Romanovsky
2020-06-16 17:56         ` Divya Indi
2020-06-17  5:17           ` Leon Romanovsky
2020-06-17 18:23             ` Jason Gunthorpe
2020-06-21  7:12               ` Leon Romanovsky
2020-06-17 18:24           ` Jason Gunthorpe
2020-06-20  0:43             ` Divya Indi
2020-06-09  7:03 ` Review Request Leon Romanovsky
2020-06-09 15:44   ` Divya Indi
  -- strict thread matches above, loose matches on Subject: below --
2020-05-14 15:11 Divya Indi
2015-09-16 18:09 Adam C. Emerson
2015-09-16 18:31 ` Gregory Farnum
2015-09-16 19:06   ` Adam C. Emerson
2015-09-16 22:15     ` John Spray
2015-09-17 16:19       ` Adam C. Emerson
2013-11-26 18:28 review request Damian, Alexandru
2013-12-05 15:27 ` Paul Eggleton
2013-12-05 18:18   ` Damian, Alexandru
2013-12-06 11:44     ` Paul Eggleton
2013-12-06 13:45       ` Barros Pena, Belen
2013-12-09 10:40         ` Damian, Alexandru
2013-12-09 13:50           ` Barros Pena, Belen
2013-12-06 10:57   ` Barros Pena, Belen
2013-12-09 14:03   ` Damian, Alexandru
2013-05-03 20:54 Elso Andras
2013-05-03 20:58 ` Sage Weil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.