From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matan Barak <matanb@mellanox.com>
Subject: Re: [PATCH V1 net-next 03/10] net/mlx4_core: Use tasklet for user-space
 CQ completion events
Date: Wed, 10 Dec 2014 17:47:31 +0200
Message-ID: <54886B13.5040409@mellanox.com>
References: <1418216999-17012-1-git-send-email-ogerlitz@mellanox.com>
	 <1418216999-17012-4-git-send-email-ogerlitz@mellanox.com>
 <1418225599.27198.18.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "David S. Miller" <davem@davemloft.net>, <netdev@vger.kernel.org>,
	"Amir Vadai" <amirv@mellanox.com>, Tal Alon <talal@mellanox.com>,
	Jack Morgenstein <jackm@dev.mellanox.co.il>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	Or Gerlitz <ogerlitz@mellanox.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-db3on0089.outbound.protection.outlook.com ([157.55.234.89]:42346
	"EHLO emea01-db3-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1754093AbaLJQXc (ORCPT <rfc822;netdev@vger.kernel.org>);
	Wed, 10 Dec 2014 11:23:32 -0500
In-Reply-To: <1418225599.27198.18.camel@edumazet-glaptop2.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>


On 12/10/2014 5:33 PM, Eric Dumazet wrote:
> On Wed, 2014-12-10 at 15:09 +0200, Or Gerlitz wrote:
>> From: Matan Barak <matanb@mellanox.com>
>>
>> Previously, we've fired all our completion callbacks straight from our ISR.
>>
>> Some of those callbacks were lightweight (for example, mlx4_en's and
>> IPoIB napi callbacks), but some of them did more work (for example,
>> the user-space RDMA stack uverbs' completion handler). Besides that,
>> doing more than the minimal work in ISR is generally considered wrong,
>> it could even lead to a hard lockup of the system. Since when a lot
>> of completion events are generated by the hardware, the loop over those
>> events could be so long, that we'll get into a hard lockup by the system
>> watchdog.
>
> ...
>
>> +#define TASKLET_THRESHOLD 1000
>> +
>> +void mlx4_cq_tasklet_cb(unsigned long data)
>> +{
>> +	unsigned long flags;
>> +	unsigned int i = 0;
>> +	struct mlx4_eq_tasklet *ctx = (struct mlx4_eq_tasklet *)data;
>> +	struct mlx4_cq *mcq, *temp;
>> +
>> +	spin_lock_irqsave(&ctx->lock, flags);
>> +	list_splice_tail_init(&ctx->list, &ctx->process_list);
>> +	spin_unlock_irqrestore(&ctx->lock, flags);
>> +
>> +	list_for_each_entry_safe(mcq, temp, &ctx->process_list, tasklet_ctx.list) {
>> +		list_del_init(&mcq->tasklet_ctx.list);
>> +		mcq->tasklet_ctx.comp(mcq);
>> +		if (atomic_dec_and_test(&mcq->refcount))
>> +			complete(&mcq->free);
>> +		if (++i == TASKLET_THRESHOLD)
>> +			break;
>> +	}
>> +
>> +	if (i == TASKLET_THRESHOLD)
>> +		tasklet_schedule(&ctx->task);
>> +}
>> +
>
> What is the max duration of doing this loop up to 1000 times ?
>
> I suspect it might be too long, but not necessarily detected by
> conventional watchdog.
>
> __do_softirq() uses both a counter and a test against jiffies, with a 2
> ms limit.

You're right. We'll measure it accurately, but I think the loop took over
2 ms (on a system with 400 CQs open), including the spin_lock around the
list_splice.

We'll add the jiffies test to V2.
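
To make the idea concrete, here is a rough userspace model of a loop bounded
by both a counter and a time limit, in the spirit of what __do_softirq()
does. It is only a sketch under stated assumptions, not the V2 code: the
names (run_budgeted, work_item, count_cb) and the budget values are
hypothetical, and clock_gettime(CLOCK_MONOTONIC) stands in for the jiffies
comparison a kernel version would use.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical budgets for illustration; not values from the patch. */
#define WORK_COUNT_BUDGET	1000
#define WORK_TIME_BUDGET_NS	(2ull * 1000 * 1000)	/* 2 ms */

typedef void (*work_fn)(void *arg);

struct work_item {
	work_fn fn;	/* stands in for mcq->tasklet_ctx.comp */
	void *arg;
};

/* Monotonic clock in nanoseconds (userspace stand-in for jiffies). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Example callback: bumps the counter that arg points to. */
static void count_cb(void *arg)
{
	(*(unsigned int *)arg)++;
}

/*
 * Run items[*next .. n) until either budget is exhausted, whichever
 * comes first. At least one item runs per call if any is pending.
 * Returns 1 if work remains (the caller would reschedule), 0 if drained.
 */
static int run_budgeted(struct work_item *items, size_t n, size_t *next,
			unsigned int count_budget, uint64_t time_budget_ns)
{
	uint64_t deadline;
	unsigned int done = 0;

	assert(items != NULL && next != NULL);
	deadline = now_ns() + time_budget_ns;

	while (*next < n) {
		items[*next].fn(items[*next].arg);
		(*next)++;
		if (++done >= count_budget || now_ns() >= deadline)
			break;
	}

	return *next < n;
}
```

A caller would invoke run_budgeted(items, n, &next, WORK_COUNT_BUDGET,
WORK_TIME_BUDGET_NS) and reschedule itself whenever it returns 1, much as
the tasklet above calls tasklet_schedule() when it hits TASKLET_THRESHOLD.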

Thanks.

>
> Thanks.
>
>