From: Dmitry Safonov
To: linux-kernel@vger.kernel.org
Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
    Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar,
    "Levin, Alexander (Sasha Levin)", Linus Torvalds,
    Mauro Carvalho Chehab, Mike Galbraith, Paolo Abeni,
    "Paul E. McKenney", Peter Zijlstra, Radu Rendec, Rik van Riel,
    Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li
Subject: [RFC 0/6] Multi-thread per-cpu ksoftirqd
Date: Thu, 18 Jan 2018 16:12:32 +0000
Message-Id: <20180118161238.13792-1-dima@arista.com>
X-Mailer: git-send-email 2.13.6

Another attempt to solve softirq deferring problems.
There are at least two problems, AFAIK:
o Deferring one softirq to ksoftirqd adds latency to other softirqs
  of a different type, because ksoftirqd_running() is what decides
  between deferring and servicing.
o The logic in __do_softirq() that checks if (pending) after 2ms of
  processing doesn't work on some machines during, e.g., a UDP storm.

So, what's done here in an attempt to improve this:
- added a boot param to separate softirqs into defer-groups
- each softirq-group gets its own ksoftirqd (also per-cpu)

The last two patches might be just a brain fart, as I tried to improve
the metric on which the decision to defer is based.
I measure the time spent serving each softirq and account that time
to the ksoftirqd thread of that softirq-group.
After that, the decision to serve/defer a softirq is based on the
comparison:
  (current->vruntime < ksoftirqd->vruntime)

Ugh, measuring time and updating ksoftirqd cpu time each tick might be
costly..
And it looks like it doesn't work as expected: a new task is started
with a normalized vruntime (min_vruntime), which is lower than
ksoftirqd's. And the time spent on servicing softirqs is still bigger
than that of any running task.

Anyway, sending this as an RFC, maybe someone will like the approach
(or suggest some other ideas).

Cc: Andrew Morton
Cc: David Miller
Cc: Eric Dumazet
Cc: Frederic Weisbecker
Cc: Hannes Frederic Sowa
Cc: Ingo Molnar
Cc: "Levin, Alexander (Sasha Levin)"
Cc: Linus Torvalds
Cc: Mauro Carvalho Chehab
Cc: Mike Galbraith
Cc: Paolo Abeni
Cc: "Paul E. McKenney"
Cc: Peter Zijlstra
Cc: Radu Rendec
Cc: Rik van Riel
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Wanpeng Li

Dmitry Safonov (6):
  softirq: Add softirq_groups boot parameter
  softirq: Introduce mask for __do_softirq()
  softirq: Add reverse group-to-softirq map
  softirq: Run per-group per-cpu ksoftirqd thread
  softirq: Add time accounting per-softirq type
  softirq/sched: Account si cpu time to ksoftirqd(s)

 Documentation/admin-guide/kernel-parameters.txt |  16 ++
 include/linux/hardirq.h                         |   2 +-
 include/linux/interrupt.h                       |  26 +-
 include/linux/vtime.h                           |  10 +-
 init/Kconfig                                    |  10 +
 kernel/sched/cputime.c                          |  60 +++-
 kernel/sched/fair.c                             |  38 +++
 kernel/sched/sched.h                            |  20 ++
 kernel/softirq.c                                | 362 ++++++++++++++++++++----
 net/ipv4/tcp_output.c                           |   2 +-
 10 files changed, 464 insertions(+), 82 deletions(-)

--
2.13.6
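
To make the accounting idea from the cover letter easier to follow,
here is a minimal userspace sketch. It is not code from the series:
the struct, the function names and the choice of 10 softirq types are
all assumptions made for illustration. It models a group as a bitmask
of softirq types, charges servicing time to the group's ksoftirqd, and
evaluates the quoted comparison. The cover letter does not spell out
which outcome of the comparison means "serve" and which means "defer",
so the sketch only returns the raw result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 10 softirq types, numbered 0..9, as a stand-in for the
 * kernel's softirq vector numbers. */
enum { MODEL_NR_SOFTIRQS = 10 };

/* Hypothetical model of one softirq group and its ksoftirqd thread. */
struct softirq_group_model {
	uint32_t mask;      /* bit n set => softirq type n is in this group */
	uint64_t vruntime;  /* time charged to this group's ksoftirqd (ns) */
};

/* Reverse lookup: index of the group that owns softirq type nr,
 * or -1 if no group claims it. */
static int group_of(const struct softirq_group_model *groups, int ngroups,
		    unsigned int nr)
{
	assert(nr < MODEL_NR_SOFTIRQS);
	for (int i = 0; i < ngroups; i++)
		if (groups[i].mask & (UINT32_C(1) << nr))
			return i;
	return -1;
}

/* Charge delta_ns of servicing time for softirq type nr to the
 * ksoftirqd of the group that owns it. Unowned types are ignored. */
static void account_softirq_time(struct softirq_group_model *groups,
				 int ngroups, unsigned int nr,
				 uint64_t delta_ns)
{
	int g = group_of(groups, ngroups, nr);

	if (g >= 0)
		groups[g].vruntime += delta_ns;
}

/* The comparison quoted in the cover letter:
 *   current->vruntime < ksoftirqd->vruntime
 * Only the raw result is returned; mapping it to serve/defer is left
 * to the caller because the cover letter does not state it. */
static bool vruntime_check(uint64_t current_vruntime,
			   const struct softirq_group_model *group)
{
	return current_vruntime < group->vruntime;
}
```

In this model, a freshly started task whose vruntime is set to a low
value such as min_vruntime will compare as lower than a ksoftirqd that
has already been charged servicing time, which is the situation the
cover letter reports as not working as expected.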