From mboxrd@z Thu Jan 1 00:00:00 1970
From: rsanford2@gmail.com
Subject: [PATCH v2 0/3] timer: fix rte_timer_manage and improve unit tests
Date: Mon, 27 Jul 2015 18:46:03 -0400
Message-ID: <1438037168-639-1-git-send-email-rsanford2@gmail.com>
References: <1437691347-58708-1-git-send-email-rsanford2@gmail.com>
In-Reply-To: <1437691347-58708-1-git-send-email-rsanford2@gmail.com>
To: dev@dpdk.org
List-Id: patches and discussions about DPDK

From: Robert Sanford

This patchset fixes a bug in timer stress test 2, adds a new stress
test to expose a race condition bug in API rte_timer_manage(), and
then fixes the rte_timer_manage() bug.

Description of rte_timer_manage() race condition bug:

Through code inspection, we notice a potential problem in
rte_timer_manage() that leads to corruption of per-lcore
pending-lists (implemented as skip-lists). The race condition occurs
when rte_timer_manage() expires multiple timers on lcore A, while
lcore B simultaneously invokes rte_timer_reset() for one of the
expiring timers (other than the first one).

Lcore A splits its pending-list, creating a local list of expired
timers linked through their sl_next[0] pointers, and sets the first
expired timer to the RUNNING state, all during one list-lock round
trip. Lcore A then unlocks the list-lock to run the first callback,
and that is when A and B can have different interpretations of the
subsequent expired timers' true state.
Lcore B sees an expired timer still in the PENDING state, atomically
changes the timer to the CONFIG state, locks lcore A's list-lock, and
reinserts the timer into A's pending-list. The two lcores try to use
the same next-pointers to maintain both lists!

v2 changes:
- Move patch descriptions to their respective patches.
- Correct checkpatch warnings.

Robert Sanford (3):
  fix stress test 2 sync bug
  add timer manage race condition test
  fix race condition in rte_timer_manage

 app/test/Makefile              |   1 +
 app/test/test_timer.c          | 154 +++++++++++++++++++++++-------
 app/test/test_timer_racecond.c | 209 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.c   |  56 +++++++----
 4 files changed, 366 insertions(+), 54 deletions(-)
 create mode 100644 app/test/test_timer_racecond.c
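
For readers who want to see the shared next-pointer problem in
isolation, here is a minimal single-threaded sketch of the
interleaving described above. It is not the rte_timer code: the
toy_* names are hypothetical, a plain singly linked list stands in
for the skip-list (one next pointer playing the role of sl_next[0]),
and the locks and the PENDING/CONFIG/RUNNING states are left out. The
two "lcores" are simply replayed in the problematic order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy timer: "next" plays the role of sl_next[0]. */
struct toy_timer {
	uint64_t expire;
	struct toy_timer *next;
};

/* Sorted insert, standing in for what a reset does under the list lock. */
static void
toy_insert(struct toy_timer **head, struct toy_timer *tim)
{
	struct toy_timer **pp = head;

	while (*pp != NULL && (*pp)->expire <= tim->expire)
		pp = &(*pp)->next;
	tim->next = *pp;	/* overwrites whatever link tim carried before */
	*pp = tim;
}

/* Detach every timer with expire <= now and return it as a run list. */
static struct toy_timer *
toy_split(struct toy_timer **pending, uint64_t now)
{
	struct toy_timer *run = *pending;
	struct toy_timer *last = NULL;
	struct toy_timer *t;

	for (t = *pending; t != NULL && t->expire <= now; t = t->next)
		last = t;
	if (last == NULL)
		return NULL;	/* nothing has expired */
	*pending = t;
	last->next = NULL;
	return run;
}

/* Return 1 if tim is reachable from head, else 0. */
static int
toy_contains(const struct toy_timer *head, const struct toy_timer *tim)
{
	for (; head != NULL; head = head->next)
		if (head == tim)
			return 1;
	return 0;
}

/* Replay the interleaving; return 1 if A's run list ends up corrupted. */
static int
toy_race_demo(void)
{
	struct toy_timer t1 = { 10, NULL }, t2 = { 20, NULL };
	struct toy_timer t3 = { 30, NULL }, t4 = { 100, NULL };
	struct toy_timer *pending = NULL;
	struct toy_timer *run;

	toy_insert(&pending, &t1);
	toy_insert(&pending, &t2);
	toy_insert(&pending, &t3);
	toy_insert(&pending, &t4);

	/* "Lcore A": split at now = 50, so run = t1 -> t2 -> t3. */
	run = toy_split(&pending, 50);

	/*
	 * "Lcore B": A is busy in t1's callback. B resets t2 and reinserts
	 * it into A's pending list, reusing the link A's run list relies on.
	 */
	t2.expire = 60;
	toy_insert(&pending, &t2);

	/* A's run list is now t1 -> t2 -> t4: t3 is lost, t4 is not due. */
	return toy_contains(run, &t4) && !toy_contains(run, &t3);
}
```

In this sketch toy_race_demo() returns 1: once t2 is relinked, walking
A's run list skips t3 entirely and continues into t4, which has not
expired. Keeping the expired timers reachable only through links that
a concurrent reset cannot rewrite is the property a fix has to
restore.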