From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933050AbdCWIko (ORCPT );
	Thu, 23 Mar 2017 04:40:44 -0400
Received: from a.mx.secunet.com ([62.96.220.36]:45914 "EHLO a.mx.secunet.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752388AbdCWIk3 (ORCPT );
	Thu, 23 Mar 2017 04:40:29 -0400
Date: Thu, 23 Mar 2017 09:40:26 +0100
From: Steffen Klassert
To: "Jason A. Donenfeld"
CC: Linux Crypto Mailing List , LKML , Netdev ,
	"Samuel Holland" , WireGuard mailing list
Subject: Re: race condition in kernel/padata.c
Message-ID: <20170323084026.GA32453@secunet.com>
References: 
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Originating-IP: [10.182.7.193]
X-G-Data-MailSecurity-for-Exchange-State: 0
X-G-Data-MailSecurity-for-Exchange-Error: 0
X-G-Data-MailSecurity-for-Exchange-Sender: 23
X-G-Data-MailSecurity-for-Exchange-Server: d65e63f7-5c15-413f-8f63-c0d707471c93
X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10
X-G-Data-MailSecurity-for-Exchange-Guid: 9FF310A2-2C30-4D5A-9907-959CD81DA159
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 23, 2017 at 12:03:43AM +0100, Jason A. Donenfeld wrote:
> Hey Steffen,
> 
> WireGuard makes really heavy use of padata, feeding it units of work
> from different cores in different contexts all at the same time. For
> the most part, everything has been fine, but one particular user has
> consistently run into mysterious bugs. He's using a rather old
> dual-core CPU, which has a tendency to bring out race conditions.
> After struggling to get a good backtrace, we finally managed to
> extract this from list debugging:
> 
> [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33
>                __list_add+0xae/0x130
> [87487.301868] list_add corruption. prev->next should be next
>                (ffffb17abfc043d0), but was ffff8dba70872c80.
>                (prev=ffff8dba70872b00).
> [87487.339011] [] dump_stack+0x68/0xa3
> [87487.342198] [] ? console_unlock+0x281/0x6d0
> [87487.345364] [] __warn+0xff/0x140
> [87487.348513] [] warn_slowpath_fmt+0x4a/0x50
> [87487.351659] [] __list_add+0xae/0x130
> [87487.354772] [] ? _raw_spin_lock+0x64/0x70
> [87487.357915] [] padata_reorder+0x1e6/0x420
> [87487.361084] [] padata_do_serial+0xa5/0x120
> 
> padata_reorder calls list_add_tail with the list to which it is adding
> locked, which seems correct:
> 
>     spin_lock(&squeue->serial.lock);
>     list_add_tail(&padata->list, &squeue->serial.list);
>     spin_unlock(&squeue->serial.lock);
> 
> This therefore leaves only one place where such an inconsistency could
> occur: if padata->list is added at the same time on two different
> threads. This padata pointer comes from the call to
> padata_get_next(pd), which contains the following block:
> 
>     next_queue = per_cpu_ptr(pd->pqueue, cpu);
>     padata = NULL;
>     reorder = &next_queue->reorder;
>     if (!list_empty(&reorder->list)) {
>             padata = list_entry(reorder->list.next,
>                                 struct padata_priv, list);
> 
>             spin_lock(&reorder->lock);
>             list_del_init(&padata->list);
>             atomic_dec(&pd->reorder_objects);
>             spin_unlock(&reorder->lock);
> 
>             pd->processed++;
> 
>             goto out;
>     }
> out:
>     return padata;
> 
> I strongly suspect that the problem here is that two threads can race
> on the reorder list. Even though the deletion is locked, the call to
> list_entry is not, which means it is feasible for two threads to pick
> up the same padata object and subsequently call list_add_tail on it at
> the same time. The fix would thus be to hoist that lock outside of
> that block.

Yes, looks like we should lock the whole list handling block here.

Thanks!