From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756195AbcCBWHu (ORCPT ); Wed, 2 Mar 2016 17:07:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37661 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755260AbcCBWHs (ORCPT ); Wed, 2 Mar 2016 17:07:48 -0500 Date: Wed, 2 Mar 2016 23:07:46 +0100 From: Andrea Arcangeli To: Paolo Bonzini Cc: linux-kernel@vger.kernel.org, Chris Mason , Andrew Morton , Davide Libenzi Subject: Re: [PATCH] eventfd: document lockless access in eventfd_poll Message-ID: <20160302220746.GF4946@redhat.com> References: <1456956118-7082-1-git-send-email-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1456956118-7082-1-git-send-email-pbonzini@redhat.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Wed, 02 Mar 2016 22:07:48 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 02, 2016 at 11:01:58PM +0100, Paolo Bonzini wrote: > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 8d0c0df01854..dbbbe203f82b 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -121,8 +121,46 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) > u64 count; > > poll_wait(file, &ctx->wqh, wait); > - smp_rmb(); > - count = ctx->count; > + > + /* > + * All writes to ctx->count occur within ctx->wqh.lock. This read > + * can be done outside ctx->wqh.lock because we know that poll_wait > + * takes that lock (through add_wait_queue) if our caller will sleep. > + * > + * The read _can_ therefore seep into add_wait_queue's critical > + * section, but cannot move above it! add_wait_queue's spin_lock acts > + * as an acquire barrier and ensures that the read be ordered properly > + * against the writes. The following CAN happen and is safe: > + * > + * poll write > + * ----------------- ------------ > + * lock ctx->wqh.lock (in poll_wait) > + * count = ctx->count > + * __add_wait_queue > + * unlock ctx->wqh.lock > + * lock ctx->qwh.lock > + * ctx->count += n > + * if (waitqueue_active) > + * wake_up_locked_poll > + * unlock ctx->qwh.lock > + * eventfd_poll returns 0 > + * > + * but the following, which would miss a wakeup, cannot happen: > + * > + * poll write > + * ----------------- ------------ > + * count = ctx->count (INVALID!) > + * lock ctx->qwh.lock > + * ctx->count += n > + * **waitqueue_active is false** > + * **no wake_up_locked_poll!** > + * unlock ctx->qwh.lock > + * lock ctx->wqh.lock (in poll_wait) > + * __add_wait_queue > + * unlock ctx->wqh.lock > + * eventfd_poll returns 0 > + */ > + count = READ_ONCE(ctx->count); > > if (count > 0) > events |= POLLIN; Reviewed-by: Andrea Arcangeli