kernel-janitors.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: walter harms <wharms@bfs.de>
To: kernel-janitors@vger.kernel.org
Subject: Re: [patch -next] eventfd: type bug in eventfd_poll()
Date: Fri, 30 Jan 2015 08:28:44 +0000	[thread overview]
Message-ID: <54CB40BC.9050407@bfs.de> (raw)
In-Reply-To: <20150119193319.GA32634@mwanda>



Am 29.01.2015 19:56, schrieb Chris Mason:
> On Mon, Jan 19, 2015 at 10:33:19PM +0300, Dan Carpenter wrote:
>> Since "count" is an unsigned int, then these conditions are never true:
>>
>>         if (count = ULLONG_MAX)
>>                 events |= POLLERR;
>>         if (ULLONG_MAX - 1 > count)
>>                 events |= POLLOUT;
>>
>> It should be a u64, because that's what ctx->count is.  Also GCC
>> complains that "flags" is unused.
>>
>> Fixes: a90de8a54127 ('eventfd: don't take the spinlock in eventfd_poll')
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>
>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>> index 439e6f0..8d0c0df 100644
>> --- a/fs/eventfd.c
>> +++ b/fs/eventfd.c
>> @@ -118,8 +118,7 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
>>  {
>>  	struct eventfd_ctx *ctx = file->private_data;
>>  	unsigned int events = 0;
>> -	unsigned long flags;
>> -	unsigned int count;
>> +	u64 count;
>>  
>>  	poll_wait(file, &ctx->wqh, wait);
>>  	smp_rmb();
> 
> Andrew, not sure if you want to take Dan's incremental or a new v2.  I
> ran this one through the ltp poll and event fd tests.
> 
> From: Chris Mason <clm@fb.com>
> Date: Wed, 28 Jan 2015 10:15:58 -0800
> Subject: [PATCH v2] eventfd: don't take the spinlock in eventfd_poll
> 
> The spinlock in eventfd_poll is trying to protect the count of events
> so it can decide if it should return POLLIN, POLLERR, or POLLOUT.  But,
> because of the way we drop the lock after calling poll_wait, and drop it
> again before returning, we have the same pile of races with the lock as
> we do with a single read of ctx->count().
> 
> This replaces the lock with a read barrier and single read.
> 
> eventfd_write does a single bump of ctx->count, so this should not add
> new races with adding events.  eventfd_read is similar, it will do a
> single decrement with the lock held, and so we're making the race with
> concurrent readers slightly larger.
> 
> This spinlock is the top CPU user in kernel code during one of our workloads.
> Removing it gives us a ~2% boost.
> 
> Signed-off-by: Chris Mason <clm@fb.com>
> Fixed-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
>  fs/eventfd.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
>  v1->v2 use a u64 for count and get rid of unused flags (Dan Carpenter)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 4b0a226..8d0c0df 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -118,18 +118,18 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
>  {
>  	struct eventfd_ctx *ctx = file->private_data;
>  	unsigned int events = 0;
> -	unsigned long flags;
> +	u64 count;
>  
>  	poll_wait(file, &ctx->wqh, wait);
> +	smp_rmb();
> +	count = ctx->count;
>  
> -	spin_lock_irqsave(&ctx->wqh.lock, flags);
> -	if (ctx->count > 0)
> +	if (count > 0)
>  		events |= POLLIN;
> -	if (ctx->count = ULLONG_MAX)
> +	if (count = ULLONG_MAX)
>  		events |= POLLERR;
> -	if (ULLONG_MAX - 1 > ctx->count)
> +	if (ULLONG_MAX - 1 > count)
>  		events |= POLLOUT;
> -	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
>  
>  	return events;
>  }

Hi Chris,
i was thinking about the code here:

if (count > 0)
       events |= POLLIN;
if (count = ULLONG_MAX)
	events |= POLLERR;
if (count < ULLONG_MAX)
	events |= POLLOUT;

My understanding is that it would set POLLIN|POLLOUT in most cases.
Is that intended ?

re,
 wh

      parent reply	other threads:[~2015-01-30  8:28 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-19 19:33 [patch -next] eventfd: type bug in eventfd_poll() Dan Carpenter
2015-01-19 19:41 ` Chris Mason
2015-01-29 18:56 ` Chris Mason
2015-01-30  8:28 ` walter harms [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54CB40BC.9050407@bfs.de \
    --to=wharms@bfs.de \
    --cc=kernel-janitors@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).