linuxppc-dev.lists.ozlabs.org archive mirror
From: Frederic Barrat <fbarrat@linux.ibm.com>
To: Greg Kurz <groug@kaod.org>
Cc: clombard@linux.ibm.com, linuxppc-dev@lists.ozlabs.org,
	alastair@au1.ibm.com, andrew.donnellan@au1.ibm.com
Subject: Re: [PATCH] ocxl: Fix concurrent AFU open and device removal
Date: Mon, 24 Jun 2019 17:39:26 +0200
Message-ID: <ea1295fe-d8ad-1e5f-54f7-a72a7149c5b7@linux.ibm.com>
In-Reply-To: <20190624172452.7e217596@bahia.lan>



On 24/06/2019 at 17:24, Greg Kurz wrote:
> On Mon, 24 Jun 2019 16:41:48 +0200
> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> 
>> If an ocxl device is unbound through sysfs at the same time its AFU is
>> being opened by a user process, the open code may dereference freed
>> structures, which can lead to kernel oops messages. You'd have to hit a
>> tiny time window, but it's possible. It's fairly easy to test by
>> making the time window bigger artificially.
>>
>> Fix it with a combination of 2 changes:
>> - when an AFU device is found in the IDR by looking for the device
>> minor number, we should hold a reference on the device until after the
>> context is allocated. A reference on the AFU structure is kept when
>> the context is allocated, so we can release the reference on the
>> device after the context allocation.
>> - with the fix above, there's still another even tinier window,
>> between the time the AFU device is found in the IDR and the reference
>> on the device is taken. We can fix this one by removing the IDR entry
>> earlier, when the device setup is removed, instead of waiting for the
>> 'release' device callback. With proper locking around the IDR.
>>
>> Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & frontend")
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> ---
>> mpe: this fixes a commit merged in v5.2-rc1. It's late, and I don't think it's that important. If it's for the next merge window, I would add:
>> Cc: stable@vger.kernel.org      # v5.2
>>
>>
>> drivers/misc/ocxl/file.c | 23 +++++++++++------------
>>   1 file changed, 11 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
>> index 2870c25da166..4d1b44de1492 100644
>> --- a/drivers/misc/ocxl/file.c
>> +++ b/drivers/misc/ocxl/file.c
>> @@ -18,18 +18,15 @@ static struct class *ocxl_class;
>>   static struct mutex minors_idr_lock;
>>   static struct idr minors_idr;
>>   
>> -static struct ocxl_file_info *find_file_info(dev_t devno)
>> +static struct ocxl_file_info *find_and_get_file_info(dev_t devno)
>>   {
>>   	struct ocxl_file_info *info;
>>   
>> -	/*
>> -	 * We don't declare an RCU critical section here, as our AFU
>> -	 * is protected by a reference counter on the device. By the time the
>> -	 * info reference is removed from the idr, the ref count of
>> -	 * the device is already at 0, so no user API will access that AFU and
>> -	 * this function can't return it.
>> -	 */
>> +	mutex_lock(&minors_idr_lock);
>>   	info = idr_find(&minors_idr, MINOR(devno));
>> +	if (info)
>> +		get_device(&info->dev);
>> +	mutex_unlock(&minors_idr_lock);
>>   	return info;
>>   }
>>   
>> @@ -58,14 +55,16 @@ static int afu_open(struct inode *inode, struct file *file)
>>   
>>   	pr_debug("%s for device %x\n", __func__, inode->i_rdev);
>>   
>> -	info = find_file_info(inode->i_rdev);
>> +	info = find_and_get_file_info(inode->i_rdev);
>>   	if (!info)
>>   		return -ENODEV;
>>   
>>   	rc = ocxl_context_alloc(&ctx, info->afu, inode->i_mapping);
>> -	if (rc)
>> +	if (rc) {
>> +		put_device(&info->dev);
> 
> You could have a single call site for put_device() since it's
> needed for both branches. No big deal.


Agreed. Will fix if I end up respinning, but won't if it's the only 
complaint :-)
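
For reference, the single-call-site variant could look roughly like the sketch below. It is a userspace model rather than the driver code: `model_dev`, `model_ctx`, `model_ctx_alloc()` and `model_open()` are hypothetical stand-ins for the device, the context, ocxl_context_alloc() and afu_open(), and the `fail` flag exists only to exercise the error branch. The reference the real context keeps on the AFU is not modeled.

```c
/* Userspace model only: all names below are hypothetical stand-ins. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_dev { int refs; };			/* stands in for struct device */
struct model_ctx { struct model_dev *owner; };	/* stands in for the context */

static void model_get(struct model_dev *d) { d->refs++; }
static void model_put(struct model_dev *d) { assert(d->refs > 0); d->refs--; }

/* Stand-in for ocxl_context_alloc(): fails on demand, otherwise allocates. */
static int model_ctx_alloc(struct model_ctx **out, struct model_dev *d, int fail)
{
	struct model_ctx *ctx;

	if (fail)
		return -ENOMEM;
	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return -ENOMEM;
	ctx->owner = d;
	*out = ctx;
	return 0;
}

/* Open path with one put call site covering both success and failure. */
static int model_open(struct model_dev *d, struct model_ctx **out, int fail)
{
	int rc;

	model_get(d);		/* reference normally taken by the lookup */
	rc = model_ctx_alloc(out, d, fail);
	model_put(d);		/* single call site, reached on both branches */
	return rc;
}
```

The put can move ahead of the error check because the temporary reference is not needed once the allocation has either succeeded or failed.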



>>   		return rc;
>> -
>> +	}
>> +	put_device(&info->dev);
>>   	file->private_data = ctx;
>>   	return 0;
>>   }
>> @@ -487,7 +486,6 @@ static void info_release(struct device *dev)
>>   {
>>   	struct ocxl_file_info *info = container_of(dev, struct ocxl_file_info, dev);
>>   
>> -	free_minor(info);
>>   	ocxl_afu_put(info->afu);
>>   	kfree(info);
>>   }
>> @@ -577,6 +575,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu)
>>   
>>   	ocxl_file_make_invisible(info);
>>   	ocxl_sysfs_unregister_afu(info);
>> +	free_minor(info);
> 
> Since the IDR entry is added by ocxl_file_register_afu(), it seems to make
> sense to undo that in ocxl_file_unregister_afu(). Out of curiosity, was there
> any historical reason to do this in info_release() in the first place ?


Yeah, it makes a lot of sense to remove the IDR entry in 
ocxl_file_unregister_afu(), that's where we undo the device. I wish I 
had noticed during the code reviews.
I don't think there was any good reason to have it in info_release() in 
the first place. I remember the code went through many iterations to get 
the reference counting on the AFU structure and device done correctly, 
but we let that one slip.

I now think the pre-5.2 ocxl code was also exposed to the 2nd window 
mentioned in the commit log (but the first window is new with the 
refactoring introduced in 5.2-rc1).
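
To make the two windows concrete, here is a minimal userspace sketch of the scheme, under stated assumptions: a fixed array stands in for the IDR, a pthread mutex for minors_idr_lock, and a plain counter for the device refcount. Every `model_*` name is hypothetical, and freeing the object when the count reaches zero is left out.

```c
/* Userspace model only: all names below are hypothetical stand-ins. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_MINORS 4

struct model_info { int refs; int minor; };

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_info *model_table[MODEL_MINORS];	/* stands in for the IDR */

/* Publish the entry; the registering side holds the initial reference. */
static int model_register(struct model_info *info, int minor)
{
	if (minor < 0 || minor >= MODEL_MINORS)
		return -1;
	pthread_mutex_lock(&model_lock);
	info->refs = 1;
	info->minor = minor;
	model_table[minor] = info;
	pthread_mutex_unlock(&model_lock);
	return 0;
}

/* Lookup and reference happen under one lock: no gap between the two. */
static struct model_info *model_find_and_get(int minor)
{
	struct model_info *info;

	if (minor < 0 || minor >= MODEL_MINORS)
		return NULL;
	pthread_mutex_lock(&model_lock);
	info = model_table[minor];
	if (info)
		info->refs++;
	pthread_mutex_unlock(&model_lock);
	return info;
}

static void model_put(struct model_info *info)
{
	pthread_mutex_lock(&model_lock);
	assert(info->refs > 0);
	info->refs--;
	pthread_mutex_unlock(&model_lock);
}

/* Unpublish first, then drop the initial reference. */
static void model_unregister(struct model_info *info)
{
	pthread_mutex_lock(&model_lock);
	model_table[info->minor] = NULL;
	pthread_mutex_unlock(&model_lock);
	model_put(info);
}
```

Once model_unregister() has run, a late lookup gets NULL instead of a pointer to an object that is about to go away, while a caller that already holds a reference keeps the object alive until its own put.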

   Fred



> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
>>   	device_unregister(&info->dev);
>>   }
>>   
> 


Thread overview: 6+ messages
2019-06-24 14:41 [PATCH] ocxl: Fix concurrent AFU open and device removal Frederic Barrat
2019-06-24 15:24 ` Greg Kurz
2019-06-24 15:39   ` Frederic Barrat [this message]
2019-06-24 15:50     ` Greg Kurz
2019-06-25  8:22       ` Frederic Barrat
2019-12-13 21:19 ` Michael Ellerman