public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Arnd Bergmann <arnd@arndb.de>, Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Russell King <linux@armlinux.org.uk>,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dm-devel@lists.linux.dev
Subject: Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime
Date: Fri, 2 Feb 2024 14:29:05 -0500	[thread overview]
Message-ID: <6bdf6085-101d-47ef-86f4-87936622345a@efficios.com> (raw)
In-Reply-To: <65bd284165177_2d43c29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On 2024-02-02 12:37, Dan Williams wrote:
> Mathieu Desnoyers wrote:
[...]
>>
> 
>> The alternative route I intend to take is to audit all callers
>> of alloc_dax() and make sure they all save the alloc_dax() return
>> value in a struct dax_device * local variable first for the sake
>> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
>> initialized to NULL in the error case and simplify the rest of
>> error checking.
> 
> I could maybe get on board with that, but it needs a comment somewhere
> about the asymmetric subtlety.

Is this "somewhere" at every alloc_dax() call site, or do you have
something else in mind ?

> 
>>
>>
>>>    		return;
>>>    
>>>    	if (dax_dev->holder_data != NULL)
>>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>>> index 4e8fdcb3f1c8..b69c9e442cf4 100644
>>> --- a/drivers/nvdimm/pmem.c
>>> +++ b/drivers/nvdimm/pmem.c
>>> @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
>>>    	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
>>>    	if (IS_ERR(dax_dev)) {
>>>    		rc = PTR_ERR(dax_dev);
>>> -		goto out;
>>> +		if (rc != -EOPNOTSUPP)
>>> +			goto out;
>>
>> If I compare the before / after this change, if previously
>> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
>> result in a NULL pointer dereference.
> 
> No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
> CONFIG_FS_DAX=n case.

Indeed, I was wrong there.

> So that means that pmem devices on ARM have been
> possible without FS_DAX. So, in order for alloc_dax() returning
> ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
> path needs to be changed.
Good point. We're moving the depends on !(ARM || MIPS |PARC) from FS_DAX
Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y,
which includes configurations that had FS_DAX=n previously.

I'll change the error path in pmem_attack_disk to treat -EOPNOTSUPP
alloc_dax() return value as non-fatal.

> 
>> This would be an error handling fix all by itself. Do we really want
>> to return successfully if dax is unsupported, or should we return
>> an error here ?
> 
> Per above, there is no error handling fix, and pmem block device
> available should not depend on alloc_dax() succeeding.

I agree on treating alloc_dax() failure as non-fatal. There is
however one error handling fix to nvdimm/pmem which I plan to
introduce as an initial patch before this change:

     nvdimm/pmem: Fix leak on dax_add_host() failure
     
     Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
     before setting pmem->dax_dev, which therefore issues the two following
     calls on NULL pointers:
     
     out_cleanup_dax:
             kill_dax(pmem->dax_dev);
             put_dax(pmem->dax_dev);
     
     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..9fe358090720 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
  	set_dax_nomc(dax_dev);
  	if (is_nvdimm_sync(nd_region))
  		set_dax_synchronous(dax_dev);
+	pmem->dax_dev = dax_dev;
  	rc = dax_add_host(dax_dev, disk);
  	if (rc)
  		goto out_cleanup_dax;
  	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
-	pmem->dax_dev = dax_dev;
-
  	rc = device_add_disk(dev, disk, pmem_attribute_groups);
  	if (rc)
  		goto out_remove_host;

> 
> The real question is what to do about device-dax. I *think* it is not
> affected by cpu_dcache aliasing because it never accesses user mappings
> through a kernel alias. I doubt device-dax is in use on these platforms,
> but we might need another fixup for that if someone screams about the
> alloc_dax() behavior change making them lose device-dax access.

By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.

Based on your analysis, is alloc_dax() still the right spot where
to place this runtime check ? Which call sites are responsible
for invoking alloc_dax() for device-dax ?

If we know which call sites do not intend to use the kernel linear
mapping, we could introduce a flag (or a new variant of the alloc_dax()
API) that would either enforce or skip the check.

[...]

>>
>> Here what I'm seeing so far:
>>
>> - devm_release_mem_region() is never called after devm_request_mem_region(). Not
>>     on error, neither on teardown,
> 
> devm_release_mem_region() is called from virtio_fs_probe() context. That

I guess you mean "devm_request_mem_region()" here.

> means that when virtio_fs_probe() returns an error the driver core will
> automatically call devm_request_mem_region().

And "devm_release_mem_region()" here.

> 
>> - pgmap is never freed on error after devm_kzalloc.
> 
> That is what the "devm_" in devm_kzalloc() does, free the memory on
> driver-probe failure, or after the driver remove callback is invoked.

Got it.

> 
>>
>>>    {
>>> +	struct dax_device *dax_dev __free(cleanup_dax) = NULL;
>>>    	struct virtio_shm_region cache_reg;
>>>    	struct dev_pagemap *pgmap;
>>>    	bool have_cache;
>>> @@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>>>    	if (!IS_ENABLED(CONFIG_FUSE_DAX))
>>>    		return 0;
>>>    
>>> +	dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
>>> +	if (IS_ERR(dax_dev)) {
>>> +		int rc = PTR_ERR(dax_dev);
>>> +
>>> +		if (rc == -EOPNOTSUPP)
>>> +			return 0;
>>> +		return rc;
>>> +	}
>>
>> What is gained by moving this allocation here ?
> 
> The gain is to fail early in virtio_fs_setup_dax() since the fundamental
> dependency of alloc_dax() success is not met. For example why let the
> setup progress to devm_memremap_pages() when alloc_dax() is going to
> return ERR_PTR(-EOPNOTSUPP).

What I don't know is whether there is a dependency requiring to do
devm_request_mem_region(), devm_kzalloc(), devm_memremap_pages()
before calling alloc_dax() ?

Those 3 calls are used to populate:

         fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
         fs->window_len = (phys_addr_t) cache_reg.len;

and then alloc_dax() takes "fs" as private data parameter. So it's
unclear to me whether we can swap the invocation order. I suspect
that it is not an issue because it is only used to populate
dax_dev->private, but I prefer to confirm this with you just to be
on the safe side.

[...]

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



  reply	other threads:[~2024-02-02 19:29 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 16:25 [RFC PATCH v3 0/4] Introduce cpu_dcache_is_aliasing() to fix DAX regression Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 1/4] dm: Treat alloc_dax failure as non-fatal Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime Mathieu Desnoyers
2024-01-31 21:02   ` Dan Williams
2024-01-31 21:39     ` Mathieu Desnoyers
2024-01-31 22:18       ` Dan Williams
2024-02-01 15:44         ` Mathieu Desnoyers
2024-02-02 14:40           ` Mathieu Desnoyers
2024-02-02 16:32           ` Mathieu Desnoyers
2024-02-02 17:37           ` Dan Williams
2024-02-02 19:29             ` Mathieu Desnoyers [this message]
2024-02-02 19:41               ` Dan Williams
2024-02-02 20:02                 ` Mathieu Desnoyers
2024-02-02 20:14                   ` Dan Williams
2024-02-02 20:18                     ` Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 3/4] Introduce cpu_dcache_is_aliasing() across all architectures Mathieu Desnoyers
2024-01-31 17:17   ` Christoph Hellwig
2024-01-31 17:59     ` Mathieu Desnoyers
2024-01-31 20:42       ` Dan Williams
2024-01-31 16:25 ` [RFC PATCH v3 4/4] dax: Fix incorrect list of data cache aliasing architectures Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6bdf6085-101d-47ef-86f4-87936622345a@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@fromorbit.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=nvdimm@lists.linux.dev \
    --cc=torvalds@linux-foundation.org \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox