* [PATCH 0/2] Fix mmap bugs with sysfs_remove_bin_file
@ 2010-09-20 7:55 Eric W. Biederman
2010-09-20 7:56 ` [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented Eric W. Biederman
2010-09-20 7:57 ` [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock Eric W. Biederman
0 siblings, 2 replies; 7+ messages in thread
From: Eric W. Biederman @ 2010-09-20 7:55 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: linux-kernel, Tejun Heo, Hugh Dickins
While reviewing the sysfs mmap code in fs/sysfs/bin.c I found two
rather nasty potential issues.
The first is if one of our wrapped mmap implementations implements
vma->close() we do not call it at sysfs_remove_bin_file time leading
to who knows what carnage.
The second is that we are potentially accessing the wrapped vm_ops
after sysfs_remove_bin_file has completed. Which could be a problem
if it is a modular user.
I don't know of any real world instances of problems. None of the
bin attribute mmaps functions that I know of today implement a close
method. However it seems prudent to fix these now before we have
track down some mysterious weird failure with hotunplug.
Eric W. Biederman (2):
sysfs: Fail bin file mmap if vma close is implemented.
sysfs: only access bin file vm_ops with the active lock
---
fs/sysfs/bin.c | 68 ++++++++++++++++++++++++++++----------------------------
1 files changed, 34 insertions(+), 34 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented.
2010-09-20 7:55 [PATCH 0/2] Fix mmap bugs with sysfs_remove_bin_file Eric W. Biederman
@ 2010-09-20 7:56 ` Eric W. Biederman
2010-09-22 19:31 ` Hugh Dickins
2010-09-20 7:57 ` [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock Eric W. Biederman
1 sibling, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2010-09-20 7:56 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: linux-kernel, Tejun Heo, Hugh Dickins
It is not reasonably possible to wrap vma->close(). To correctly
wrap close would imply calling close on any vmas that remain when
sysfs_remove_bin_file is called. Finding the proper lists walking
them getting the locking right etc, requires deep knowledge of the
mm subsystem and as such would require assistence from the mm
subsystem to implement. That assistence does not currently exist.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
fs/sysfs/bin.c | 26 ++++++++------------------
1 files changed, 8 insertions(+), 18 deletions(-)
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 4e321f7..d31d7b7 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -190,23 +190,6 @@ static void bin_vma_open(struct vm_area_struct *vma)
sysfs_put_active(attr_sd);
}
-static void bin_vma_close(struct vm_area_struct *vma)
-{
- struct file *file = vma->vm_file;
- struct bin_buffer *bb = file->private_data;
- struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
-
- if (!bb->vm_ops || !bb->vm_ops->close)
- return;
-
- if (!sysfs_get_active(attr_sd))
- return;
-
- bb->vm_ops->close(vma);
-
- sysfs_put_active(attr_sd);
-}
-
static int bin_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct file *file = vma->vm_file;
@@ -331,7 +314,6 @@ static int bin_migrate(struct vm_area_struct *vma, const nodemask_t *from,
static const struct vm_operations_struct bin_vm_ops = {
.open = bin_vma_open,
- .close = bin_vma_close,
.fault = bin_fault,
.page_mkwrite = bin_page_mkwrite,
.access = bin_access,
@@ -377,6 +359,14 @@ static int mmap(struct file *file, struct vm_area_struct *vma)
if (bb->mmapped && bb->vm_ops != vma->vm_ops)
goto out_put;
+ /*
+ * It is not possible to successfully wrap close.
+ * So error if someone is trying to use close.
+ */
+ rc = -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->close)
+ goto out_put;
+
rc = 0;
bb->mmapped = 1;
bb->vm_ops = vma->vm_ops;
--
1.7.2.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock
2010-09-20 7:55 [PATCH 0/2] Fix mmap bugs with sysfs_remove_bin_file Eric W. Biederman
2010-09-20 7:56 ` [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented Eric W. Biederman
@ 2010-09-20 7:57 ` Eric W. Biederman
2010-09-22 19:49 ` Hugh Dickins
1 sibling, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2010-09-20 7:57 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: linux-kernel, Tejun Heo, Hugh Dickins
bb->vm_ops is a cached copy of the vm_ops of the underlying
sysfs bin file, which means that after sysfs_bin_remove_file
completes it is only longer valid to deference bb->vm_ops.
So move all of the tests of bb->vm_ops inside of where
we hold the sysfs active lock.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
fs/sysfs/bin.c | 42 ++++++++++++++++++++++++++----------------
1 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index d31d7b7..a475983 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -179,13 +179,14 @@ static void bin_vma_open(struct vm_area_struct *vma)
struct bin_buffer *bb = file->private_data;
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
- if (!bb->vm_ops || !bb->vm_ops->open)
+ if (!bb->vm_ops)
return;
if (!sysfs_get_active(attr_sd))
return;
- bb->vm_ops->open(vma);
+ if (bb->vm_ops->open)
+ bb->vm_ops->open(vma);
sysfs_put_active(attr_sd);
}
@@ -197,13 +198,15 @@ static int bin_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;
- if (!bb->vm_ops || !bb->vm_ops->fault)
+ if (!bb->vm_ops)
return VM_FAULT_SIGBUS;
if (!sysfs_get_active(attr_sd))
return VM_FAULT_SIGBUS;
- ret = bb->vm_ops->fault(vma, vmf);
+ ret = VM_FAULT_SIGBUS;
+ if (bb->vm_ops->fault)
+ ret = bb->vm_ops->fault(vma, vmf);
sysfs_put_active(attr_sd);
return ret;
@@ -219,13 +222,12 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
if (!bb->vm_ops)
return VM_FAULT_SIGBUS;
- if (!bb->vm_ops->page_mkwrite)
- return 0;
-
if (!sysfs_get_active(attr_sd))
return VM_FAULT_SIGBUS;
- ret = bb->vm_ops->page_mkwrite(vma, vmf);
+ ret = 0;
+ if (bb->vm_ops->page_mkwrite)
+ ret = bb->vm_ops->page_mkwrite(vma, vmf);
sysfs_put_active(attr_sd);
return ret;
@@ -239,13 +241,15 @@ static int bin_access(struct vm_area_struct *vma, unsigned long addr,
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;
- if (!bb->vm_ops || !bb->vm_ops->access)
+ if (!bb->vm_ops)
return -EINVAL;
if (!sysfs_get_active(attr_sd))
return -EINVAL;
- ret = bb->vm_ops->access(vma, addr, buf, len, write);
+ ret = -EINVAL;
+ if (bb->vm_ops->access)
+ ret = bb->vm_ops->access(vma, addr, buf, len, write);
sysfs_put_active(attr_sd);
return ret;
@@ -259,13 +263,15 @@ static int bin_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;
- if (!bb->vm_ops || !bb->vm_ops->set_policy)
+ if (!bb->vm_ops)
return 0;
if (!sysfs_get_active(attr_sd))
return -EINVAL;
- ret = bb->vm_ops->set_policy(vma, new);
+ ret = 0;
+ if (bb->vm_ops->set_policy)
+ ret = bb->vm_ops->set_policy(vma, new);
sysfs_put_active(attr_sd);
return ret;
@@ -279,13 +285,15 @@ static struct mempolicy *bin_get_policy(struct vm_area_struct *vma,
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct mempolicy *pol;
- if (!bb->vm_ops || !bb->vm_ops->get_policy)
+ if (!bb->vm_ops)
return vma->vm_policy;
if (!sysfs_get_active(attr_sd))
return vma->vm_policy;
- pol = bb->vm_ops->get_policy(vma, addr);
+ pol = vma->vm_policy;
+ if (bb->vm_ops->get_policy)
+ pol = bb->vm_ops->get_policy(vma, addr);
sysfs_put_active(attr_sd);
return pol;
@@ -299,13 +307,15 @@ static int bin_migrate(struct vm_area_struct *vma, const nodemask_t *from,
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;
- if (!bb->vm_ops || !bb->vm_ops->migrate)
+ if (!bb->vm_ops)
return 0;
if (!sysfs_get_active(attr_sd))
return 0;
- ret = bb->vm_ops->migrate(vma, from, to, flags);
+ ret = 0;
+ if (bb->vm_ops->migrate)
+ ret = bb->vm_ops->migrate(vma, from, to, flags);
sysfs_put_active(attr_sd);
return ret;
--
1.7.2.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented.
2010-09-20 7:56 ` [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented Eric W. Biederman
@ 2010-09-22 19:31 ` Hugh Dickins
2010-09-22 21:01 ` Eric W. Biederman
0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2010-09-22 19:31 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Greg Kroah-Hartman, linux-kernel, Tejun Heo
On Mon, 20 Sep 2010, Eric W. Biederman wrote:
>
> It is not reasonably possible to wrap vma->close(). To correctly
> wrap close would imply calling close on any vmas that remain when
> sysfs_remove_bin_file is called. Finding the proper lists walking
> them getting the locking right etc, requires deep knowledge of the
> mm subsystem and as such would require assistence from the mm
> subsystem to implement. That assistence does not currently exist.
Isn't it fair to assume that the call to sysfs_remove_bin_file() would
come from the module (or whatever) that would be handling the ->mmap,
->open, ->close, ->fault etc. calls?
In which case, it has been kept up to date with the reference counting
so far: and if it judges that it's apppropriate now to remove bin file,
despite having received more mmaps and opens than closes, then I think
it should be allowed to make that decision (knowing that your sysfs end
will take care not to call it further), even though it wanted to support
close while the sysfs bin file was still there.
I think it would be nicer to leave the bin_vma_close() handling as you
had it before (but repositioning its sysfs_get_active more carefully,
as in your 2/2).
Hugh
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock
2010-09-20 7:57 ` [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock Eric W. Biederman
@ 2010-09-22 19:49 ` Hugh Dickins
2010-09-22 21:59 ` Eric W. Biederman
0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2010-09-22 19:49 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Greg Kroah-Hartman, linux-kernel, Tejun Heo
On Mon, 20 Sep 2010, Eric W. Biederman wrote:
>
> bb->vm_ops is a cached copy of the vm_ops of the underlying
> sysfs bin file, which means that after sysfs_bin_remove_file
> completes it is only longer valid to deference bb->vm_ops.
>
> So move all of the tests of bb->vm_ops inside of where
> we hold the sysfs active lock.
>
> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Good point. There's a couple of cases (the ones below) where you've
changed the return value from success to failure when the entry point
didn't exist for an object that is no longer active; but on reflection,
I accept that's just not worth worrying about.
Hugh
> @@ -219,13 +222,12 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> if (!bb->vm_ops)
> return VM_FAULT_SIGBUS;
>
> - if (!bb->vm_ops->page_mkwrite)
> - return 0;
> -
> if (!sysfs_get_active(attr_sd))
> return VM_FAULT_SIGBUS;
>
> - ret = bb->vm_ops->page_mkwrite(vma, vmf);
> + ret = 0;
> + if (bb->vm_ops->page_mkwrite)
> + ret = bb->vm_ops->page_mkwrite(vma, vmf);
>
> sysfs_put_active(attr_sd);
> return ret;
> @@ -259,13 +263,15 @@ static int bin_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
> struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
> int ret;
>
> - if (!bb->vm_ops || !bb->vm_ops->set_policy)
> + if (!bb->vm_ops)
> return 0;
>
> if (!sysfs_get_active(attr_sd))
> return -EINVAL;
>
> - ret = bb->vm_ops->set_policy(vma, new);
> + ret = 0;
> + if (bb->vm_ops->set_policy)
> + ret = bb->vm_ops->set_policy(vma, new);
>
> sysfs_put_active(attr_sd);
> return ret;
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented.
2010-09-22 19:31 ` Hugh Dickins
@ 2010-09-22 21:01 ` Eric W. Biederman
0 siblings, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2010-09-22 21:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Greg Kroah-Hartman, linux-kernel, Tejun Heo
Hugh Dickins <hughd@google.com> writes:
> On Mon, 20 Sep 2010, Eric W. Biederman wrote:
>>
>> It is not reasonably possible to wrap vma->close(). To correctly
>> wrap close would imply calling close on any vmas that remain when
>> sysfs_remove_bin_file is called. Finding the proper lists walking
>> them getting the locking right etc, requires deep knowledge of the
>> mm subsystem and as such would require assistence from the mm
>> subsystem to implement. That assistence does not currently exist.
>
> Isn't it fair to assume that the call to sysfs_remove_bin_file() would
> come from the module (or whatever) that would be handling the ->mmap,
> ->open, ->close, ->fault etc. calls?
It is.
> In which case, it has been kept up to date with the reference counting
> so far: and if it judges that it's apppropriate now to remove bin file,
> despite having received more mmaps and opens than closes, then I think
> it should be allowed to make that decision (knowing that your sysfs end
> will take care not to call it further), even though it wanted to support
> close while the sysfs bin file was still there.
sysfs is that reference counting layer and removes the burden of that
decision from the drivers.
There are two cases this can happen that are interesting: on rmmod of
the driver, and when the underlying hardware for the driver is removed,
thing usb or pci hotplug. In those cases the driver really doesn't have
much choice but to clean as best it can, blocking and waiting for
references counts to go to zero is not a good option.
> I think it would be nicer to leave the bin_vma_close() handling as you
> had it before (but repositioning its sysfs_get_active more carefully,
> as in your 2/2).
I definitely agree that allowing an implementing that implements close
would definitely be a good thing. Especially as 44 out of 99
vm_operations_structs implement close. To implement close properly
requires finding all of the vmas and calling the underlying close
method, and for that I need support from the mm layer which I don't yet
have. Close is typically used to either free per vma state, or to
decrement a reference count that open creates, so having mismatched
open and closes really is a problem.
In practice there are only two files that implement the mmap method
of sysfs bin files.
drivers/pci/pci-sysfs.c
arch/alpha/kernel/pci-sysfs.c
Which in their own way implement mmap for the pci memory mapped resource
bars of given platform. Each arch implements
pci_mmap_legacy_page_range and pci_mmap_resource in their own way. The
few I have traced come down to io_remap_pfn_range which boils down to
remap_pfn_range. remap_pfn_range doesn't even install
vm_operations_struct so in practice none of the sysfs vma methods get
exercised.
I complain in mmap in case some arch or some implementation is doing
something strange that I missed, because I really do want to be as
general as possible.
I do have a generic version of this brewing that I hope to replace the
sysfs version with that modifies the mm layer with the support I need,
and I will copy you on that when I post it. sysfs is not quite perfect
but proc is actively broken in this area, and I don't know how many
other places in the kernel need fixes.
Eric
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock
2010-09-22 19:49 ` Hugh Dickins
@ 2010-09-22 21:59 ` Eric W. Biederman
0 siblings, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2010-09-22 21:59 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Greg Kroah-Hartman, linux-kernel, Tejun Heo
Hugh Dickins <hughd@google.com> writes:
> On Mon, 20 Sep 2010, Eric W. Biederman wrote:
>>
>> bb->vm_ops is a cached copy of the vm_ops of the underlying
>> sysfs bin file, which means that after sysfs_bin_remove_file
>> completes it is only longer valid to deference bb->vm_ops.
>>
>> So move all of the tests of bb->vm_ops inside of where
>> we hold the sysfs active lock.
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
>
> Good point. There's a couple of cases (the ones below) where you've
> changed the return value from success to failure when the entry point
> didn't exist for an object that is no longer active; but on reflection,
> I accept that's just not worth worrying about.
Good spotting and thanks for the review.
Thinking about this it looks like bin_set_policy should only ever
error if the underlying function errors, and that will clear up
one inconsistency.
As for bin_page_mkwrite if sysfs_get_active fails there is no mapping
because we have called unmap_mapping_range to zap everything, so
page_mkwrite should simply not be called. However there is the
interesting case what should page_mkwrite return if there are no
vm_ops at all.
I expect the right choice for page_mkwrite it to simply have to
different sets of vm_ops and only write of page_mkwrite in one
of them and to simply call VM_FAULT_NOPAGE on error. Implementing
page_mkwrite when it isn't needed appears to have all kinds of
interesting side effects where we do a lot more work than we other
wise would. Lock dropping and regrabbing, as well as some extra
dirty page balancing logic gets triggered.
I will see if I can cook up a patch to that effect.
Eric
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-09-22 21:59 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-20 7:55 [PATCH 0/2] Fix mmap bugs with sysfs_remove_bin_file Eric W. Biederman
2010-09-20 7:56 ` [PATCH 1/2] sysfs: Fail bin file mmap if vma close is implemented Eric W. Biederman
2010-09-22 19:31 ` Hugh Dickins
2010-09-22 21:01 ` Eric W. Biederman
2010-09-20 7:57 ` [PATCH 2/2] sysfs: only access bin file vm_ops with the active lock Eric W. Biederman
2010-09-22 19:49 ` Hugh Dickins
2010-09-22 21:59 ` Eric W. Biederman
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.