[PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
@ 2012-09-21 14:14 Dmitry Kasatkin
  2012-09-21 14:27 ` Steven Whitehouse
  2012-09-21 22:59 ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitry Kasatkin @ 2012-09-21 14:14 UTC (permalink / raw)
  To: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar

On some filesystems calling i_op->fiemap() takes the i_mutex,
on others it doesn't. For example, on ext2/ext3 fiemap calls
generic_block_fiemap(), which takes the i_mutex, but on ext4
it doesn't call generic_block_fiemap() for inodes, that have no
extents and does not take the i_mutex in other cases. Other i_op
functions, like setxattr/setattr, rely on the caller to lock the
i_mutex.

This patch moves the i_mutex locking from generic_block_fiemap()
to ioctl_fiemap(), which is currently the only caller of i_op->fiemap().

This change is needed in preparation for EVM to include a hash of
the extent data to be used in the HMAC calculation. EVM is called
when the i_mutex has already been taken.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
---
 fs/ext2/inode.c   |    4 ++--
 fs/ext3/inode.c   |    4 ++--
 fs/ext4/extents.c |    4 ++--
 fs/gfs2/inode.c   |    3 ---
 fs/ioctl.c        |    2 ++
 5 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6363ac6..0e7a0b3 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -760,8 +760,8 @@ int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_
 int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 len)
 {
-	return generic_block_fiemap(inode, fieinfo, start, len,
-				    ext2_get_block);
+	return __generic_block_fiemap(inode, fieinfo, start, len,
+				      ext2_get_block);
 }
 
 static int ext2_writepage(struct page *page, struct writeback_control *wbc)
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index a075973..0727254 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1046,8 +1046,8 @@ out:
 int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 len)
 {
-	return generic_block_fiemap(inode, fieinfo, start, len,
-				    ext3_get_block);
+	return __generic_block_fiemap(inode, fieinfo, start, len,
+				      ext3_get_block);
 }
 
 /*
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index cd0c7ed..e960831 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4916,8 +4916,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	/* fallback to generic here if not in extents fmt */
 	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
-		return generic_block_fiemap(inode, fieinfo, start, len,
-			ext4_get_block);
+		return __generic_block_fiemap(inode, fieinfo, start, len,
+					      ext4_get_block);
 
 	if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS))
 		return -EBADR;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 4ce22e5..bb8d541 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1775,8 +1775,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	if (ret)
 		return ret;
 
-	mutex_lock(&inode->i_mutex);
-
 	ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
 	if (ret)
 		goto out;
@@ -1802,7 +1800,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	gfs2_glock_dq_uninit(&gh);
 out:
-	mutex_unlock(&inode->i_mutex);
 	return ret;
 }
 
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 29167be..320993f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -206,7 +206,9 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
 	if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
 		filemap_write_and_wait(inode->i_mapping);
 
+	mutex_lock(&inode->i_mutex);
 	error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
+	mutex_unlock(&inode->i_mutex);
 	fiemap.fm_flags = fieinfo.fi_flags;
 	fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped;
 	if (copy_to_user(ufiemap, &fiemap, sizeof(fiemap)))
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-21 14:14 [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap() Dmitry Kasatkin
@ 2012-09-21 14:27 ` Steven Whitehouse
  2012-09-21 14:33   ` Kasatkin, Dmitry
  2012-09-21 22:59 ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Whitehouse @ 2012-09-21 14:27 UTC (permalink / raw)
  To: Dmitry Kasatkin
  Cc: sandeen, viro, tytso, linux-fsdevel, linux-security-module, zohar

Hi,

On Fri, 2012-09-21 at 17:14 +0300, Dmitry Kasatkin wrote:
> On some filesystems calling i_op->fiemap() takes the i_mutex,
> on others it doesn't. For example, on ext2/ext3 fiemap calls
> generic_block_fiemap(), which takes the i_mutex, but on ext4
> it doesn't call generic_block_fiemap() for inodes, that have no
> extents and does not take the i_mutex in other cases. Other i_op
> functions, like setxattr/setattr, rely on the caller to lock the
> i_mutex.
> 
> This patch moves the i_mutex locking from generic_block_fiemap()
> to ioctl_fiemap(), which is currently the only caller of i_op->fiemap().
> 
> This change is needed in preparation for EVM to include a hash of
> the extent data to be used in the HMAC calculation. EVM is called
> when the i_mutex has already been taken.
> 
> Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>

I'm just wondering whether or not we need the i_mutex lock to be taken
at all in GFS2.... the reason that it was moved up a level was that we
require i_mutex to be taken before the glock (rule being local locking
before cluster locking).

However, we do hold a shared glock over the same portion of code and
that should be enough to prevent any changes in the on-disk extents for
the inode in question.

Since we have the new truncate sequence, the inode cannot be truncated
outside of the glock any more, so I don't think that i_mutex really buys
us anything there, unless anybody else can spot a good reason for it?

That said, the patch as is looks ok to me, its just that we may not need
i_mutex protection at all now,

Steve.


> ---
>  fs/ext2/inode.c   |    4 ++--
>  fs/ext3/inode.c   |    4 ++--
>  fs/ext4/extents.c |    4 ++--
>  fs/gfs2/inode.c   |    3 ---
>  fs/ioctl.c        |    2 ++
>  5 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 6363ac6..0e7a0b3 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -760,8 +760,8 @@ int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_
>  int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		u64 start, u64 len)
>  {
> -	return generic_block_fiemap(inode, fieinfo, start, len,
> -				    ext2_get_block);
> +	return __generic_block_fiemap(inode, fieinfo, start, len,
> +				      ext2_get_block);
>  }
>  
>  static int ext2_writepage(struct page *page, struct writeback_control *wbc)
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index a075973..0727254 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -1046,8 +1046,8 @@ out:
>  int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		u64 start, u64 len)
>  {
> -	return generic_block_fiemap(inode, fieinfo, start, len,
> -				    ext3_get_block);
> +	return __generic_block_fiemap(inode, fieinfo, start, len,
> +				      ext3_get_block);
>  }
>  
>  /*
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index cd0c7ed..e960831 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4916,8 +4916,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  
>  	/* fallback to generic here if not in extents fmt */
>  	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
> -		return generic_block_fiemap(inode, fieinfo, start, len,
> -			ext4_get_block);
> +		return __generic_block_fiemap(inode, fieinfo, start, len,
> +					      ext4_get_block);
>  
>  	if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS))
>  		return -EBADR;
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index 4ce22e5..bb8d541 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -1775,8 +1775,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  	if (ret)
>  		return ret;
>  
> -	mutex_lock(&inode->i_mutex);
> -
>  	ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
>  	if (ret)
>  		goto out;
> @@ -1802,7 +1800,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  
>  	gfs2_glock_dq_uninit(&gh);
>  out:
> -	mutex_unlock(&inode->i_mutex);
>  	return ret;
>  }
>  
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 29167be..320993f 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -206,7 +206,9 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
>  	if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
>  		filemap_write_and_wait(inode->i_mapping);
>  
> +	mutex_lock(&inode->i_mutex);
>  	error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
> +	mutex_unlock(&inode->i_mutex);
>  	fiemap.fm_flags = fieinfo.fi_flags;
>  	fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped;
>  	if (copy_to_user(ufiemap, &fiemap, sizeof(fiemap)))



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-21 14:27 ` Steven Whitehouse
@ 2012-09-21 14:33   ` Kasatkin, Dmitry
  2012-09-21 14:39     ` Steven Whitehouse
  0 siblings, 1 reply; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-21 14:33 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: sandeen, viro, tytso, linux-fsdevel, linux-security-module, zohar

On Fri, Sep 21, 2012 at 5:27 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> Hi,
>
> On Fri, 2012-09-21 at 17:14 +0300, Dmitry Kasatkin wrote:
>> On some filesystems calling i_op->fiemap() takes the i_mutex,
>> on others it doesn't. For example, on ext2/ext3 fiemap calls
>> generic_block_fiemap(), which takes the i_mutex, but on ext4
>> it doesn't call generic_block_fiemap() for inodes, that have no
>> extents and does not take the i_mutex in other cases. Other i_op
>> functions, like setxattr/setattr, rely on the caller to lock the
>> i_mutex.
>>
>> This patch moves the i_mutex locking from generic_block_fiemap()
>> to ioctl_fiemap(), which is currently the only caller of i_op->fiemap().
>>
>> This change is needed in preparation for EVM to include a hash of
>> the extent data to be used in the HMAC calculation. EVM is called
>> when the i_mutex has already been taken.
>>
>> Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
>
> I'm just wondering whether or not we need the i_mutex lock to be taken
> at all in GFS2.... the reason that it was moved up a level was that we
> require i_mutex to be taken before the glock (rule being local locking
> before cluster locking).
>
> However, we do hold a shared glock over the same portion of code and
> that should be enough to prevent any changes in the on-disk extents for
> the inode in question.
>
> Since we have the new truncate sequence, the inode cannot be truncated
> outside of the glock any more, so I don't think that i_mutex really buys
> us anything there, unless anybody else can spot a good reason for it?
>
> That said, the patch as is looks ok to me, its just that we may not need
> i_mutex protection at all now,
>
> Steve.
>

So you say that we might not need to lock i_mutex at all when we call
__generic_block_fiemap?

- Dmitry

>
>> ---
>>  fs/ext2/inode.c   |    4 ++--
>>  fs/ext3/inode.c   |    4 ++--
>>  fs/ext4/extents.c |    4 ++--
>>  fs/gfs2/inode.c   |    3 ---
>>  fs/ioctl.c        |    2 ++
>>  5 files changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
>> index 6363ac6..0e7a0b3 100644
>> --- a/fs/ext2/inode.c
>> +++ b/fs/ext2/inode.c
>> @@ -760,8 +760,8 @@ int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_
>>  int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>               u64 start, u64 len)
>>  {
>> -     return generic_block_fiemap(inode, fieinfo, start, len,
>> -                                 ext2_get_block);
>> +     return __generic_block_fiemap(inode, fieinfo, start, len,
>> +                                   ext2_get_block);
>>  }
>>
>>  static int ext2_writepage(struct page *page, struct writeback_control *wbc)
>> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
>> index a075973..0727254 100644
>> --- a/fs/ext3/inode.c
>> +++ b/fs/ext3/inode.c
>> @@ -1046,8 +1046,8 @@ out:
>>  int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>               u64 start, u64 len)
>>  {
>> -     return generic_block_fiemap(inode, fieinfo, start, len,
>> -                                 ext3_get_block);
>> +     return __generic_block_fiemap(inode, fieinfo, start, len,
>> +                                   ext3_get_block);
>>  }
>>
>>  /*
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index cd0c7ed..e960831 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4916,8 +4916,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>
>>       /* fallback to generic here if not in extents fmt */
>>       if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
>> -             return generic_block_fiemap(inode, fieinfo, start, len,
>> -                     ext4_get_block);
>> +             return __generic_block_fiemap(inode, fieinfo, start, len,
>> +                                           ext4_get_block);
>>
>>       if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS))
>>               return -EBADR;
>> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
>> index 4ce22e5..bb8d541 100644
>> --- a/fs/gfs2/inode.c
>> +++ b/fs/gfs2/inode.c
>> @@ -1775,8 +1775,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>       if (ret)
>>               return ret;
>>
>> -     mutex_lock(&inode->i_mutex);
>> -
>>       ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
>>       if (ret)
>>               goto out;
>> @@ -1802,7 +1800,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>
>>       gfs2_glock_dq_uninit(&gh);
>>  out:
>> -     mutex_unlock(&inode->i_mutex);
>>       return ret;
>>  }
>>
>> diff --git a/fs/ioctl.c b/fs/ioctl.c
>> index 29167be..320993f 100644
>> --- a/fs/ioctl.c
>> +++ b/fs/ioctl.c
>> @@ -206,7 +206,9 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
>>       if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
>>               filemap_write_and_wait(inode->i_mapping);
>>
>> +     mutex_lock(&inode->i_mutex);
>>       error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
>> +     mutex_unlock(&inode->i_mutex);
>>       fiemap.fm_flags = fieinfo.fi_flags;
>>       fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped;
>>       if (copy_to_user(ufiemap, &fiemap, sizeof(fiemap)))
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-21 14:33   ` Kasatkin, Dmitry
@ 2012-09-21 14:39     ` Steven Whitehouse
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Whitehouse @ 2012-09-21 14:39 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: sandeen, viro, tytso, linux-fsdevel, linux-security-module, zohar

Hi,

On Fri, 2012-09-21 at 17:33 +0300, Kasatkin, Dmitry wrote:
> On Fri, Sep 21, 2012 at 5:27 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> > Hi,
> >
> > On Fri, 2012-09-21 at 17:14 +0300, Dmitry Kasatkin wrote:
> >> On some filesystems calling i_op->fiemap() takes the i_mutex,
> >> on others it doesn't. For example, on ext2/ext3 fiemap calls
> >> generic_block_fiemap(), which takes the i_mutex, but on ext4
> >> it doesn't call generic_block_fiemap() for inodes, that have no
> >> extents and does not take the i_mutex in other cases. Other i_op
> >> functions, like setxattr/setattr, rely on the caller to lock the
> >> i_mutex.
> >>
> >> This patch moves the i_mutex locking from generic_block_fiemap()
> >> to ioctl_fiemap(), which is currently the only caller of i_op->fiemap().
> >>
> >> This change is needed in preparation for EVM to include a hash of
> >> the extent data to be used in the HMAC calculation. EVM is called
> >> when the i_mutex has already been taken.
> >>
> >> Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
> >
> > I'm just wondering whether or not we need the i_mutex lock to be taken
> > at all in GFS2.... the reason that it was moved up a level was that we
> > require i_mutex to be taken before the glock (rule being local locking
> > before cluster locking).
> >
> > However, we do hold a shared glock over the same portion of code and
> > that should be enough to prevent any changes in the on-disk extents for
> > the inode in question.
> >
> > Since we have the new truncate sequence, the inode cannot be truncated
> > outside of the glock any more, so I don't think that i_mutex really buys
> > us anything there, unless anybody else can spot a good reason for it?
> >
> > That said, the patch as is looks ok to me, its just that we may not need
> > i_mutex protection at all now,
> >
> > Steve.
> >
> 
> So you say that we might not need to lock i_mutex at all when we call
> __generic_block_fiemap?
> 
> - Dmitry
> 
Yes for GFS2. Although, also it makes little odds if it is held. I'm not
sure what the requirements are for the other filesystems, they may or
may not have their own locking,

Steve.

> >
> >> ---
> >>  fs/ext2/inode.c   |    4 ++--
> >>  fs/ext3/inode.c   |    4 ++--
> >>  fs/ext4/extents.c |    4 ++--
> >>  fs/gfs2/inode.c   |    3 ---
> >>  fs/ioctl.c        |    2 ++
> >>  5 files changed, 8 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> >> index 6363ac6..0e7a0b3 100644
> >> --- a/fs/ext2/inode.c
> >> +++ b/fs/ext2/inode.c
> >> @@ -760,8 +760,8 @@ int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_
> >>  int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >>               u64 start, u64 len)
> >>  {
> >> -     return generic_block_fiemap(inode, fieinfo, start, len,
> >> -                                 ext2_get_block);
> >> +     return __generic_block_fiemap(inode, fieinfo, start, len,
> >> +                                   ext2_get_block);
> >>  }
> >>
> >>  static int ext2_writepage(struct page *page, struct writeback_control *wbc)
> >> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> >> index a075973..0727254 100644
> >> --- a/fs/ext3/inode.c
> >> +++ b/fs/ext3/inode.c
> >> @@ -1046,8 +1046,8 @@ out:
> >>  int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >>               u64 start, u64 len)
> >>  {
> >> -     return generic_block_fiemap(inode, fieinfo, start, len,
> >> -                                 ext3_get_block);
> >> +     return __generic_block_fiemap(inode, fieinfo, start, len,
> >> +                                   ext3_get_block);
> >>  }
> >>
> >>  /*
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index cd0c7ed..e960831 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -4916,8 +4916,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >>
> >>       /* fallback to generic here if not in extents fmt */
> >>       if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
> >> -             return generic_block_fiemap(inode, fieinfo, start, len,
> >> -                     ext4_get_block);
> >> +             return __generic_block_fiemap(inode, fieinfo, start, len,
> >> +                                           ext4_get_block);
> >>
> >>       if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS))
> >>               return -EBADR;
> >> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> >> index 4ce22e5..bb8d541 100644
> >> --- a/fs/gfs2/inode.c
> >> +++ b/fs/gfs2/inode.c
> >> @@ -1775,8 +1775,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >>       if (ret)
> >>               return ret;
> >>
> >> -     mutex_lock(&inode->i_mutex);
> >> -
> >>       ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
> >>       if (ret)
> >>               goto out;
> >> @@ -1802,7 +1800,6 @@ static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >>
> >>       gfs2_glock_dq_uninit(&gh);
> >>  out:
> >> -     mutex_unlock(&inode->i_mutex);
> >>       return ret;
> >>  }
> >>
> >> diff --git a/fs/ioctl.c b/fs/ioctl.c
> >> index 29167be..320993f 100644
> >> --- a/fs/ioctl.c
> >> +++ b/fs/ioctl.c
> >> @@ -206,7 +206,9 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
> >>       if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
> >>               filemap_write_and_wait(inode->i_mapping);
> >>
> >> +     mutex_lock(&inode->i_mutex);
> >>       error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
> >> +     mutex_unlock(&inode->i_mutex);
> >>       fiemap.fm_flags = fieinfo.fi_flags;
> >>       fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped;
> >>       if (copy_to_user(ufiemap, &fiemap, sizeof(fiemap)))
> >
> >



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-21 14:14 [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap() Dmitry Kasatkin
  2012-09-21 14:27 ` Steven Whitehouse
@ 2012-09-21 22:59 ` Dave Chinner
  2012-09-24  8:13   ` Kasatkin, Dmitry
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-09-21 22:59 UTC (permalink / raw)
  To: Dmitry Kasatkin
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar

On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
> On some filesystems calling i_op->fiemap() takes the i_mutex,
> on others it doesn't.

Exactly by design. Many fiesystems don't require the i_mutex for
fiemap to be safe. e.g. the extent map is not protected by the
i_mutex in XFS or ext4, and so holding the i_mutex over fiemap is
incorrect.

As a result, extent maps can change even when someone is holding the
i_mutex. Any design that requires i_mutex to provide stable,
unchanging extent maps is fundamentally broken....

So that's a NACK from me...

> This change is needed in preparation for EVM to include a hash of
> the extent data to be used in the HMAC calculation. EVM is called
> when the i_mutex has already been taken.

What's EVM? What's HMAC? and what is it actually checksumming? The
extent map, or the data in the file? And if it is the extent map,
what's the purpose of using the extent map for this? You need to
explain how this is intended to be used, not use TLAs that people
unfamiliar with your application will not understand...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-21 22:59 ` Dave Chinner
@ 2012-09-24  8:13   ` Kasatkin, Dmitry
  2012-09-24  9:18     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-24  8:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Sat, Sep 22, 2012 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
>> On some filesystems calling i_op->fiemap() takes the i_mutex,
>> on others it doesn't.
>
> Exactly by design. Many fiesystems don't require the i_mutex for
> fiemap to be safe. e.g. the extent map is not protected by the
> i_mutex in XFS or ext4, and so holding the i_mutex over fiemap is
> incorrect.
>

Please read an explanation bellow, where I tell what is EVM.

But in my opinion it is a a bit wrong design if caller does not know
if any lock will or will not be locked.
Linux kernel has many non-locking functions and its locking wrappers.
Many callees requires and state that caller must lock something.

> As a result, extent maps can change even when someone is holding the
> i_mutex. Any design that requires i_mutex to provide stable,
> unchanging extent maps is fundamentally broken....
>

It is not necessarily that i_mutex to be locked to provide stable
unchanging extent maps.
FS might not require that.
But the caller might require some sort of atomicity.

> So that's a NACK from me...
>
>> This change is needed in preparation for EVM to include a hash of
>> the extent data to be used in the HMAC calculation. EVM is called
>> when the i_mutex has already been taken.
>
> What's EVM? What's HMAC? and what is it actually checksumming? The
> extent map, or the data in the file? And if it is the extent map,
> what's the purpose of using the extent map for this? You need to
> explain how this is intended to be used, not use TLAs that people
> unfamiliar with your application will not understand...
>

EVM - Extended Verification Module: security/integrity/evm

That is a part of integrity subsystem which provides protection
against offline modification of
a inode metadata: uid, gid, perm, ino, xattrs, etc..

EVM has several hooks, which are called when those above metadata changes.
And they are called when i_mutex is locked.
i_mutex is used to prevent racing between some security verification
and file metadata attribute update.

One of the patches we are working on now would like to use extent map
information to prevent certain attack.
Of course as above we do it in EVM hooks which are called under i_mutex.
And more over we need to ensure atomicity when reading extent map and
updating xattrs - just the same as it works now
for other inode metadata.

So, we need to be able to call i_op->fiemap under i_mutex locked.
It works fine for ext4, because ext4 does not lock i_mutex for files,
but it does not work for ext3/ext2, because
it does not have extent and use i_mutex to lock it.

In my opinion, moving i_mutex locking to ioctl_fiemap is quite reasonable idea.
It does not expose any requirement to FS to lock or not to lock i_mutex.

- Dmitry

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-24  8:13   ` Kasatkin, Dmitry
@ 2012-09-24  9:18     ` Dave Chinner
  2012-09-24 11:28       ` Kasatkin, Dmitry
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-09-24  9:18 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
> On Sat, Sep 22, 2012 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
> >> On some filesystems calling i_op->fiemap() takes the i_mutex,
> >> on others it doesn't.
> >
> > Exactly by design. Many fiesystems don't require the i_mutex for
> > fiemap to be safe. e.g. the extent map is not protected by the
> > i_mutex in XFS or ext4, and so holding the i_mutex over fiemap is
> > incorrect.
> >
> 
> Please read an explanation bellow, where I tell what is EVM.
> 
> But in my opinion it is a a bit wrong design if caller does not know
> if any lock will or will not be locked.

We've got plenty of interfaces where this is true. Look at .fsync,
for example:

$ gl -n 1 02c24a82
commit 02c24a82187d5a628c68edfe71ae60dc135cd178
Author: Josef Bacik <josef@redhat.com>
Date:   Sat Jul 16 20:44:56 2011 -0400

    fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers

    Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2.  For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Josef Bacik <josef@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

i.e. i_mutex used to be wrapped around the outside of .fsync, but
that proved to be a problem for btrfs and other filesystems that *do
not need i_mutex* during fsync. Hence the locking got pushed down
into the filesystems, and those that did not need i_mutex got rid of
it. Now the only rule for calling .fsync is that you can't hold
i_mutex when calling it.

> Linux kernel has many non-locking functions and its locking wrappers.
> Many callees requires and state that caller must lock something.

Yes, they do, just like many callees require and state that the
caller must *not* lock something. Say, like might_sleep()....

The requirement for calling .fiemap/.fync/.fallocate/etc is that the
"caller must not hold i_mutex". Essentially, all new inode/file
methods we have added in the past couple of years make the
assumption that serialisation is the problem of the filesystem, not
the VFS. This has been done because there is not a
"one-size-fits-all" method for fine-grained locking in filesystems.

> > As a result, extent maps can change even when someone is holding the
> > i_mutex. Any design that requires i_mutex to provide stable,
> > unchanging extent maps is fundamentally broken....
> >
> 
> It is not necessarily that i_mutex to be locked to provide stable
> unchanging extent maps.
> FS might not require that.
> But the caller might require some sort of atomicity.

fiemap is not the interface you want, then. it gives *zero*
guarantee that what it returns actually reflects what is on disk.
What is on disk can change the moment the filesystem drops the lock
that protects the extent map, and because i_mutex doesn't protect
the extent map, you aren't getting the atomicity you require....

i.e. fiemap provides a point in time snapshot that is out of date
even before the snapshot is returned to the caller regardless of the
locks held by the caller.

> > So that's a NACK from me...
> >
> >> This change is needed in preparation for EVM to include a hash of
> >> the extent data to be used in the HMAC calculation. EVM is called
> >> when the i_mutex has already been taken.
> >
> > What's EVM? What's HMAC? and what is it actually checksumming? The
> > extent map, or the data in the file? And if it is the extent map,
> > what's the purpose of using the extent map for this? You need to
> > explain how this is intended to be used, not use TLAs that people
> > unfamiliar with your application will not understand...
> >
> 
> EVM - Extended Verification Module: security/integrity/evm
> 
> That is a part of integrity subsystem which provides protection
> against offline modification of
> a inode metadata: uid, gid, perm, ino, xattrs, etc..

*groan*

> EVM has several hooks, which are called when those above metadata changes.
> And they are called when i_mutex is locked.
> i_mutex is used to prevent racing between some security verification
> and file metadata attribute update.

Sounds great, until you consider the fact that inode metadata can be
updated without i_mutex being held. e.g. fallocate vs fiemap.  Have
you ever had a look at how many different ioctls various filesystems
implement that modify metadata without needing to hold i_mutex?

> One of the patches we are working on now would like to use extent map
> information to prevent certain attack.

I'm not a mind reader. What attack may that be?

> Of course as above we do it in EVM hooks which are called under i_mutex.
> And more over we need to ensure atomicity when reading extent map and
> updating xattrs - just the same as it works now
> for other inode metadata.

Fundamentally, extent maps are not protected by i_mutex, and that's
where your design breaks down. What you want cannot be provided by
i_mutex, and that's one of the reasons why i_mutex is not held above
.fiemap - it's not appropriate locking for the operation being
performed.

Here's a beautiful example: XFS_IOC_SWAPEXT.

This filesystem specific ioctl() atomically swaps the extent maps
between two inodes.  And it doesn't hold i_mutex to do it, and it
doesn't even inform the VFS that it has happened. Indeed, it is
*completely transparent* to the VFS because it's a special interface
to the filesystem used by filesystem maintenance tools, namely the
online defragmenter xfs_fsr. That means running xfs_fsr will really
screw with any EVM requirement that special security attributes and
extent maps are kept in sync....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-24  9:18     ` Dave Chinner
@ 2012-09-24 11:28       ` Kasatkin, Dmitry
  2012-09-25  1:56         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-24 11:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Mon, Sep 24, 2012 at 12:18 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
>> On Sat, Sep 22, 2012 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
>> >> On some filesystems calling i_op->fiemap() takes the i_mutex,
>> >> on others it doesn't.
>> >
>> > Exactly by design. Many fiesystems don't require the i_mutex for
>> > fiemap to be safe. e.g. the extent map is not protected by the
>> > i_mutex in XFS or ext4, and so holding the i_mutex over fiemap is
>> > incorrect.
>> >
>>
>> Please read an explanation bellow, where I tell what is EVM.
>>
>> But in my opinion it is a a bit wrong design if caller does not know
>> if any lock will or will not be locked.
>
> We've got plenty of interfaces where this is true. Look at .fsync,
> for example:
>
> $ gl -n 1 02c24a82
> commit 02c24a82187d5a628c68edfe71ae60dc135cd178
> Author: Josef Bacik <josef@redhat.com>
> Date:   Sat Jul 16 20:44:56 2011 -0400
>
>     fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
>
>     Btrfs needs to be able to control how filemap_write_and_wait_range() is called
>     in fsync to make it less of a painful operation, so push down taking i_mutex and
>     the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
>     file systems can drop taking the i_mutex altogether it seems, like ext3 and
>     ocfs2.  For correctness sake I just pushed everything down in all cases to make
>     sure that we keep the current behavior the same for everybody, and then each
>     individual fs maintainer can make up their mind about what to do from there.
>     Thanks,
>
>     Acked-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Josef Bacik <josef@redhat.com>
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> i.e. i_mutex used to be wrapped around the outside of .fsync, but
> that proved to be a problem for btrfs and other filesystems that *do
> not need i_mutex* during fsync. Hence the locking got pushed down
> into the filesystems, and those that did not need i_mutex got rid of
> it. Now the only rule for calling .fsync is that you can't hold
> i_mutex when calling it.

I see.

>
>> Linux kernel has many non-locking functions and its locking wrappers.
>> Many callees requires and state that caller must lock something.
>
> Yes, they do, just like many callees require and state that the
> caller must *not* lock something. Say, like might_sleep()....
>

Fair enough.

> The requirement for calling .fiemap/.fync/.fallocate/etc is that the
> "caller must not hold i_mutex". Essentially, all new inode/file
> methods we have added in the past couple of years make the
> assumption that serialisation is the problem of the filesystem, not
> the VFS. This has been done because there is not a
> "one-size-fits-all" method for fine-grained locking in filesystems.
>
>> > As a result, extent maps can change even when someone is holding the
>> > i_mutex. Any design that requires i_mutex to provide stable,
>> > unchanging extent maps is fundamentally broken....
>> >
>>
>> It is not necessarily that i_mutex to be locked to provide stable
>> unchanging extent maps.
>> FS might not require that.
>> But the caller might require some sort of atomicity.
>
> fiemap is not the interface you want, then. it gives *zero*
> guarantee that what it returns actually reflects what is on disk.
> What is on disk can change the moment the filesystem drops the lock
> that protects the extent map, and because i_mutex doesn't protect
> the extent map, you aren't getting the atomicity you require....
>

I use filemap_write_and_wait() before using fiemap.
And it seems to be persistent between reboots on ext4.

In our case i_mutex is not for FS itself, but for EVM to ensure
that hash over extents and its value in xattrs are stay in sync.


> i.e. fiemap provides a point in time snapshot that is out of date
> even before the snapshot is returned to the caller regardless of the
> locks held by the caller.
>

So what is the way at VFS to ensure that what ever is returned in fiemap
stays the same over reboots?

ioctl_fiemap() uses filemap_write_and_wait()...
And it seems to work well for me as well...

>> > So that's a NACK from me...
>> >
>> >> This change is needed in preparation for EVM to include a hash of
>> >> the extent data to be used in the HMAC calculation. EVM is called
>> >> when the i_mutex has already been taken.
>> >
>> > What's EVM? What's HMAC? and what is it actually checksumming? The
>> > extent map, or the data in the file? And if it is the extent map,
>> > what's the purpose of using the extent map for this? You need to
>> > explain how this is intended to be used, not use TLAs that people
>> > unfamiliar with your application will not understand...
>> >
>>
>> EVM - Extended Verification Module: security/integrity/evm
>>
>> That is a part of integrity subsystem which provides protection
>> against offline modification of
>> a inode metadata: uid, gid, perm, ino, xattrs, etc..
>
> *groan*
>
>> EVM has several hooks, which are called when those above metadata changes.
>> And they are called when i_mutex is locked.
>> i_mutex is used to prevent racing between some security verification
>> and file metadata attribute update.
>
> Sounds great, until you consider the fact that inode metadata can be
> updated without i_mutex being held. e.g. fallocate vs fiemap.  Have
> you ever had a look at how many different ioctls various filesystems
> implement that modify metadata without needing to hold i_mutex?
>

What inode metadata are you talking about?

do_truncate:
	mutex_lock(&dentry->d_inode->i_mutex);
	ret = notify_change(dentry, &newattrs);
	mutex_unlock(&dentry->d_inode->i_mutex);

chmod_common:
	error = notify_change(path->dentry, &newattrs);
	mutex_unlock(&inode->i_mutex);

chown_common:
	mutex_lock(&inode->i_mutex);
	error = security_path_chown(path, user, group);
	if (!error)
		error = notify_change(path->dentry, &newattrs);
	mutex_unlock(&inode->i_mutex);

It all happens with i_mutex locked.
They all call then
	if (inode->i_op->setattr)
		error = inode->i_op->setattr(dentry, attr);


>> One of the patches we are working on now would like to use extent map
>> information to prevent certain attack.
>
> I'm not a mind reader. What attack may that be?

That is a simple attack of modifying inode block map so that different
files points to the same blocks.
FSCK calls them "multiply claimed blocks"...

>
>> Of course as above we do it in EVM hooks which are called under i_mutex.
>> And more over we need to ensure atomicity when reading extent map and
>> updating xattrs - just the same as it works now
>> for other inode metadata.
>
> Fundamentally, extent maps are not protected by i_mutex, and that's
> where your design breaks down. What you want cannot be provided by
> i_mutex, and that's one of the reasons why i_mutex is not held above
> .fiemap - it's not appropriate locking for the operation being
> performed.

Really, as I said, we are under i_mutex not to protect FS,
but to make sure that extent content and extended attribute stays in sync.

>
> Here's a beautiful example: XFS_IOC_SWAPEXT.
>
> This filesystem specific ioctl() atomically swaps the extent maps
> between two inodes.  And it doesn't hold i_mutex to do it, and it
> doesn't even inform the VFS that it has happened. Indeed, it is
> *completely transparent* to the VFS because it's a special interface
> to the filesystem used by filesystem maintenance tools, namely the
> online defragmenter xfs_fsr. That means running xfs_fsr will really
> screw with any EVM requirement that special security attributes and
> extent maps are kept in sync....
>

That is a tough case then if VFS is unaware.
For integrity protection it might be a requirement not to use such
ways to modify file system.


But ok.. You are aware of the problem.
What is your suggestion on how we could get some integrity measurement
at VFS layer which
allows to identify if inode block mapping has changed?

Thanks,

Dmitry


> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-24 11:28       ` Kasatkin, Dmitry
@ 2012-09-25  1:56         ` Dave Chinner
  2012-09-26  8:22           ` Kasatkin, Dmitry
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-09-25  1:56 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Mon, Sep 24, 2012 at 02:28:15PM +0300, Kasatkin, Dmitry wrote:
> On Mon, Sep 24, 2012 at 12:18 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
> >> On Sat, Sep 22, 2012 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
> >> It is not necessarily that i_mutex to be locked to provide stable
> >> unchanging extent maps.
> >> FS might not require that.
> >> But the caller might require some sort of atomicity.
> >
> > fiemap is not the interface you want, then. it gives *zero*
> > guarantee that what it returns actually reflects what is on disk.
> > What is on disk can change the moment the filesystem drops the lock
> > that protects the extent map, and because i_mutex doesn't protect
> > the extent map, you aren't getting the atomicity you require....
> >
> 
> I use filemap_write_and_wait() before using fiemap.
> And it seems to be persistent between reboots on ext4.

That's not the problem. Trying running fallocate() to randomly
preallocate space and punch holes in the file (without changing the
file size) in parallel with your EVM hash calculation....

> In our case i_mutex is not for FS itself, but for EVM to ensure
> that hash over extents and its value in xattrs are stay in sync.
> 
> 
> > i.e. fiemap provides a point in time snapshot that is out of date
> > even before the snapshot is returned to the caller regardless of the
> >
> 
> So what is the way at VFS to ensure that what ever is returned in fiemap
> stays the same over reboots?

The is no way to guarantee that.

Indeed, XFS can modify the extent map of an inode after the VFS has
disposed of the inode (e.g. during unmount or memory reclaim). The
changes after inode cache eviction mean that you can't even
guarantee that the extent map remains unchanged from close() to
open() a few minutes apart....

Extent maps and their manipulation belongs to the filesystem. The
VFS has no business getting involved in how the filesytem manages or
manipulates them.

> ioctl_fiemap() uses filemap_write_and_wait()...
> And it seems to work well for me as well...

filemap_write_and_wait() writes the current dirty data to disk. It
doesn't stop other threads from dirtying the inode or the extent map
from changing at any time.

> >> EVM - Extended Verification Module: security/integrity/evm
> >>
> >> That is a part of integrity subsystem which provides protection
> >> against offline modification of
> >> a inode metadata: uid, gid, perm, ino, xattrs, etc..
> >
> > *groan*
> >
> >> EVM has several hooks, which are called when those above metadata changes.
> >> And they are called when i_mutex is locked.
> >> i_mutex is used to prevent racing between some security verification
> >> and file metadata attribute update.
> >
> > Sounds great, until you consider the fact that inode metadata can be
> > updated without i_mutex being held. e.g. fallocate vs fiemap.  Have
> > you ever had a look at how many different ioctls various filesystems
> > implement that modify metadata without needing to hold i_mutex?
> >
> 
> What inode metadata are you talking about?

Extent maps. ioctl()s that manipulate extent maps. Filesystems
specific inode metadata and ioctls that manipulate that. ioctls that
don't go through VFS paths. Handle-based attribute manipulations via
ioctls.  Invisible IO that specifically avoids changing user visible
attributes like atime/ctime/mtime on reads and writes.

> do_truncate:
> 	mutex_lock(&dentry->d_inode->i_mutex);
> 	ret = notify_change(dentry, &newattrs);
> 	mutex_unlock(&dentry->d_inode->i_mutex);
> 
> chmod_common:
> 	error = notify_change(path->dentry, &newattrs);
> 	mutex_unlock(&inode->i_mutex);
> 
> chown_common:
> 	mutex_lock(&inode->i_mutex);
> 	error = security_path_chown(path, user, group);
> 	if (!error)
> 		error = notify_change(path->dentry, &newattrs);
> 	mutex_unlock(&inode->i_mutex);
> 
> It all happens with i_mutex locked.
> They all call then
> 	if (inode->i_op->setattr)
> 		error = inode->i_op->setattr(dentry, attr);

Sure. But i_mutex only specifically protects VFS maintained
metadata. The extent map is not a VFS maintained structure - it's a
filesystem internal detail.

> >> One of the patches we are working on now would like to use extent map
> >> information to prevent certain attack.
> >
> > I'm not a mind reader. What attack may that be?
> 
> That is a simple attack of modifying inode block map so that different
> files points to the same blocks.
> FSCK calls them "multiply claimed blocks"...

Just force users to run fsck before mounting?

Better question: why won't file data integrity checks detect such
errors? Data integrity checks will only pass if the duplicate block
has identical data, or someone is clever enough to find an existing
block on disk that magically causes a hash collision. Finding a hash
collision already on disk is so unlikely that someone who has the
access and ability to calculate a hash collision will simply
overwrite the file contents directly....

Seriously, if someone can modify your block device directly then
you've already lost and no amount of after-the-fact verification
will save you.

> >> Of course as above we do it in EVM hooks which are called under i_mutex.
> >> And more over we need to ensure atomicity when reading extent map and
> >> updating xattrs - just the same as it works now
> >> for other inode metadata.
> >
> > Fundamentally, extent maps are not protected by i_mutex, and that's
> > where your design breaks down. What you want cannot be provided by
> > i_mutex, and that's one of the reasons why i_mutex is not held above
> > .fiemap - it's not appropriate locking for the operation being
> > performed.
> 
> Really, as I said, we are under i_mutex not to protect FS,
> but to make sure that extent content and extended attribute stays in sync.

i_mutex does not give you that guarantee.

BTW, if you were concerned about integrity, you'd be running fsync
after writing the attribute to ensure the attribute goes straight to
disk and stays in sync with the extent map on disk, yes? You're not
running fsync, though, are you?  I'm pretty sure because otherwise
this conversion would be about fsync, not fiemap....

Even so, the xattr modification will be a separate transaction to
the extent map manipulation, so it's entirely possible that a crash
between the change of the extent map and writing of the xattr can
occur, and so you end up out of sync anyway. The only way to keep
them in sync is to have the filesystem write the xattr in the same
transaction that modifies the extent map.....

> > Here's a beautiful example: XFS_IOC_SWAPEXT.
> >
> > This filesystem specific ioctl() atomically swaps the extent maps
> > between two inodes.  And it doesn't hold i_mutex to do it, and it
> > doesn't even inform the VFS that it has happened. Indeed, it is
> > *completely transparent* to the VFS because it's a special interface
> > to the filesystem used by filesystem maintenance tools, namely the
> > online defragmenter xfs_fsr. That means running xfs_fsr will really
> > screw with any EVM requirement that special security attributes and
> > extent maps are kept in sync....
> >
> 
> That is a tough case then if VFS is unaware.
> For integrity protection it might be a requirement not to use such
> ways to modify file system.

Ok, so you're effectively saying that what you want to do with EVM
requires that no non-VFS based extent map manipulation will be
allowed. So, no automatic tiering via hot-block tracking. No file
defragmentation, automatic or manual. No automatic repair of
detected corruption. No automatic data replication. No HSMs. No
filesystem level data deduplication. No data restriping. No tree
rebalancing. No filesystem specific dump/restore or send/recv.

Perhaps you might want to rethin kwhat you are trying to acheive
with extent map checking?

> But ok.. You are aware of the problem.
> What is your suggestion on how we could get some integrity measurement
> at VFS layer which
> allows to identify if inode block mapping has changed?

1. Validate the data in the file, not the metadata that the filesystem
uses to index the data. (easy)

2. have the filesystem modify the integrity xattr in the same
transaction that modifies the extent map. (hard, might be impossible
for some filesytems))

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-25  1:56         ` Dave Chinner
@ 2012-09-26  8:22           ` Kasatkin, Dmitry
  2012-09-27  2:12             ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-26  8:22 UTC (permalink / raw)
  To: Dave Chinner
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Tue, Sep 25, 2012 at 4:56 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Sep 24, 2012 at 02:28:15PM +0300, Kasatkin, Dmitry wrote:
>> On Mon, Sep 24, 2012 at 12:18 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
>> >> On Sat, Sep 22, 2012 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Fri, Sep 21, 2012 at 05:14:28PM +0300, Dmitry Kasatkin wrote:
>> >> It is not necessarily that i_mutex to be locked to provide stable
>> >> unchanging extent maps.
>> >> FS might not require that.
>> >> But the caller might require some sort of atomicity.
>> >
>> > fiemap is not the interface you want, then. it gives *zero*
>> > guarantee that what it returns actually reflects what is on disk.
>> > What is on disk can change the moment the filesystem drops the lock
>> > that protects the extent map, and because i_mutex doesn't protect
>> > the extent map, you aren't getting the atomicity you require....
>> >
>>
>> I use filemap_write_and_wait() before using fiemap.
>> And it seems to be persistent between reboots on ext4.
>
> That's not the problem. Trying running fallocate() to randomly
> preallocate space and punch holes in the file (without changing the
> file size) in parallel with your EVM hash calculation....
>
>> In our case i_mutex is not for FS itself, but for EVM to ensure
>> that hash over extents and its value in xattrs are stay in sync.
>>
>>
>> > i.e. fiemap provides a point in time snapshot that is out of date
>> > even before the snapshot is returned to the caller regardless of the
>> >
>>
>> So what is the way at VFS to ensure that what ever is returned in fiemap
>> stays the same over reboots?
>
> The is no way to guarantee that.
>
> Indeed, XFS can modify the extent map of an inode after the VFS has
> disposed of the inode (e.g. during unmount or memory reclaim). The
> changes after inode cache eviction mean that you can't even
> guarantee that the extent map remains unchanged from close() to
> open() a few minutes apart....
>
> Extent maps and their manipulation belongs to the filesystem. The
> VFS has no business getting involved in how the filesytem manages or
> manipulates them.
>
>> ioctl_fiemap() uses filemap_write_and_wait()...
>> And it seems to work well for me as well...
>
> filemap_write_and_wait() writes the current dirty data to disk. It
> doesn't stop other threads from dirtying the inode or the extent map
> from changing at any time.
>
>> >> EVM - Extended Verification Module: security/integrity/evm
>> >>
>> >> That is a part of integrity subsystem which provides protection
>> >> against offline modification of
>> >> a inode metadata: uid, gid, perm, ino, xattrs, etc..
>> >
>> > *groan*
>> >
>> >> EVM has several hooks, which are called when those above metadata changes.
>> >> And they are called when i_mutex is locked.
>> >> i_mutex is used to prevent racing between some security verification
>> >> and file metadata attribute update.
>> >
>> > Sounds great, until you consider the fact that inode metadata can be
>> > updated without i_mutex being held. e.g. fallocate vs fiemap.  Have
>> > you ever had a look at how many different ioctls various filesystems
>> > implement that modify metadata without needing to hold i_mutex?
>> >
>>
>> What inode metadata are you talking about?
>
> Extent maps. ioctl()s that manipulate extent maps. Filesystems
> specific inode metadata and ioctls that manipulate that. ioctls that
> don't go through VFS paths. Handle-based attribute manipulations via
> ioctls.  Invisible IO that specifically avoids changing user visible
> attributes like atime/ctime/mtime on reads and writes.
>
>> do_truncate:
>>       mutex_lock(&dentry->d_inode->i_mutex);
>>       ret = notify_change(dentry, &newattrs);
>>       mutex_unlock(&dentry->d_inode->i_mutex);
>>
>> chmod_common:
>>       error = notify_change(path->dentry, &newattrs);
>>       mutex_unlock(&inode->i_mutex);
>>
>> chown_common:
>>       mutex_lock(&inode->i_mutex);
>>       error = security_path_chown(path, user, group);
>>       if (!error)
>>               error = notify_change(path->dentry, &newattrs);
>>       mutex_unlock(&inode->i_mutex);
>>
>> It all happens with i_mutex locked.
>> They all call then
>>       if (inode->i_op->setattr)
>>               error = inode->i_op->setattr(dentry, attr);
>
> Sure. But i_mutex only specifically protects VFS maintained
> metadata. The extent map is not a VFS maintained structure - it's a
> filesystem internal detail.
>
>> >> One of the patches we are working on now would like to use extent map
>> >> information to prevent certain attack.
>> >
>> > I'm not a mind reader. What attack may that be?
>>
>> That is a simple attack of modifying inode block map so that different
>> files points to the same blocks.
>> FSCK calls them "multiply claimed blocks"...
>
> Just force users to run fsck before mounting?
>

Not good for mobile device from boot time point of view.

> Better question: why won't file data integrity checks detect such
> errors? Data integrity checks will only pass if the duplicate block
> has identical data, or someone is clever enough to find an existing
> block on disk that magically causes a hash collision. Finding a hash
> collision already on disk is so unlikely that someone who has the
> access and ability to calculate a hash collision will simply
> overwrite the file contents directly....
>

No.. that is not about a collision...

Possible attack to IMA - I have done it...

1. make a file copy.
    now we have 2 files with the same data and hash.
2. modify block map that second file points to the blocks of first...
    IMA verification succeeds, because content is the same
3. write some data to second file...
    it will also change content of first file.
4. create some low memory pressure to force kernel to drop pages.
5. open first file again.
    pages will be loaded again, but now verified by IMA, because inode
    is still in the cache and IMA still thinks that it is verified... BUM...


> Seriously, if someone can modify your block device directly then
> you've already lost and no amount of after-the-fact verification
> will save you.

Are you talking about offline or online modification?
Integrity protection against offline modification..
Online is protected by Access Control...

>
>> >> Of course as above we do it in EVM hooks which are called under i_mutex.
>> >> And more over we need to ensure atomicity when reading extent map and
>> >> updating xattrs - just the same as it works now
>> >> for other inode metadata.
>> >
>> > Fundamentally, extent maps are not protected by i_mutex, and that's
>> > where your design breaks down. What you want cannot be provided by
>> > i_mutex, and that's one of the reasons why i_mutex is not held above
>> > .fiemap - it's not appropriate locking for the operation being
>> > performed.
>>
>> Really, as I said, we are under i_mutex not to protect FS,
>> but to make sure that extent content and extended attribute stays in sync.
>
> i_mutex does not give you that guarantee.
>
> BTW, if you were concerned about integrity, you'd be running fsync
> after writing the attribute to ensure the attribute goes straight to
> disk and stays in sync with the extent map on disk, yes? You're not
> running fsync, though, are you?  I'm pretty sure because otherwise
> this conversion would be about fsync, not fiemap....
>
> Even so, the xattr modification will be a separate transaction to
> the extent map manipulation, so it's entirely possible that a crash
> between the change of the extent map and writing of the xattr can
> occur, and so you end up out of sync anyway. The only way to keep
> them in sync is to have the filesystem write the xattr in the same
> transaction that modifies the extent map.....
>
>> > Here's a beautiful example: XFS_IOC_SWAPEXT.
>> >
>> > This filesystem specific ioctl() atomically swaps the extent maps
>> > between two inodes.  And it doesn't hold i_mutex to do it, and it
>> > doesn't even inform the VFS that it has happened. Indeed, it is
>> > *completely transparent* to the VFS because it's a special interface
>> > to the filesystem used by filesystem maintenance tools, namely the
>> > online defragmenter xfs_fsr. That means running xfs_fsr will really
>> > screw with any EVM requirement that special security attributes and
>> > extent maps are kept in sync....
>> >
>>
>> That is a tough case then if VFS is unaware.
>> For integrity protection it might be a requirement not to use such
>> ways to modify file system.
>
> Ok, so you're effectively saying that what you want to do with EVM
> requires that no non-VFS based extent map manipulation will be
> allowed. So, no automatic tiering via hot-block tracking. No file
> defragmentation, automatic or manual. No automatic repair of
> detected corruption. No automatic data replication. No HSMs. No
> filesystem level data deduplication. No data restriping. No tree
> rebalancing. No filesystem specific dump/restore or send/recv.
>

Bad...

> Perhaps you might want to rethin kwhat you are trying to acheive
> with extent map checking?
>

I have to...

>> But ok.. You are aware of the problem.
>> What is your suggestion on how we could get some integrity measurement
>> at VFS layer which
>> allows to identify if inode block mapping has changed?
>
> 1. Validate the data in the file, not the metadata that the filesystem
> uses to index the data. (easy)
>
> 2. have the filesystem modify the integrity xattr in the same
> transaction that modifies the extent map. (hard, might be impossible
> for some filesytems))
>


I thought about using fiemap to eliminate attack I described.
I would not like such approach which somehow works for ext4 and not all others.


Are there any ways to detect that any of the pages have been dropped
from the kernel page cache?

- Dmitry

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-26  8:22           ` Kasatkin, Dmitry
@ 2012-09-27  2:12             ` Dave Chinner
  2012-09-27  7:43               ` Kasatkin, Dmitry
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-09-27  2:12 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Wed, Sep 26, 2012 at 11:22:14AM +0300, Kasatkin, Dmitry wrote:
> On Tue, Sep 25, 2012 at 4:56 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Sep 24, 2012 at 02:28:15PM +0300, Kasatkin, Dmitry wrote:
> >> On Mon, Sep 24, 2012 at 12:18 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
> >> >> One of the patches we are working on now would like to use extent map
> >> >> information to prevent certain attack.
> >> >
> >> > I'm not a mind reader. What attack may that be?
> >>
> >> That is a simple attack of modifying inode block map so that different
> >> files points to the same blocks.
> >> FSCK calls them "multiply claimed blocks"...
> >
> > Just force users to run fsck before mounting?
> 
> Not good for mobile device from boot time point of view.

Not good for a server with a 100TB filesystem either.

> > Better question: why won't file data integrity checks detect such
> > errors? Data integrity checks will only pass if the duplicate block
> > has identical data, or someone is clever enough to find an existing
> > block on disk that magically causes a hash collision. Finding a hash
> > collision already on disk is so unlikely that someone who has the
> > access and ability to calculate a hash collision will simply
> > overwrite the file contents directly....
> 
> No.. that is not about a collision...

It's just one possibility the moment you start talking about offline
modification of a block device.

> Possible attack to IMA - I have done it...
> 
> 1. make a file copy.
>     now we have 2 files with the same data and hash.
> 2. modify block map that second file points to the blocks of first...
>     IMA verification succeeds, because content is the same
> 3. write some data to second file...
>     it will also change content of first file.

And so IMA data verification will then fail on the first file when
it is read, and then you can go and find the problem causing the
verification error. I fail to see the problem at this point.

> 4. create some low memory pressure to force kernel to drop pages.
> 5. open first file again.
>     pages will be loaded again, but now verified by IMA, because inode
>     is still in the cache and IMA still thinks that it is verified... BUM...

Oh. You don't actually verify the contents every time it is loaded
from disk. There's your problem.  IOWs, you won't even detect a
filesystem corruption that causes this problem if the files have
already been verified and the inodes are still in memory.

If I were an attacker, I could easily prevent detection from
occurring simply by leaving an open file sitting around. IOWs, open
all the files I wanted to modify, read them, drop the page cachend
then modify the block device directly.  And now the files full of
unverified content will now be certified as valid...

> > Seriously, if someone can modify your block device directly then
> > you've already lost and no amount of after-the-fact verification
> > will save you.
> 
> Are you talking about offline or online modification?
> Integrity protection against offline modification..
> Online is protected by Access Control...

Either. Both. It doesn't matter. Someone modifying the block device
directly means they either have already broken your access controls
or they have physical access to your machine.

> Are there any ways to detect that any of the pages have been dropped
> from the kernel page cache?

I don't think there's any reliable method for detecting that as
present.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-27  2:12             ` Dave Chinner
@ 2012-09-27  7:43               ` Kasatkin, Dmitry
  2012-09-27 12:35                 ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-27  7:43 UTC (permalink / raw)
  To: Dave Chinner
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Thu, Sep 27, 2012 at 5:12 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Sep 26, 2012 at 11:22:14AM +0300, Kasatkin, Dmitry wrote:
>> On Tue, Sep 25, 2012 at 4:56 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Mon, Sep 24, 2012 at 02:28:15PM +0300, Kasatkin, Dmitry wrote:
>> >> On Mon, Sep 24, 2012 at 12:18 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Mon, Sep 24, 2012 at 11:13:30AM +0300, Kasatkin, Dmitry wrote:
>> >> >> One of the patches we are working on now would like to use extent map
>> >> >> information to prevent certain attack.
>> >> >
>> >> > I'm not a mind reader. What attack may that be?
>> >>
>> >> That is a simple attack of modifying inode block map so that different
>> >> files points to the same blocks.
>> >> FSCK calls them "multiply claimed blocks"...
>> >
>> > Just force users to run fsck before mounting?
>>
>> Not good for mobile device from boot time point of view.
>
> Not good for a server with a 100TB filesystem either.
>
>> > Better question: why won't file data integrity checks detect such
>> > errors? Data integrity checks will only pass if the duplicate block
>> > has identical data, or someone is clever enough to find an existing
>> > block on disk that magically causes a hash collision. Finding a hash
>> > collision already on disk is so unlikely that someone who has the
>> > access and ability to calculate a hash collision will simply
>> > overwrite the file contents directly....
>>
>> No.. that is not about a collision...
>
> It's just one possibility the moment you start talking about offline
> modification of a block device.
>
>> Possible attack to IMA - I have done it...
>>
>> 1. make a file copy.
>>     now we have 2 files with the same data and hash.
>> 2. modify block map that second file points to the blocks of first...
>>     IMA verification succeeds, because content is the same
>> 3. write some data to second file...
>>     it will also change content of first file.
>
> And so IMA data verification will then fail on the first file when
> it is read, and then you can go and find the problem causing the
> verification error. I fail to see the problem at this point.

You see it bellow...

>
>> 4. create some low memory pressure to force kernel to drop pages.
>> 5. open first file again.
>>     pages will be loaded again, but now verified by IMA, because inode
>>     is still in the cache and IMA still thinks that it is verified... BUM...
>
> Oh. You don't actually verify the contents every time it is loaded
> from disk. There's your problem.  IOWs, you won't even detect a
> filesystem corruption that causes this problem if the files have
> already been verified and the inodes are still in memory.
>

Like that.

> If I were an attacker, I could easily prevent detection from
> occurring simply by leaving an open file sitting around. IOWs, open
> all the files I wanted to modify, read them, drop the page cachend
> then modify the block device directly.  And now the files full of
> unverified content will now be certified as valid...
>

That is online attack. You need to be a root to modify block device directly...
But if you already became a root - game over.
Online protection is done by access control..
IMA protects against offline modification..

>> > Seriously, if someone can modify your block device directly then
>> > you've already lost and no amount of after-the-fact verification
>> > will save you.
>>
>> Are you talking about offline or online modification?
>> Integrity protection against offline modification..
>> Online is protected by Access Control...
>
> Either. Both. It doesn't matter. Someone modifying the block device
> directly means they either have already broken your access controls
> or they have physical access to your machine.
>

It is different...
You might have physical access to your machine storage but no root access.
You could remove storage and plug it to another machine, modify data.
For example you could add user to /etc/sudoers or change root password.
But when when you plug it back to your machine, IMA verification fails,
and you will not be able to become a root.

>> Are there any ways to detect that any of the pages have been dropped
>> from the kernel page cache?
>
> I don't think there's any reliable method for detecting that as
> present.


May be inode->i_mapping->nrpages counter might indicate that.
mm/filemap.c does ++ or --

Thanks for discussion..

- Dmitry

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-27  7:43               ` Kasatkin, Dmitry
@ 2012-09-27 12:35                 ` Dave Chinner
  2012-09-27 13:11                   ` Kasatkin, Dmitry
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-09-27 12:35 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Thu, Sep 27, 2012 at 10:43:15AM +0300, Kasatkin, Dmitry wrote:
> On Thu, Sep 27, 2012 at 5:12 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Sep 26, 2012 at 11:22:14AM +0300, Kasatkin, Dmitry wrote:
> > If I were an attacker, I could easily prevent detection from
> > occurring simply by leaving an open file sitting around. IOWs, open
> > all the files I wanted to modify, read them, drop the page cachend
> > then modify the block device directly.  And now the files full of
> > unverified content will now be certified as valid...
> >
> 
> That is online attack. You need to be a root to modify block device directly...
> But if you already became a root - game over.
> Online protection is done by access control..
> IMA protects against offline modification..
> 
> >> > Seriously, if someone can modify your block device directly then
> >> > you've already lost and no amount of after-the-fact verification
> >> > will save you.
> >>
> >> Are you talking about offline or online modification?
> >> Integrity protection against offline modification..
> >> Online is protected by Access Control...
> >
> > Either. Both. It doesn't matter. Someone modifying the block device
> > directly means they either have already broken your access controls
> > or they have physical access to your machine.
> >
> 
> It is different...
> You might have physical access to your machine storage but no root access.
> You could remove storage and plug it to another machine, modify data.
> For example you could add user to /etc/sudoers or change root password.
> But when when you plug it back to your machine, IMA verification fails,
> and you will not be able to become a root.

Three words: Full disk encryption.

Besides, you're still missing the obvious attack for both online and
offline attackers: replace the kernel or change kernel command line
parameters to turn off IMA/EVM verification...

> >> Are there any ways to detect that any of the pages have been dropped
> >> from the kernel page cache?
> >
> > I don't think there's any reliable method for detecting that as
> > present.
> 
> 
> May be inode->i_mapping->nrpages counter might indicate that.
> mm/filemap.c does ++ or --

Remove a page, add a page. nr_pages is still the same. Not reliable.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap()
  2012-09-27 12:35                 ` Dave Chinner
@ 2012-09-27 13:11                   ` Kasatkin, Dmitry
  0 siblings, 0 replies; 14+ messages in thread
From: Kasatkin, Dmitry @ 2012-09-27 13:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: sandeen, viro, swhiteho, tytso, linux-fsdevel,
	linux-security-module, zohar, Alan Cox

On Thu, Sep 27, 2012 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Sep 27, 2012 at 10:43:15AM +0300, Kasatkin, Dmitry wrote:
>> On Thu, Sep 27, 2012 at 5:12 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Wed, Sep 26, 2012 at 11:22:14AM +0300, Kasatkin, Dmitry wrote:
>> > If I were an attacker, I could easily prevent detection from
>> > occurring simply by leaving an open file sitting around. IOWs, open
>> > all the files I wanted to modify, read them, drop the page cachend
>> > then modify the block device directly.  And now the files full of
>> > unverified content will now be certified as valid...
>> >
>>
>> That is online attack. You need to be a root to modify block device directly...
>> But if you already became a root - game over.
>> Online protection is done by access control..
>> IMA protects against offline modification..
>>
>> >> > Seriously, if someone can modify your block device directly then
>> >> > you've already lost and no amount of after-the-fact verification
>> >> > will save you.
>> >>
>> >> Are you talking about offline or online modification?
>> >> Integrity protection against offline modification..
>> >> Online is protected by Access Control...
>> >
>> > Either. Both. It doesn't matter. Someone modifying the block device
>> > directly means they either have already broken your access controls
>> > or they have physical access to your machine.
>> >
>>
>> It is different...
>> You might have physical access to your machine storage but no root access.
>> You could remove storage and plug it to another machine, modify data.
>> For example you could add user to /etc/sudoers or change root password.
>> But when when you plug it back to your machine, IMA verification fails,
>> and you will not be able to become a root.
>
> Three words: Full disk encryption.
>

That is known and always preferred option when hw and battery allows..

> Besides, you're still missing the obvious attack for both online and
> offline attackers: replace the kernel or change kernel command line
> parameters to turn off IMA/EVM verification...
>

I do not miss anything..
That is obvious. All that relies on "secure/trusted boot" when
bootloader & kernel & modules are verified..
And of course user is not able to pass own parameters...

That is also applied to LSM, when user should not be able to disable
SELinux or Smack...

>> >> Are there any ways to detect that any of the pages have been dropped
>> >> from the kernel page cache?
>> >
>> > I don't think there's any reliable method for detecting that as
>> > present.
>>
>>
>> May be inode->i_mapping->nrpages counter might indicate that.
>> mm/filemap.c does ++ or --
>
> Remove a page, add a page. nr_pages is still the same. Not reliable.
>

yes

Thanks
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-09-27 13:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-21 14:14 [PATCH 1/1] fiemap: move i_op->fiemap() locking to the ioctl_fiemap() Dmitry Kasatkin
2012-09-21 14:27 ` Steven Whitehouse
2012-09-21 14:33   ` Kasatkin, Dmitry
2012-09-21 14:39     ` Steven Whitehouse
2012-09-21 22:59 ` Dave Chinner
2012-09-24  8:13   ` Kasatkin, Dmitry
2012-09-24  9:18     ` Dave Chinner
2012-09-24 11:28       ` Kasatkin, Dmitry
2012-09-25  1:56         ` Dave Chinner
2012-09-26  8:22           ` Kasatkin, Dmitry
2012-09-27  2:12             ` Dave Chinner
2012-09-27  7:43               ` Kasatkin, Dmitry
2012-09-27 12:35                 ` Dave Chinner
2012-09-27 13:11                   ` Kasatkin, Dmitry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).