linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Race: data race between ext4_setattr() and acl_permission_check()
@ 2020-11-30 17:55 Gong, Sishuai
  2020-11-30 18:23 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Gong, Sishuai @ 2020-11-30 17:55 UTC (permalink / raw)
  To: viro@zeniv.linux.org.uk; +Cc: linux-fsdevel@vger.kernel.org

Hi,

We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this can be a harmful bug.

------------------------------------------
Writer site

 /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/ext4/inode.c:5599
       5579              error = PTR_ERR(handle);
       5580              goto err_out;
       5581          }
       5582
       5583          /* dquot_transfer() calls back ext4_get_inode_usage() which
       5584           * counts xattr inode references.
       5585           */
       5586          down_read(&EXT4_I(inode)->xattr_sem);
       5587          error = dquot_transfer(inode, attr);
       5588          up_read(&EXT4_I(inode)->xattr_sem);
       5589
       5590          if (error) {
       5591              ext4_journal_stop(handle);
       5592              return error;
       5593          }
       5594          /* Update corresponding info in inode so that everything is in
       5595           * one transaction */
       5596          if (attr->ia_valid & ATTR_UID)
       5597              inode->i_uid = attr->ia_uid;
       5598          if (attr->ia_valid & ATTR_GID)
 ==>   5599              inode->i_gid = attr->ia_gid;
       5600          error = ext4_mark_inode_dirty(handle, inode);
       5601          ext4_journal_stop(handle);
       5602      }
       5603
       5604      if (attr->ia_valid & ATTR_SIZE) {
       5605          handle_t *handle;
       5606          loff_t oldsize = inode->i_size;
       5607          int shrink = (attr->ia_size < inode->i_size);
       5608
       5609          if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
       5610              struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
       5611
       5612              if (attr->ia_size > sbi->s_bitmap_maxbytes)
       5613                  return -EFBIG;
       5614          }
       5615          if (!S_ISREG(inode->i_mode))
       5616              return -EINVAL;
       5617
       5618          if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
       5619              inode_inc_iversion(inode);

------------------------------------------
Reader site

/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/namei.c:306
        286
        287      return -EAGAIN;
        288  }
        289
        290  /*
        291   * This does the basic permission checking
        292   */
        293  static int acl_permission_check(struct inode *inode, int mask)
        294  {
        295      unsigned int mode = inode->i_mode;
        296
        297      if (likely(uid_eq(current_fsuid(), inode->i_uid)))
        298          mode >>= 6;
        299      else {
        300          if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
        301              int error = check_acl(inode, mask);
        302              if (error != -EAGAIN)
        303                  return error;
        304          }
        305
 ==>    306          if (in_group_p(inode->i_gid))
        307              mode >>= 3;
        308      }
        309
        310      /*
        311       * If the DACs are ok we don't need any capability check.
        312       */
        313      if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
        314          return 0;
        315      return -EACCES;
        316  }
        317
------------------------------------------
Writer calling trace

- do_fchownat
-- chown_common
--- notify_change

------------------------------------------
Reader calling trace

- do_execve
-- __do_execve_file
--- do_open_execat
---- do_filp_open
----- path_openat
------ link_path_walk
------- inode_permission
-------- generic_permission



Thanks,
Sishuai


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race: data race between ext4_setattr() and acl_permission_check()
  2020-11-30 17:55 Race: data race between ext4_setattr() and acl_permission_check() Gong, Sishuai
@ 2020-11-30 18:23 ` Theodore Y. Ts'o
  2020-12-01 20:33   ` Gong, Sishuai
  0 siblings, 1 reply; 3+ messages in thread
From: Theodore Y. Ts'o @ 2020-11-30 18:23 UTC (permalink / raw)
  To: Gong, Sishuai; +Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org

On Mon, Nov 30, 2020 at 05:55:20PM +0000, Gong, Sishuai wrote:
> Hi,
> 
> We found a data race in linux kernel 5.3.11 that we are able to
> reproduce in x86 under specific interleavings. Currently, we are not
> sure about the consequence of this race so we would like to confirm
> with the community if this can be a harmful bug.

What do you mean by "data race" in this context?  If you have one
process setting a file's group id, while another process is trying to
open that same file and that open is depending on the process's group
access vs the file's group id, the open might succeed or fail
depending on whether the chgrp(2) is executed before or after the
open(2).

I could see data race if in some other context of the file open, the
group id is read, and so the group id might be inconsistent during the
course of operating on a particular system call.  In which case, we
might need to cache the group id value, or take some kind of lock,
etc.

But if your automated tool can't distinguish whether or not this is
the case, I'll gently suggest that it's not particuarly useful....
Maybe this is something that should be the subject of further
research?  The whole point of automated tools, after all, is to save
developers' time.  And not asking them to chase down potential
questions like "so is this real or not"?

Cheers,

					- Ted

> 
> ------------------------------------------
> Writer site
> 
>  /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/ext4/inode.c:5599
>        5579              error = PTR_ERR(handle);
>        5580              goto err_out;
>        5581          }
>        5582
>        5583          /* dquot_transfer() calls back ext4_get_inode_usage() which
>        5584           * counts xattr inode references.
>        5585           */
>        5586          down_read(&EXT4_I(inode)->xattr_sem);
>        5587          error = dquot_transfer(inode, attr);
>        5588          up_read(&EXT4_I(inode)->xattr_sem);
>        5589
>        5590          if (error) {
>        5591              ext4_journal_stop(handle);
>        5592              return error;
>        5593          }
>        5594          /* Update corresponding info in inode so that everything is in
>        5595           * one transaction */
>        5596          if (attr->ia_valid & ATTR_UID)
>        5597              inode->i_uid = attr->ia_uid;
>        5598          if (attr->ia_valid & ATTR_GID)
>  ==>   5599              inode->i_gid = attr->ia_gid;
>        5600          error = ext4_mark_inode_dirty(handle, inode);
>        5601          ext4_journal_stop(handle);
>        5602      }
>        5603
>        5604      if (attr->ia_valid & ATTR_SIZE) {
>        5605          handle_t *handle;
>        5606          loff_t oldsize = inode->i_size;
>        5607          int shrink = (attr->ia_size < inode->i_size);
>        5608
>        5609          if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
>        5610              struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>        5611
>        5612              if (attr->ia_size > sbi->s_bitmap_maxbytes)
>        5613                  return -EFBIG;
>        5614          }
>        5615          if (!S_ISREG(inode->i_mode))
>        5616              return -EINVAL;
>        5617
>        5618          if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
>        5619              inode_inc_iversion(inode);
> 
> ------------------------------------------
> Reader site
> 
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/namei.c:306
>         286
>         287      return -EAGAIN;
>         288  }
>         289
>         290  /*
>         291   * This does the basic permission checking
>         292   */
>         293  static int acl_permission_check(struct inode *inode, int mask)
>         294  {
>         295      unsigned int mode = inode->i_mode;
>         296
>         297      if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>         298          mode >>= 6;
>         299      else {
>         300          if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
>         301              int error = check_acl(inode, mask);
>         302              if (error != -EAGAIN)
>         303                  return error;
>         304          }
>         305
>  ==>    306          if (in_group_p(inode->i_gid))
>         307              mode >>= 3;
>         308      }
>         309
>         310      /*
>         311       * If the DACs are ok we don't need any capability check.
>         312       */
>         313      if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
>         314          return 0;
>         315      return -EACCES;
>         316  }
>         317
> ------------------------------------------
> Writer calling trace
> 
> - do_fchownat
> -- chown_common
> --- notify_change
> 
> ------------------------------------------
> Reader calling trace
> 
> - do_execve
> -- __do_execve_file
> --- do_open_execat
> ---- do_filp_open
> ----- path_openat
> ------ link_path_walk
> ------- inode_permission
> -------- generic_permission
> 
> 
> 
> Thanks,
> Sishuai
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race: data race between ext4_setattr() and acl_permission_check()
  2020-11-30 18:23 ` Theodore Y. Ts'o
@ 2020-12-01 20:33   ` Gong, Sishuai
  0 siblings, 0 replies; 3+ messages in thread
From: Gong, Sishuai @ 2020-12-01 20:33 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org

Thanks for your feedback and we really value your opinion. Our project aims at finding rare data races quickly so diagnosing is not yet in our scope for now, but it definitely shouldn't be an excuse. We did some further analysis and we would like to report it as well. Hope this could be helpful.


------------------------------------------
Interleaving
We observed two different interleavings of two instructions, as illustrated below.

Interleaving 1
Writer							Reader
							if (in_group_p(inode->i_gid))
							// read value = 0 (initial value)
inode->i_gid = attr->ia_gid
// write value == 0xee00
...	
							if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
return 0;

Interleaving 2
Writer							Reader
inode->i_gid = attr->ia_gid
// write value == 0xee00
							if (in_group_p(inode->i_gid))
							// read value = 0xee00			
							...
							if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
return 0;

However, in both cases, the function acl_permission_check() returns will value 0, so we were not able to find any explicit impact it may have. If this attribute is lock-free semantically, then it should be fine.


In fact, we also found another writer site that could race with the reader. We noticed that both two writers are protected so we were curious about the reader. 
------------------------------------------
Another writer
/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/attr.c:185
        165   * @inode:  the inode to be updated
        166   * @attr:   the new attributes
        167   *
        168   * setattr_copy must be called with i_mutex held.
        169   *
        170   * setattr_copy updates the inode's metadata with that specified
        171   * in attr. Noticeably missing is inode size update, which is more complex
        172   * as it requires pagecache updates.
        173   *
        174   * The inode is not marked as dirty after this operation. The rationale is
        175   * that for "simple" filesystems, the struct inode is the inode storage.
        176   * The caller is free to mark the inode dirty afterwards if needed.
        177   */
        178  void setattr_copy(struct inode *inode, const struct iattr *attr)
        179  {
        180      unsigned int ia_valid = attr->ia_valid;
        181
        182      if (ia_valid & ATTR_UID)
        183          inode->i_uid = attr->ia_uid;
        184      if (ia_valid & ATTR_GID)
 ==>    185          inode->i_gid = attr->ia_gid;
        186      if (ia_valid & ATTR_ATIME)
        187          inode->i_atime = timespec64_trunc(attr->ia_atime,
        188                            inode->i_sb->s_time_gran);
        189      if (ia_valid & ATTR_MTIME)
        190          inode->i_mtime = timespec64_trunc(attr->ia_mtime,
        191                            inode->i_sb->s_time_gran);
        192      if (ia_valid & ATTR_CTIME)
        193          inode->i_ctime = timespec64_trunc(attr->ia_ctime,
        194                            inode->i_sb->s_time_gran);
        195      if (ia_valid & ATTR_MODE) {
        196          umode_t mode = attr->ia_mode;
        197
        198          if (!in_group_p(inode->i_gid) &&
        199              !capable_wrt_inode_uidgid(inode, CAP_FSETID))
        200              mode &= ~S_ISGID;
        201          inode->i_mode = mode;
        202      }
        203  }

		
Thanks,
Sishuai

> On Nov 30, 2020, at 1:23 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> 
> On Mon, Nov 30, 2020 at 05:55:20PM +0000, Gong, Sishuai wrote:
>> Hi,
>> 
>> We found a data race in linux kernel 5.3.11 that we are able to
>> reproduce in x86 under specific interleavings. Currently, we are not
>> sure about the consequence of this race so we would like to confirm
>> with the community if this can be a harmful bug.
> 
> What do you mean by "data race" in this context?  If you have one
> process setting a file's group id, while another process is trying to
> open that same file and that open is depending on the process's group
> access vs the file's group id, the open might succeed or fail
> depending on whether the chgrp(2) is executed before or after the
> open(2).
> 
> I could see data race if in some other context of the file open, the
> group id is read, and so the group id might be inconsistent during the
> course of operating on a particular system call.  In which case, we
> might need to cache the group id value, or take some kind of lock,
> etc.
> 
> But if your automated tool can't distinguish whether or not this is
> the case, I'll gently suggest that it's not particuarly useful....
> Maybe this is something that should be the subject of further
> research?  The whole point of automated tools, after all, is to save
> developers' time.  And not asking them to chase down potential
> questions like "so is this real or not"?
> 
> Cheers,
> 
> 					- Ted
> 
>> 
>> ------------------------------------------
>> Writer site
>> 
>> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/ext4/inode.c:5599
>>       5579              error = PTR_ERR(handle);
>>       5580              goto err_out;
>>       5581          }
>>       5582
>>       5583          /* dquot_transfer() calls back ext4_get_inode_usage() which
>>       5584           * counts xattr inode references.
>>       5585           */
>>       5586          down_read(&EXT4_I(inode)->xattr_sem);
>>       5587          error = dquot_transfer(inode, attr);
>>       5588          up_read(&EXT4_I(inode)->xattr_sem);
>>       5589
>>       5590          if (error) {
>>       5591              ext4_journal_stop(handle);
>>       5592              return error;
>>       5593          }
>>       5594          /* Update corresponding info in inode so that everything is in
>>       5595           * one transaction */
>>       5596          if (attr->ia_valid & ATTR_UID)
>>       5597              inode->i_uid = attr->ia_uid;
>>       5598          if (attr->ia_valid & ATTR_GID)
>> ==>   5599              inode->i_gid = attr->ia_gid;
>>       5600          error = ext4_mark_inode_dirty(handle, inode);
>>       5601          ext4_journal_stop(handle);
>>       5602      }
>>       5603
>>       5604      if (attr->ia_valid & ATTR_SIZE) {
>>       5605          handle_t *handle;
>>       5606          loff_t oldsize = inode->i_size;
>>       5607          int shrink = (attr->ia_size < inode->i_size);
>>       5608
>>       5609          if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
>>       5610              struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>>       5611
>>       5612              if (attr->ia_size > sbi->s_bitmap_maxbytes)
>>       5613                  return -EFBIG;
>>       5614          }
>>       5615          if (!S_ISREG(inode->i_mode))
>>       5616              return -EINVAL;
>>       5617
>>       5618          if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
>>       5619              inode_inc_iversion(inode);
>> 
>> ------------------------------------------
>> Reader site
>> 
>> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/namei.c:306
>>        286
>>        287      return -EAGAIN;
>>        288  }
>>        289
>>        290  /*
>>        291   * This does the basic permission checking
>>        292   */
>>        293  static int acl_permission_check(struct inode *inode, int mask)
>>        294  {
>>        295      unsigned int mode = inode->i_mode;
>>        296
>>        297      if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>>        298          mode >>= 6;
>>        299      else {
>>        300          if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
>>        301              int error = check_acl(inode, mask);
>>        302              if (error != -EAGAIN)
>>        303                  return error;
>>        304          }
>>        305
>> ==>    306          if (in_group_p(inode->i_gid))
>>        307              mode >>= 3;
>>        308      }
>>        309
>>        310      /*
>>        311       * If the DACs are ok we don't need any capability check.
>>        312       */
>>        313      if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
>>        314          return 0;
>>        315      return -EACCES;
>>        316  }
>>        317
>> ------------------------------------------
>> Writer calling trace
>> 
>> - do_fchownat
>> -- chown_common
>> --- notify_change
>> 
>> ------------------------------------------
>> Reader calling trace
>> 
>> - do_execve
>> -- __do_execve_file
>> --- do_open_execat
>> ---- do_filp_open
>> ----- path_openat
>> ------ link_path_walk
>> ------- inode_permission
>> -------- generic_permission
>> 
>> 
>> 
>> Thanks,
>> Sishuai
>> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-12-01 20:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-30 17:55 Race: data race between ext4_setattr() and acl_permission_check() Gong, Sishuai
2020-11-30 18:23 ` Theodore Y. Ts'o
2020-12-01 20:33   ` Gong, Sishuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).