linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: linux-kernel@vger.kernel.org, dm-devel@redhat.com,
	linux-fsdevel@vger.kernel.org, jaxboe@fusionio.com,
	Milan Broz <mbroz@redhat.com>, Alasdair G Kergon <agk@redhat.com>
Subject: Re: i_size misuses
Date: Sun, 12 Sep 2010 08:05:43 +1000	[thread overview]
Message-ID: <20100912080543.34650074@notabene> (raw)
In-Reply-To: <Pine.LNX.4.64.1009080833180.16126@hs20-bc2-1.build.redhat.com>

On Wed, 8 Sep 2010 09:32:13 -0400 (EDT)
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
> 
> when reviewing some i_size problem, I searched the kernel for i_size usage 
> (the variable should really be written with i_size_write and read with 
> i_size_read).
> 
> Properly locked direct use of "i_size" inside memory management or 
> filesystems may not be a problem, but there are many problems in general 
> code outside mm.
> 
> The misuses are:
> SOUND/SOUND_FIRMWARE.C:l = filp->f_path.dentry->d_inode->i_size;
> KERNEL/RELAY.C:buf->dentry->d_inode->i_size = buf->early_bytes;
> KERNEL/RELAY.C:buf->dentry->d_inode->i_size += buf->chan->subbuf_size 
> -buf->padding[old_subbuf];
> DRIVERS/USB/CORE/INODE.C:dev->usbfs_dentry->d_inode->i_size = i_size;
> DRIVERS/MTD/DEVICES/BLOCK2MTD.C:dev->mtd.size = 
> dev->blkdev->bd_inode->i_size & PAGE_MASK;
> DRIVERS/MD/MD.C: many reads of i_size 
> DRIVERS/BLOCK/NBD.C: many writes to i_size without i_size_write
> DRIVERS/BLOCK/DRBD/DRBD_INT.H: return bdev ? bdev->bd_inode->i_size >> 9 : 0;
> DRIVERS/BLOCK/DRBD/DRBD_WRAPPERS.H: mdev->this_bdev->bd_inode->i_size = 
> (loff_t)size << 9;
> BLOCK/BLK-CORE.C:printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
>                 bdevname(bio->bi_bdev, b),
>                 bio->bi_rw,
>                 (unsigned long long)bio->bi_sector + bio_sectors(bio),
>                 (long long)(bio->bi_bdev->bd_inode->i_size >> 9));
> maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
> BLOCK/COMPAT_IOCTL.C: size = bdev->bd_inode->i_size;
> return compat_put_u64(arg, bdev->bd_inode->i_size);
> BLOCK/IOCTL.C: if (start + len > (bdev->bd_inode->i_size >> 9))
> size = bdev->bd_inode->i_size;
> return put_u64(arg, bdev->bd_inode->i_size);
> 
> The problem with this code is that if you read i_size without i_size_read 
> and the size wraps around 32 bits, for example from 0xffffffff to 
> 0x100000000 , there is a possibility on 32-bit machines to read an invalid 
> value (either 0 or 0x1ffffffff). Similarly, if you write i_size without 
> i_size_write, the readers can see intermediate invalid values.
> 
> 
> The original problem that caused this investigation is the question, how a 
> block device driver can change the size of its device. Normal method (used 
> in a few drivers, including dm), consists of
> mutex_lock(&inode->i_mutex);
> i_size_write(inode, new_size);
> mutex_unlock(&inode->i_mutex);

Don't you just do

  set_capacity(gendisk, sectors);
  revalidate(gendisk);

??

NeilBrown

> 
> This is deadlock-prone, because i_mutex is also held on fsync path. 
> Therefore, this deadlock happens: fsync takes i_mutex and issues I/Os, 
> block device driver wants to change its size, so it waits on i_mutex, 
> the I/Os wait until the device driver did its internal maintenance and 
> changed the inode size. The driver doesn't change the size until fsync 
> finished.
> 
> Jens, as a block maintainer, please think about it and propose some 
> specification how to clean this up. Also a clean verifiable rule regarding 
> i_size should be specified and the code should be fixed to conform to the 
> rule: maybe we could rename i_size to __i_size and ban its using.
> 
> Mikulas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2010-09-11 22:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-08 13:32 i_size misuses Mikulas Patocka
2010-09-11 22:05 ` Neil Brown [this message]
2010-10-19 18:55   ` Mikulas Patocka
2010-10-19 22:33     ` Neil Brown
2010-10-20  0:09       ` Mikulas Patocka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100912080543.34650074@notabene \
    --to=neilb@suse.de \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbroz@redhat.com \
    --cc=mpatocka@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).