From: Mikulas Patocka <mpatocka@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org, dm-devel@redhat.com,
linux-fsdevel@vger.kernel.org, jaxboe@fusionio.com,
Milan Broz <mbroz@redhat.com>, Alasdair G Kergon <agk@redhat.com>
Subject: Re: i_size misuses
Date: Tue, 19 Oct 2010 20:09:17 -0400 (EDT) [thread overview]
Message-ID: <Pine.LNX.4.64.1010192000490.6227@hs20-bc2-1.build.redhat.com> (raw)
In-Reply-To: <20101020093316.346926c4@notabene>
On Wed, 20 Oct 2010, Neil Brown wrote:
> On Tue, 19 Oct 2010 14:55:31 -0400 (EDT)
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> >
> >
> > On Sun, 12 Sep 2010, Neil Brown wrote:
> >
> > > On Wed, 8 Sep 2010 09:32:13 -0400 (EDT)
> > > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > > Hi
> > > >
> > > > when reviewing some i_size problem, I searched the kernel for i_size usage
> > > > (the variable should really be written with i_size_write and read with
> > > > i_size_read).
> > > >
> > > > Properly locked direct use of "i_size" inside memory management or
> > > > filesystems may not be a problem, but there are many problems in general
> > > > code outside mm.
> > > >
> > > > The misuses are:
> > > > SOUND/SOUND_FIRMWARE.C:l = filp->f_path.dentry->d_inode->i_size;
> > > > KERNEL/RELAY.C:buf->dentry->d_inode->i_size = buf->early_bytes;
> > > > KERNEL/RELAY.C:buf->dentry->d_inode->i_size += buf->chan->subbuf_size
> > > > -buf->padding[old_subbuf];
> > > > DRIVERS/USB/CORE/INODE.C:dev->usbfs_dentry->d_inode->i_size = i_size;
> > > > DRIVERS/MTD/DEVICES/BLOCK2MTD.C:dev->mtd.size =
> > > > dev->blkdev->bd_inode->i_size & PAGE_MASK;
> > > > DRIVERS/MD/MD.C: many reads of i_size
> > > > DRIVERS/BLOCK/NBD.C: many writes to i_size without i_size_write
> > > > DRIVERS/BLOCK/DRBD/DRBD_INT.H: return bdev ? bdev->bd_inode->i_size >> 9 : 0;
> > > > DRIVERS/BLOCK/DRBD/DRBD_WRAPPERS.H: mdev->this_bdev->bd_inode->i_size =
> > > > (loff_t)size << 9;
> > > > BLOCK/BLK-CORE.C:printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
> > > > bdevname(bio->bi_bdev, b),
> > > > bio->bi_rw,
> > > > (unsigned long long)bio->bi_sector + bio_sectors(bio),
> > > > (long long)(bio->bi_bdev->bd_inode->i_size >> 9));
> > > > maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
> > > > BLOCK/COMPAT_IOCTL.C: size = bdev->bd_inode->i_size;
> > > > return compat_put_u64(arg, bdev->bd_inode->i_size);
> > > > BLOCK/IOCTL.C: if (start + len > (bdev->bd_inode->i_size >> 9))
> > > > size = bdev->bd_inode->i_size;
> > > > return put_u64(arg, bdev->bd_inode->i_size);
> > > >
> > > > The problem with this code is that if you read i_size without i_size_read
> > > > and the size wraps around 32 bits, for example from 0xffffffff to
> > > > 0x100000000 , there is a possibility on 32-bit machines to read an invalid
> > > > value (either 0 or 0x1ffffffff). Similarly, if you write i_size without
> > > > i_size_write, the readers can see intermediate invalid values.
> > > >
> > > >
> > > > The original problem that caused this investigation is the question, how a
> > > > block device driver can change the size of its device. Normal method (used
> > > > in a few drivers, including dm), consists of
> > > > mutex_lock(&inode->i_mutex);
> > > > i_size_write(inode, new_size);
> > > > mutex_unlock(&inode->i_mutex);
> > >
> > > Don't you just do
> > >
> > > set_capacity(gendisk, sectors);
> > > revalidate(gendisk);
> > >
> > > ??
> > >
> > > NeilBrown
> >
> > revalidate_disk() has still the problem that it changes i_size without
> > holding i_mutex and other kernel parts (for example generic_file_llseek)
> > assume that if we hold the lock, i_size_can't be changed.
>
> generic_file_llseek is not used for block devices. They use block_llseek
> which uses i_size_read, so I think it is safe.
>
> >
> > The rules for accessing i_size must be specified and followed.
>
> I agree. However the rules can be different for different file systems and
> file types.
> A filesystem that used the generic_* function would need to only change
> i_size under i_mutex as you say.
> For block devices it appears that the rule is that it can only be changed
> under bd_mutex.
> For 'relay' (which you mentioned above), it seem the relevant mutex is the
> global relay_channels_mutex, though I didn't read the code thoroughly to be
> sure.
You are right.
> It still would probably be useful to review all the i_size related code to
> ensure that it is safe, but you should not assume that everything follows the
> same rules. So first you need to work out the rule for a given subsystem,
> then audit it against that rule (and maybe document that rule if it isn't
> already documented!)
>
> NeilBrown
If you specify complex rules (i_size can be read directly under i_mutex
for files, but can't be read for block devices under i_mutex), it will
only complicate review and it may introduce bugs in the future ---
remember that Linux doesn't have a formal specification and the
specification is inferred from the code --- thus, you use
i_size_read(inode) somewhere and inode->i_size elsewhere, people will
infer a wrong specification --- as shown above on the few examples of
i_size abusers.
I'd change i_size to __i_size (this will warn people not to touch it) and
convert all accesses to i_size_read and i_size_write.
Mikulas
prev parent reply other threads:[~2010-10-20 0:09 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-08 13:32 i_size misuses Mikulas Patocka
2010-09-11 22:05 ` Neil Brown
2010-10-19 18:55 ` Mikulas Patocka
2010-10-19 22:33 ` Neil Brown
2010-10-20 0:09 ` Mikulas Patocka [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.1010192000490.6227@hs20-bc2-1.build.redhat.com \
--to=mpatocka@redhat.com \
--cc=agk@redhat.com \
--cc=dm-devel@redhat.com \
--cc=jaxboe@fusionio.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mbroz@redhat.com \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).