* [PATCH] loop: prevent bdev freeing while device in use
@ 2013-04-01 11:58 Anatol Pomozov
2013-04-01 15:16 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Anatol Pomozov @ 2013-04-01 11:58 UTC (permalink / raw)
To: torvalds, linux-kernel; +Cc: tytso, sqazi, viro, yan, Anatol Pomozov
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to hold device inode in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
---
drivers/block/loop.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..7ab9520 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -922,6 +922,14 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
lo->lo_flags |= LO_FLAGS_PARTSCAN;
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+ /* bdev lifecycle is defined by its bd_inode (see
+ * struct bdev_inode usage). In case of loop device we need to make
+ * sure that bdev deallocation will not happen between loop_set_fd()
+ * and loop_clr_fd() invocations. To do this we need to hold
+ * bdev inode here and put it later in loop_clr_fd().
+ */
+ ihold(bdev->bd_inode);
return 0;
out_clr:
@@ -1031,8 +1039,11 @@ static int loop_clr_fd(struct loop_device *lo)
memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- if (bdev)
+ if (bdev) {
+ BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
+ iput(bdev->bd_inode);
invalidate_bdev(bdev);
+ }
set_capacity(lo->lo_disk, 0);
loop_sysfs_exit(lo);
if (bdev) {
--
1.8.1.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 11:58 [PATCH] loop: prevent bdev freeing while device in use Anatol Pomozov
@ 2013-04-01 15:16 ` Linus Torvalds
2013-04-01 17:00 ` Anatol Pomozov
2013-04-01 16:28 ` Al Viro
2013-04-01 16:47 ` Anatol Pomozov
2 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-04-01 15:16 UTC (permalink / raw)
To: Anatol Pomozov
Cc: Linux Kernel Mailing List, Theodore Ts'o, Salman Qazi,
Al Viro, yan
On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
>
> To prevent use-after-free we need to hold device inode in loop_set_fd()
> and put it later in loop_clr_fd().
Is there something that guarantees that there's only one loop_set_fd()
and one paired loop_clr_fd()?
IOW, what protects us from somebody doing loop_clr_fd() on a device
that hasn't been set up yet, or doing multiple loop_set_fd calls?
I suspect the "lo->lo_state" is part of the answer, but it's very much
not obvious, and I'd want this to be explicitly explained.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 15:16 ` Linus Torvalds
@ 2013-04-01 17:00 ` Anatol Pomozov
2013-04-01 22:53 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Anatol Pomozov @ 2013-04-01 17:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Theodore Ts'o, Salman Qazi,
Al Viro, Guo Chao
Hi
On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
>>
>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>> and put it later in loop_clr_fd().
>
> Is there something that guarantees that there's only one loop_set_fd()
> and one paired loop_clr_fd()?
Yes there is such guarantee.
Every time we call loop_set_fd() we check that loop_device->lo_state
is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
again it will get EBUSY. And if we try to loop_clr_fd() on unbound
loop device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex.
>
> IOW, what protects us from somebody doing loop_clr_fd() on a device
> that hasn't been set up yet, or doing multiple loop_set_fd calls?
> I suspect the "lo->lo_state" is part of the answer, but it's very much
> not obvious, and I'd want this to be explicitly explained.
>
> Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 17:00 ` Anatol Pomozov
@ 2013-04-01 22:53 ` Linus Torvalds
2013-04-02 5:58 ` Anatol Pomozov
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-04-01 22:53 UTC (permalink / raw)
To: Anatol Pomozov, stable
Cc: Linux Kernel Mailing List, Theodore Ts'o, Salman Qazi,
Al Viro, Guo Chao
On Mon, Apr 1, 2013 at 10:00 AM, Anatol Pomozov
<anatol.pomozov@gmail.com> wrote:
> Hi
>
> On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
>>>
>>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>>> and put it later in loop_clr_fd().
>>
>> Is there something that guarantees that there's only one loop_set_fd()
>> and one paired loop_clr_fd()?
>
> Yes there is such guarantee.
>
> Every time we call loop_set_fd() we check that loop_device->lo_state
> is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
> again it will get EBUSY. And if we try to loop_clr_fd() on unbound
> loop device we'll get ENXIO.
>
> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
> loop_device->lo_ctl_mutex.
Ok, good enough for me, I applied it, and it's commit
c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
I assume it should go to stable too, because none of this is new, is
it? Did you check how far back this applies? I assume this goes back
pretty much forever, no?
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 22:53 ` Linus Torvalds
@ 2013-04-02 5:58 ` Anatol Pomozov
2013-04-02 6:08 ` Al Viro
0 siblings, 1 reply; 9+ messages in thread
From: Anatol Pomozov @ 2013-04-02 5:58 UTC (permalink / raw)
To: Linus Torvalds, Ayan George
Cc: stable, Linux Kernel Mailing List, Theodore Ts'o, Salman Qazi,
Al Viro, Guo Chao
Hi
On Mon, Apr 1, 2013 at 3:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Apr 1, 2013 at 10:00 AM, Anatol Pomozov
> <anatol.pomozov@gmail.com> wrote:
>> Hi
>>
>> On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
>>>>
>>>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>>>> and put it later in loop_clr_fd().
>>>
>>> Is there something that guarantees that there's only one loop_set_fd()
>>> and one paired loop_clr_fd()?
>>
>> Yes there is such guarantee.
>>
>> Every time we call loop_set_fd() we check that loop_device->lo_state
>> is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
>> again it will get EBUSY. And if we try to loop_clr_fd() on unbound
>> loop device we'll get ENXIO.
>>
>> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
>> loop_device->lo_ctl_mutex.
>
> Ok, good enough for me, I applied it, and it's commit
> c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
>
> I assume it should go to stable too, because none of this is new, is
> it? Did you check how far back this applies? I assume this goes back
> pretty much forever, no?
I bisected kernel using test from my commit and it points to
4c823cc3d568277aa6340d8df6981e34f4c4dee5 (appeared in kernel 3.2).
But even despite i cannot repro the crash on 3.0-stable, the
underlying issue (block_device is not locked) still exists there. So I
think patch should go to stable as well.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-02 5:58 ` Anatol Pomozov
@ 2013-04-02 6:08 ` Al Viro
0 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2013-04-02 6:08 UTC (permalink / raw)
To: Anatol Pomozov
Cc: Linus Torvalds, Ayan George, stable, Linux Kernel Mailing List,
Theodore Ts'o, Salman Qazi, Guo Chao
On Mon, Apr 01, 2013 at 10:58:55PM -0700, Anatol Pomozov wrote:
> >>
> >> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
> >> loop_device->lo_ctl_mutex.
> >
> > Ok, good enough for me, I applied it, and it's commit
> > c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
> >
> > I assume it should go to stable too, because none of this is new, is
> > it? Did you check how far back this applies? I assume this goes back
> > pretty much forever, no?
>
> I bisected kernel using test from my commit and it points to
> 4c823cc3d568277aa6340d8df6981e34f4c4dee5 (appeared in kernel 3.2).
>
> But even despite i cannot repro the crash on 3.0-stable, the
> underlying issue (block_device is not locked) still exists there. So I
> think patch should go to stable as well.
... except that you are doing invalidate *after* having done bdput. Which
is probably valid (we have the same bdev pinned down by opened file used
to issue the ioclt), but it's a really bad style; this should be in opposite
order.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 11:58 [PATCH] loop: prevent bdev freeing while device in use Anatol Pomozov
2013-04-01 15:16 ` Linus Torvalds
@ 2013-04-01 16:28 ` Al Viro
2013-04-01 16:49 ` Anatol Pomozov
2013-04-01 16:47 ` Anatol Pomozov
2 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2013-04-01 16:28 UTC (permalink / raw)
To: Anatol Pomozov; +Cc: torvalds, linux-kernel, tytso, sqazi, yan
On Mon, Apr 01, 2013 at 04:58:05AM -0700, Anatol Pomozov wrote:
> lo->lo_flags |= LO_FLAGS_PARTSCAN;
> if (lo->lo_flags & LO_FLAGS_PARTSCAN)
> ioctl_by_bdev(bdev, BLKRRPART, 0);
> +
> + /* bdev lifecycle is defined by its bd_inode (see
> + * struct bdev_inode usage). In case of loop device we need to make
> + * sure that bdev deallocation will not happen between loop_set_fd()
> + * and loop_clr_fd() invocations. To do this we need to hold
> + * bdev inode here and put it later in loop_clr_fd().
> + */
> + ihold(bdev->bd_inode);
That's open-coded bdgrab()
> + if (bdev) {
> + BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
> + iput(bdev->bd_inode);
... and that - bdput() (I'd drop that BUG_ON())
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 16:28 ` Al Viro
@ 2013-04-01 16:49 ` Anatol Pomozov
0 siblings, 0 replies; 9+ messages in thread
From: Anatol Pomozov @ 2013-04-01 16:49 UTC (permalink / raw)
To: Al Viro; +Cc: Linus Torvalds, LKML, Theodore Ts'o, sqazi, Guo Chao
Hi
On Mon, Apr 1, 2013 at 9:28 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Apr 01, 2013 at 04:58:05AM -0700, Anatol Pomozov wrote:
>> lo->lo_flags |= LO_FLAGS_PARTSCAN;
>> if (lo->lo_flags & LO_FLAGS_PARTSCAN)
>> ioctl_by_bdev(bdev, BLKRRPART, 0);
>> +
>> + /* bdev lifecycle is defined by its bd_inode (see
>> + * struct bdev_inode usage). In case of loop device we need to make
>> + * sure that bdev deallocation will not happen between loop_set_fd()
>> + * and loop_clr_fd() invocations. To do this we need to hold
>> + * bdev inode here and put it later in loop_clr_fd().
>> + */
>> + ihold(bdev->bd_inode);
>
> That's open-coded bdgrab()
Done. It also requires EXPORTING bdgrab.
The test still passes.
>
>> + if (bdev) {
>> + BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
>> + iput(bdev->bd_inode);
>
> ... and that - bdput() (I'd drop that BUG_ON())
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] loop: prevent bdev freeing while device in use
2013-04-01 11:58 [PATCH] loop: prevent bdev freeing while device in use Anatol Pomozov
2013-04-01 15:16 ` Linus Torvalds
2013-04-01 16:28 ` Al Viro
@ 2013-04-01 16:47 ` Anatol Pomozov
2 siblings, 0 replies; 9+ messages in thread
From: Anatol Pomozov @ 2013-04-01 16:47 UTC (permalink / raw)
To: torvalds, linux-kernel; +Cc: tytso, sqazi, viro, yan, Anatol Pomozov
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
---
drivers/block/loop.c | 9 ++++++++-
fs/block_dev.c | 1 +
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..2c127f9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -922,6 +922,11 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
lo->lo_flags |= LO_FLAGS_PARTSCAN;
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+ /* Grab the block_device to prevent its destruction after we
+ * put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
+ */
+ bdgrab(bdev);
return 0;
out_clr:
@@ -1031,8 +1036,10 @@ static int loop_clr_fd(struct loop_device *lo)
memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- if (bdev)
+ if (bdev) {
+ bdput(bdev);
invalidate_bdev(bdev);
+ }
set_capacity(lo->lo_disk, 0);
loop_sysfs_exit(lo);
if (bdev) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aea605c..aae187a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -551,6 +551,7 @@ struct block_device *bdgrab(struct block_device *bdev)
ihold(bdev->bd_inode);
return bdev;
}
+EXPORT_SYMBOL(bdgrab);
long nr_blockdev_pages(void)
{
--
1.8.1.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-04-02 6:08 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-01 11:58 [PATCH] loop: prevent bdev freeing while device in use Anatol Pomozov
2013-04-01 15:16 ` Linus Torvalds
2013-04-01 17:00 ` Anatol Pomozov
2013-04-01 22:53 ` Linus Torvalds
2013-04-02 5:58 ` Anatol Pomozov
2013-04-02 6:08 ` Al Viro
2013-04-01 16:28 ` Al Viro
2013-04-01 16:49 ` Anatol Pomozov
2013-04-01 16:47 ` Anatol Pomozov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox