From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dmitry Vyukov <dvyukov@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	dave.hansen@linux.intel.com, Hugh Dickins <hughd@google.com>,
	Joe Perches <joe@perches.com>,
	sds@tycho.nsa.gov, Oleg Nesterov <oleg@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	mhocko@suse.cz, gang.chen.5i5j@gmail.com,
	Peter Feiner <pfeiner@google.com>,
	aarcange@redhat.com, "linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	syzkaller@googlegroups.com, Kostya Serebryany <kcc@google.com>,
	Alexander Potapenko <glider@google.com>,
	Andrey Konovalov <andreyknvl@google.com>,
	Sasha Levin <sasha.levin@oracle.com>
Subject: Re: GPF in shm_lock ipc
Date: Mon, 12 Oct 2015 21:10:40 +0300
Message-ID: <20151012181040.GC6447@node>
In-Reply-To: <20151012174945.GC3170@linux-uzut.site>

On Mon, Oct 12, 2015 at 10:49:45AM -0700, Davidlohr Bueso wrote:
> On Mon, 12 Oct 2015, Kirill A. Shutemov wrote:
> 
> >On Mon, Oct 12, 2015 at 11:55:44AM +0200, Dmitry Vyukov wrote:
> >Here's slightly simplified and more human readable reproducer:
> >
> >#define _GNU_SOURCE
> >#include <stdlib.h>
> >#include <sys/ipc.h>
> >#include <sys/mman.h>
> >#include <sys/shm.h>
> >
> >#define PAGE_SIZE 4096
> >
> >int main()
> >{
> >	int id;
> >	void *p;
> >
> >	id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
> >	p = shmat(id, NULL, 0);
> >	shmctl(id, IPC_RMID, NULL);
> >	remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
> >
> >	return 0;
> >}
> 
> Thanks!
> 
> >>
> >>On commit dd36d7393d6310b0c1adefb22fba79c3cf8a577c
> >>(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
> >>
> >>------------[ cut here ]------------
> >>WARNING: CPU: 2 PID: 2636 at ipc/shm.c:162 shm_open+0x74/0x80()
> >>Modules linked in:
> >>CPU: 2 PID: 2636 Comm: a.out Not tainted 4.3.0-rc3+ #37
> >>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> ffffffff81bcb43c ffff88081bf0bd70 ffffffff812fe8d6 0000000000000000
> >> ffff88081bf0bda8 ffffffff81051ff1 ffffffffffffffea ffff88081b896ca8
> >> ffff880819b81620 ffff8800bbaa6d00 ffff880819b81600 ffff88081bf0bdb8
> >>Call Trace:
> >> [<     inline     >] __dump_stack lib/dump_stack.c:15
> >> [<ffffffff812fe8d6>] dump_stack+0x44/0x5e lib/dump_stack.c:50
> >> [<ffffffff81051ff1>] warn_slowpath_common+0x81/0xc0 kernel/panic.c:447
> >> [<ffffffff810520e5>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> >> [<     inline     >] shm_lock ipc/shm.c:162
> >> [<ffffffff81295c64>] shm_open+0x74/0x80 ipc/shm.c:196
> >> [<ffffffff81295cbe>] shm_mmap+0x4e/0x80 ipc/shm.c:399 (discriminator 2)
> >> [<ffffffff81142d14>] mmap_region+0x3c4/0x5e0 mm/mmap.c:1627
> >> [<ffffffff81143227>] do_mmap+0x2f7/0x3d0 mm/mmap.c:1402
> >> [<     inline     >] do_mmap_pgoff include/linux/mm.h:1930
> >> [<     inline     >] SYSC_remap_file_pages mm/mmap.c:2694
> >> [<ffffffff811434a9>] SyS_remap_file_pages+0x179/0x240 mm/mmap.c:2641
> >> [<ffffffff81859a97>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185
> >>---[ end trace 0873e743fc645a8c ]---
> >
> >Okay. The problem is that SysV IPC SHM doesn't expect the memory region to
> >be mmap'ed after IPC_RMID, but remap_file_pages() manages to create a new
> >VMA using the existing one.
> 
> Indeed, naughty users should not be mapping/(re)attaching after IPC_RMID.
> This is common to all things ipc, not only to shm. And while Linux nowadays
> does enforce that nothing touch a segment marked for deletion[1], we have
> contradictory scenarios where the resource is only freed once the last attached
> process exits.
> 
> [1] https://lkml.org/lkml/2015/10/12/483
> 
> So this warning used to in fact be a full BUG_ON, but ultimately the ipc
> subsystem acknowledges that this situation is possible but fully blames the
> user responsible, and therefore we only warn about bogus usage.
> 
> >I'm not sure what the right way to fix it is. The SysV SHM VMA is pretty
> >normal from mm POV (no special flags, etc.) and it meets remap_file_pages
> >criteria (shared mapping). Every fix I can think of on mm side is ugly.
> >
> >Probably better to teach shm_mmap() to fall off gracefully in case of
> >non-existing shmid? I'm not familiar with IPC code.
> >Could anyone look into it?
> 
> Yeah, this was my approach as well. Very little tested other than it solves
> the above warning. Basically we don't want to be doing mmap if the segment
> was deleted, thus return a corresponding error instead of triggering the
> same error later on after mmaping, via shm_open(). I still need to think
> a bit more about this, but seems legit if we don't hurt userspace while
> at it (at least the idea, not considering any overhead in doing the idr
> lookup). Thoughts?
> 
> Thanks,
> Davidlohr
> 
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 4178727..9615f19 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -385,9 +385,25 @@ static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
>  static int shm_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> -	struct shm_file_data *sfd = shm_file_data(file);
> +	struct file *vma_file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(vma_file);
> +	struct ipc_ids *ids = &shm_ids(sfd->ns);
> +	struct kern_ipc_perm *shp;
>  	int ret;
> +	rcu_read_lock();
> +	shp = ipc_obtain_object_check(ids, sfd->id);
> +	if (IS_ERR(shp)) {
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	if (!ipc_valid_object(shp)) {
> +		ret = -EIDRM;
> +		goto err;
> +	}
> +	rcu_read_unlock();
> +

Hm. Isn't it racy? What prevents IPC_RMID from happening after this point?
Shouldn't we bump shm_nattch here? Or some other refcount?
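
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. It assumes a simplified model in which the IPC_RMID analogue destroys
an unused segment at once and defers destruction while nattch is non-zero.
All names (struct segment, segment_check(), segment_pin(), segment_unpin(),
segment_rmid()) are hypothetical and do not exist in ipc/shm.c.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a shm segment; not kernel code. */
struct segment {
	pthread_mutex_t lock;
	bool marked;	/* the IPC_RMID analogue was requested */
	bool destroyed;	/* the segment is really gone */
	int nattch;	/* users holding the segment, like shm_nattch */
};

void segment_init(struct segment *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->marked = false;
	s->destroyed = false;
	s->nattch = 0;
}

/* Check only: the answer can be stale as soon as the lock is dropped. */
int segment_check(struct segment *s)
{
	int ret;

	pthread_mutex_lock(&s->lock);
	ret = s->destroyed ? -EIDRM : 0;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Check and take a reference inside the same critical section. */
int segment_pin(struct segment *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->destroyed)
		ret = -EIDRM;
	else
		s->nattch++;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Drop a reference; the last user of a marked segment destroys it. */
void segment_unpin(struct segment *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->nattch > 0);
	if (--s->nattch == 0 && s->marked)
		s->destroyed = true;
	pthread_mutex_unlock(&s->lock);
}

/* IPC_RMID analogue: destroy now if unused, otherwise defer. */
void segment_rmid(struct segment *s)
{
	pthread_mutex_lock(&s->lock);
	s->marked = true;
	if (s->nattch == 0)
		s->destroyed = true;
	pthread_mutex_unlock(&s->lock);
}
```

With segment_check() alone, a segment_rmid() that runs right after the check
destroys the segment under the caller. With segment_pin(), the same
segment_rmid() only marks the segment, and destruction waits for the matching
segment_unpin().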


>  	ret = sfd->file->f_op->mmap(sfd->file, vma);
>  	if (ret != 0)
>  		return ret;
> @@ -399,6 +415,9 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
>  	shm_open(vma);
>  	return ret;
> +err:
> +	rcu_read_unlock();
> +	return ret;
>  }
>  static int shm_release(struct inode *ino, struct file *file)
> 

-- 
 Kirill A. Shutemov
