Date: Mon, 12 May 2003 10:25:12 +0200
From: Dominik Vogt
To: linux-kernel@vger.kernel.org
Subject: Bug: MM Oops in 2.4.20
Message-ID: <20030512082512.GA18352@gmx.de>
Reply-To: dominik.vogt@gmx.de

The 2.4.20 kernel (and earlier kernels, at least 2.4.19 and 2.4.18) has a
bug that causes an Oops (NULL pointer dereference) when a process A
"exec"s while some other process B reads A's /proc//maps file.

This code is from fs/exec.c, function exec_mmap():

  A1  old_mm = current->mm;
  A2  if (old_mm && atomic_read(&old_mm->mm_users) == 1) {
  A3          mm_release();
  A4          exit_mmap(old_mm);
  A5          return 0;
  A6  }

And this is from fs/proc/array.c, function proc_pid_read_maps():

  B1  task_lock(task);
  B2  mm = task->mm;
  B3  if (mm)
  B4          atomic_inc(&mm->mm_users);
  B5  task_unlock(task);

Let's assume process A has just called execve() and its mm->mm_users is
one. Process A has already executed the test in A2 when a timer
interrupt schedules process B ("cat /proc//maps"). Now process B grabs
the mm structure of process A, increments the mm_users counter and
continues to access it. Later, process A resumes execution and calls
exit_mmap(), destroying the structure that is still in use by B.
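The interleaving described above can be replayed deterministically in user space. The following is only a minimal sketch: toy_mm, toy_task and every function name are hypothetical stand-ins, not the kernel's definitions, and the "teardown" is just a flag rather than a real exit_mmap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures (hypothetical, not the real ones). */
struct toy_mm {
    int  mm_users;   /* reference count, playing the role of mm->mm_users */
    bool torn_down;  /* set once the mappings have been destroyed */
};

struct toy_task {
    struct toy_mm *mm;
};

/* Step A2: the exec side checks whether it is the only user of the mm. */
static bool exec_sees_single_user(const struct toy_task *a)
{
    return a->mm != NULL && a->mm->mm_users == 1;
}

/* Step A4: the exec side destroys the mappings, trusting the earlier check. */
static void exec_teardown(struct toy_task *a)
{
    a->mm->torn_down = true;
}

/* Steps B2-B4: the reader takes a reference on the task's mm. */
static struct toy_mm *reader_grab_mm(struct toy_task *a)
{
    struct toy_mm *mm = a->mm;
    if (mm)
        mm->mm_users++;
    return mm;
}

/*
 * Replay the reported order: A2, then B2-B4, then A4.
 * Returns true if the reader ends up holding a counted reference
 * to an mm that has been torn down underneath it.
 */
static bool replay_race(void)
{
    struct toy_mm mm = { .mm_users = 1, .torn_down = false };
    struct toy_task a = { .mm = &mm };

    bool single = exec_sees_single_user(&a);   /* A2: count is still 1 */
    struct toy_mm *held = reader_grab_mm(&a);  /* B runs: count becomes 2 */
    assert(held == &mm);
    if (single)
        exec_teardown(&a);                     /* A4: acts on a stale check */

    return held->mm_users == 2 && held->torn_down;
}
```

Calling replay_race() from a small main() shows the stale-check problem in isolation: the reference count says two users, yet the structure has already been destroyed.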
In our specific case, we get a NULL pointer dereference in
proc_pid_maps_get_line():

  dev = map->vm_file->f_dentry->d_inode->i_dev;

because vm_file is in the free_list and f_dentry is NULL.

At our site, we get about one Oops every 2 days across 200 machines
(running lsof and a concurrent "exec uptime" once a minute).

--

In some places in the code, specifically in fs/proc, accessing
mm->mm_users or mm->map is protected with a task_(un)lock, while it is
not protected at all in many other places, e.g. schedule(), exec_mmap(),
flush_tlb_mm() (in arch/alpha/kernel/smp.c), ...

We were able to stabilize our systems by protecting the above code from
fs/exec.c with task_lock() and task_unlock(), but I am quite sure this
patch is not sufficient. Also, there are many other places in the proc fs
that have code similar to that in proc_pid_read_maps(), so the problem
should occur in many other situations too.

Bye

Dominik ^_^  ^_^

P.S.: Please cc me.
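The workaround mentioned above (putting the check and the teardown from fs/exec.c under the same task lock that the proc code takes) can be illustrated with a single-threaded user-space model. This is a sketch under loud assumptions: the lock is a plain flag, all toy_* names are hypothetical, and it is not the actual patch. It only shows that once the check and the teardown form one critical section, the reader's increment can land before or after that section but not in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; the "lock" is a flag, not a real spinlock. */
struct toy_mm {
    int  mm_users;
    bool torn_down;
};

struct toy_task {
    bool locked;          /* stand-in for the task lock */
    struct toy_mm *mm;
};

static void toy_task_lock(struct toy_task *t)
{
    assert(!t->locked);   /* single-threaded model: never contended */
    t->locked = true;
}

static void toy_task_unlock(struct toy_task *t)
{
    assert(t->locked);
    t->locked = false;
}

/* Reader side, same shape as B1-B5. */
static struct toy_mm *reader_grab_mm(struct toy_task *t)
{
    struct toy_mm *mm;

    toy_task_lock(t);
    mm = t->mm;
    if (mm)
        mm->mm_users++;
    toy_task_unlock(t);
    return mm;
}

/*
 * Exec side: the A2 check and the A4 teardown in one critical section.
 * Returns true if the single-user fast path was taken.
 */
static bool exec_fast_path_locked(struct toy_task *t)
{
    bool took = false;

    toy_task_lock(t);
    if (t->mm && t->mm->mm_users == 1) {
        t->mm->torn_down = true;
        took = true;
    }
    toy_task_unlock(t);
    return took;
}

/* Ordering 1: reader first. Exec then sees two users and skips teardown. */
static bool reader_first_is_safe(void)
{
    struct toy_mm mm = { 1, false };
    struct toy_task a = { false, &mm };

    struct toy_mm *held = reader_grab_mm(&a);
    bool took = exec_fast_path_locked(&a);
    return held == &mm && !took && !mm.torn_down;
}

/* Ordering 2: exec first. The reader only gets in after teardown finished. */
static bool exec_first_is_ordered(void)
{
    struct toy_mm mm = { 1, false };
    struct toy_task a = { false, &mm };

    bool took = exec_fast_path_locked(&a);
    struct toy_mm *held = reader_grab_mm(&a);
    return took && held == &mm && mm.torn_down && mm.mm_users == 2;
}
```

In the second ordering the reader still obtains a reference to an mm that has been emptied. Whether that is harmless in the real kernel depends on the state exit_mmap() leaves behind, which this model does not capture, so it says nothing about whether the workaround is sufficient.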