From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 10 Dec 2002 01:01:18 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 10 Dec 2002 01:01:18 -0500 Received: from smtp.sw.oz.au ([203.31.96.1]:12816 "EHLO smtp.sw.oz.au") by vger.kernel.org with ESMTP id ; Tue, 10 Dec 2002 01:01:04 -0500 Date: Tue, 10 Dec 2002 17:08:41 +1100 From: Kingsley Cheung To: Andrew Morton Cc: linux-kernel@vger.kernel.org, trivial@rustcorp.com.au Subject: Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison Message-ID: <20021210170841.D8843@aurema.com> References: <20021209150426.D12270@aurema.com> <3DF43855.19F24E73@digeo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3DF43855.19F24E73@digeo.com>; from akpm@digeo.com on Sun, Dec 08, 2002 at 10:29:41PM -0800 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Dec 08, 2002 at 10:29:41PM -0800, Andrew Morton wrote: > > It's surely a bug, but looking at the code, one does ask "what > on earth is it trying to do"? > Thanks for that Andrew. Yisshhhh. I merely took the bug at face value. This is not as trivial as I first thought. > 1) -EIO is not a recognised (or appropriate) return value. > Aye, I overlooked that. > 2) If the MADV_WILLNEED call fails, all the user needs to do is to > use a smaller chunk, and walk across the file using that chunk > size! The only system-protecting limit here is the request queue > size. > > 3) We don't know that the application will try to map all that readahead > at the same time anyway. And if it does, the rlimits will catch it. > Yes. Though currently there is no enforcement for RLIMIT_RSS implemented. I guess when its there it will catch it when the process starts faulting on those pages. > Linus used "half the size of the inactive list" in sys_readahead. That's > probably as good as anything else. I'd suggest that we just share > that bit of code in madvise. > > I agree that failing with an error is inappropriate. > > We should limit the readahead according to machine size, disk bandwidth, > free memory availability, shoe size, etc. And once that's done then > it _has_ to return success. Otherwise the application would see > different results depending on system size and activity. > > It is just "advice". So then something of the following without the check is more appropriate or a starting point then? diff -urN linux-2.4.20/mm/filemap.c linux-2.4.20patched/mm/filemap.c --- linux-2.4.20/mm/filemap.c Mon Dec 9 14:19:13 2002 +++ linux-2.4.20patched/mm/filemap.c Tue Dec 10 15:30:05 2002 @@ -2455,7 +2455,7 @@ { long error = -EBADF; struct file * file; - unsigned long size, rlim_rss; + unsigned long size, max; /* Doesn't work if there's no mapped file. */ if (!vma->vm_file) @@ -2469,12 +2469,10 @@ end = vma->vm_end; end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; - /* Make sure this doesn't exceed the process's max rss. */ - error = -EIO; - rlim_rss = current->rlim ? current->rlim[RLIMIT_RSS].rlim_cur : - LONG_MAX; /* default: see resource.h */ - if ((vma->vm_mm->rss + (end - start)) > rlim_rss) - return error; + /* Like sys_readahead, limit to a sane percentage of inactive list.. */ + max = nr_inactive_pages / 2; + if ((end - start) > max) + end = start + max; /* round to cluster boundaries if this isn't a "random" area. */ if (!VM_RandomReadHint(vma)) { -- Kingsley