From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Mon, 9 Dec 2002 17:20:00 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Mon, 9 Dec 2002 17:19:59 -0500 Received: from packet.digeo.com ([12.110.80.53]:46990 "EHLO packet.digeo.com") by vger.kernel.org with ESMTP id ; Mon, 9 Dec 2002 17:19:58 -0500 Message-ID: <3DF43855.19F24E73@digeo.com> Date: Sun, 08 Dec 2002 22:29:41 -0800 From: Andrew Morton X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.46 i686) X-Accept-Language: en MIME-Version: 1.0 To: Kingsley Cheung CC: linux-kernel@vger.kernel.org, trivial@rustcorp.com.au Subject: Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison References: <20021209150426.D12270@aurema.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 09 Dec 2002 06:29:42.0340 (UTC) FILETIME=[5BA0F040:01C29F4C] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Kingsley Cheung wrote: > > Hi, > > 'madvise_willneed' makes an incorrect rss limit comparison. It > directly compares rlim[RLIMIT_RSS].rlim_cur to rss. The former is in > bytes, whereas the latter is in pages. The fix for this is trivial. > It's surely a bug, but looking at the code, one does ask "what on earth is it trying to do"? 1) -EIO is not a recognised (or appropriate) return value. 2) If the MADV_WILLNEED call fails, all the user needs to do is to use a smaller chunk, and walk across the file using that chunk size! The only system-protecting limit here is the request queue size. 3) We don't know that the application will try to map all that readahead at the same time anyway. And if it does, the rlimits will catch it. Linus used "half the size of the inactive list" in sys_readahead. That's probably as good as anything else. I'd suggest that we just share that bit of code in madvise. hmm. Also the new readahead code will allocate all that memory up-front before putting it under I/O. I'll fix up do_page_cache_readahead() for that. > [As an aside, one question is whether this limit check is needed at > all. Most rss limit enforcement implementations that I've seen are > 'soft', whereas this would give the limit 'hard' semantics. Do we > really want 'hard' limit semantics?] > I agree that failing with an error is inappropriate. We should limit the readahead according to machine size, disk bandwidth, free memory availability, shoe size, etc. And once that's done then it _has_ to return success. Otherwise the application would see different results depending on system size and activity. It is just "advice".