From mboxrd@z Thu Jan 1 00:00:00 1970 From: Edward Shishkin Subject: Re: reiser4: FITRIM ioctl -- how to grab the space? Date: Sat, 16 Aug 2014 10:09:44 +0200 Message-ID: <53EF11C8.20209@gmail.com> References: <3405506.BC0S4TX54B@intelfx-laptop> <53E80077.2060300@gmail.com> <2275451.mu6kquuFD5@intelfx-laptop> <2026408.Yl74NqGZKK@intelfx-laptop> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=W3VmRqQGaBA0CWpjebN9Bjvj6EpGcsjXzO0yUrR4ATA=; b=jzx7lik0191VG0Z8+JnaH9MIpW0jRIZtYcvh9iOUJeW02PdyGoDTRSMTW1wiOIHsM2 y493c5YlsgN4QjWlBsWbQ375SKsHNItvqeXGYBPDLwtMBZA8DQQuBl9yq4QK2Gg8YZDq F6yzbmcDOvN1w2CI+5/DieXPOcdqBsM3TEEb1PYHCqSRv77ALaZXjtiyilzeJr/T8vNa GBnmpmXumgTuyj4fKY6i9GrnRDqiSPfquCuh29GB+wRxAsUP0PLBUhwfUhcsZYoiww4L T1+6J8AOLV6VAYzEcAQQQ2jOl3edsvbnnecHXYrUeSqjLHE1sbHOeZEomu0CQ/6m5fX8 QCZw== In-Reply-To: <2026408.Yl74NqGZKK@intelfx-laptop> Sender: reiserfs-devel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Ivan Shapovalov , reiserfs-devel@vger.kernel.org On 08/16/2014 02:44 AM, Ivan Shapovalov wrote: > On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote: >> [...] >>>> I've meant "grabbing all space and then allocating all space" -- so there won't >>>> be multiple grabs or multiple atoms. >>>> >>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard >>>> atom to commit. >>> >>> It seems such waiting will screw up the system. No? >> I was afraid of such situations, but how would that happen? The discard atom's >> commit will always be able to proceed as it doesn't grab space at all. >> >>>> (Actually, there is a small race window between grabbing space >>>> and creating an atom...) >>> >>> Which one? >> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If >> process X happens to grab space between us grabbing space and creating an atom, >> it will get -ENOSPC even with BA_CAN_COMMIT. I still don't see any "races" here. How atom creation is related to grabbing space? Are we talking about races in the existing code? f so, please show the racing paths.. >> >>>> The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used + >>>> sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form, >>>> and then to grab all space atomically wrt. reaching this condition. >>>> >>>> Again, if this is not feasible, I'll go with the multiple atoms approach. I >>>> just want to make sure. >>>> > ...so, I've almost given up implementing this :) great! > > In kernel there is a read-write semaphore implementation called rwsem. > I've added a per-superblock instance of rwsem with following semantics: > > - when count of grabbed+special (not free or used) blocks is increased by any > means, the semaphore is taken for reading before taking spinlock and > modifying counters > > - if the counters already were non-zero, the semaphore has been already taken > for reading (reader count > 1) and it is released once while under spinlock > (so that reader count always stays at 1) > > - when count of grabbed+special blocks is decreased and drops to zero, the > semaphore is released once (so reader count drops to 0 unless there is a race > with increasing the count) > > - on second try of BA_CAN_COMMIT grabbing (if there was not enough space), > the semaphore is taken for writing instead of for reading, ensuring > that every block is either permanently used or free. The write lock is > converted to read lock after grabbing required space. > > This "almost" works. The main problem is that Linux rwsem implementation > is write-biased: that is, if there are writers waiting, readers count can't > increase. That is, a process must not take a semaphore for reading in second > time if it is responsible for releasing the "first time" reader. > > The comment in original rwsem implementation by Andrew Morton states following: > "It is NOT legal for one task to down_read() an rwsem multiple times." > > reader1 writer1 > ------------------------ > down_read() > down_write() > up_write() > down_read() > up_read() > up_read() > > This is a deadlock: reader1's down_read() blocks on writer1's up_write(), > while writer1's down_write() blocks on reader1's second up_read(). > > A force grab (or a grab preceded by grab_space_enable(), or a used2something) > deadlocks 100% in presence of waiting writers, and so does the corresponding > transaction commit. > > So I need to find a way to take rwsem in a read-biased mode... Any advice is > accepted, including "give up with adding of yet another lock and go with > multiple transactions" :) IMHO this is too complicated. Why don't you want to grab, say, 20M per iteration? It should work without any problems, just maintain a counter of blocks allocated in the iteration.. Thanks, Edward.