From: David Howells <dhowells@redhat.com>
To: Tejun Heo
Cc: dhowells@redhat.com, linux-afs@lists.infradead.org, linux-fsdevel@vger.kernel.org, Lai Jiangshan, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
Date: Tue, 05 Sep 2017 15:50:16 +0100
Message-ID: <27489.1504623016@warthog.procyon.org.uk>
In-Reply-To: <20170905132951.GB1774378@devbig577.frc2.facebook.com>
References: <20170905132951.GB1774378@devbig577.frc2.facebook.com> <150428045304.25051.1778333106306853298.stgit@warthog.procyon.org.uk>

Tejun Heo wrote:

> Given how work items are used, I think this is too inviting to abuses
> where people build complex event chains through these counters and
> those chains would be
> completely opaque.  If the goal is protecting .text of a work item,
> can't we just do that?  Can you please describe your use case in more
> detail?

With one of my latest patches to AFS, there's a set of cell records, where each cell has a manager work item that maintains that cell, including refreshing DNS records and excising expired records from the list.  Performing the excision in the manager work item makes handling the fscache index cookie easier (you can't have two cookies attached to the same object), amongst other things.

There's also an overseer work item that maintains a single expiry timer for all the cells and queues the per-cell work items to do DNS updates and cell removal.

The reason the overseer exists is that it makes it easier to do a put on a cell.  The put decrements the cell refcount and then wants to schedule the cell for destruction - but it's no longer permitted to touch the cell.  I could use atomic_dec_and_lock(), but that's messy.  It's cleaner just to set the timer on the overseer and leave the rest to that.

However, if someone does rmmod, I have to be able to clean everything up.  The overseer timer may be queued or running; the overseer may be queued *and* running and may get queued again by the timer; and each cell's work item may be queued *and* running and may get queued again by the manager.

> Why can't it be done via the usual "flush from exit"?

Well, it can, but you need a flush for each separate level of dependencies, where one level of dependency kicks off the next during the cleanup.  So what I think I would have to do is set a flag to say that no one is allowed to set the timer now (this shouldn't happen outside of server or volume cache clearance), delete the timer synchronously, flush the work queue four times and then do an RCU barrier.

However, since I have volumes with dependencies on servers and cells, possibly with their own managers, I think I may need up to 12 flushes, possibly with interspersed RCU barriers.
It's much simpler to count out the objects than to try to get the flushing right.

David