From: David Howells <dhowells@redhat.com>
To: Tejun Heo
Cc: dhowells@redhat.com, linux-afs@lists.infradead.org, linux-fsdevel@vger.kernel.org, Lai Jiangshan, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
Date: Tue, 05 Sep 2017 15:50:16 +0100
Message-ID: <27489.1504623016@warthog.procyon.org.uk>
In-Reply-To: <20170905132951.GB1774378@devbig577.frc2.facebook.com>
References: <20170905132951.GB1774378@devbig577.frc2.facebook.com> <150428045304.25051.1778333106306853298.stgit@warthog.procyon.org.uk>

Tejun Heo wrote:

> Given how work items are used, I think this is too inviting to abuses
> where people build complex event chains through these counters and
> those chains would be
> completely opaque.  If the goal is protecting .text of a work item,
> can't we just do that?  Can you please describe your use case in more
> detail?

With one of my latest patches to AFS, there's a set of cell records, where each cell has a manager work item that maintains that cell, including refreshing DNS records and excising expired records from the list.  Performing the excision in the manager work item makes handling the fscache index cookie easier (you can't have two cookies attached to the same object), amongst other things.

There's also an overseer work item that maintains a single expiry timer for all the cells and queues the per-cell work items to do DNS updates and cell removal.

The reason the overseer exists is that it makes it easier to do a put on a cell.  The put decrements the cell refcount and then wants to schedule the cell for destruction - but it's no longer permitted to touch the cell.  I could use atomic_dec_and_lock(), but that's messy.  It's cleaner just to set the timer on the overseer and leave the rest to that.

However, if someone does rmmod, I have to be able to clean everything up.  The overseer timer may be queued or running; the overseer may be queued *and* running and may get queued again by the timer; and each cell's work item may be queued *and* running and may get queued again by the manager.

> Why can't it be done via the usual "flush from exit"?

Well, it can, but you need a flush for each separate level of dependencies, where one level of dependency kicks off the next during the cleanup.  So what I think I would have to do is set a flag to say that no one is allowed to set the timer now (this shouldn't happen outside of server or volume cache clearance), delete the timer synchronously, flush the work queue four times and then do an RCU barrier.

However, since I have volumes with dependencies on servers and cells, possibly with their own managers, I think I may need up to 12 flushes, possibly with interspersed RCU barriers.
It's much simpler to count out the objects than to try to get the flushing right.

David