From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 8 Dec 2025 14:30:10 -0500
From: Benjamin Marzinski
To: Martin Wilck
Cc: Brian Bunker, dm-devel@lists.linux.dev, Krishna Kant
Subject: Re: [PATCH v3] multipath-tools Add purge capability to
 multipath-tools
References: <20251203161933.90318-1-brian@purestorage.com>
 <8a93aa8d60eb620469c0e0f4dc535ddc8eb3c7b1.camel@suse.com>
In-Reply-To: <8a93aa8d60eb620469c0e0f4dc535ddc8eb3c7b1.camel@suse.com>
Content-Type: text/plain; charset=iso-8859-1

On Mon, Dec 08, 2025 at 01:27:32PM +0100, Martin Wilck wrote:
> On Thu, 2025-12-04 at 14:33 -0500, Benjamin Marzinski wrote:
> > On Wed, Dec 03, 2025 at 08:19:33AM -0800, Brian Bunker wrote:
> > >
> > How about simplifying the purge thread slightly. When the checker
> > gets a path it is supposed to purge, it sets purge_path, just like
> > now. I don't think we need to track purge_retries at all. When the
> > purge loop is triggered, it locks the vecs lock and finds a single
> > path to purge, like now. The only info it needs to save is the udev
> > device (with a new refcount taken, like now) and a dup of pp->fd
> > (you need this to guarantee that the devnode won't get reused if
> > the device is deleted and a new one is re-added). You could also
> > save the name, but udev_device_get_devnode() will get it for you.
> > Like I mentioned above, you need to have a pthread cleanup handler
> > that gets run to clean these up if they are set and the thread is
> > cancelled, just like how the vecs lock is handled. Then purgeloop()
> > clears pp->purge_path, drops the vecs lock, does the sysfs delete,
> > and cleans up the purge_info by calling pthread_cleanup_pop().
> > Finally it loops back to reacquiring the vecs lock and finding the
> > next path to delete.
> > I don't really see a value in all the stats tracking. If it goes
> > through all the paths and doesn't find one in need of purging, it
> > waits, just like now.
>
> I'd go one step further and suggest an approach similar to what we do
> in the uevent handler, for which we have a separate list of uevents
> that shares no data with the main multipath threads.
>
> Roughly like this:
>
> enum purge_state {
> 	PURGE_REQUESTED = 1,
> 	PURGE_DONE,
> 	PURGE_ERROR
> };
>
> struct path_to_purge {
> 	struct list_head queue;
> 	struct udev_device *udev; /* one ref taken while on the list */
> 	int fd;                   /* dup of pp->fd */
> 	enum purge_state state;
> };
>
> // global list_head for the queue of paths to be purged.
> static LIST_HEAD(purge_queue);
>
> If multipathd detects a path to purge in the checker thread, it
> allocates and adds a path_to_purge entry with state PURGE_REQUESTED
> to the queue and triggers the purge thread, e.g. with a pthreads
> condition variable.
>
> The purge thread then walks through the list (without modifying the
> list itself), sends a delete request to the kernel, and atomically
> sets the state field to PURGE_DONE when successful.
>
> multipathd's checker thread walks the list next time and removes
> entries that have state != PURGE_REQUESTED, closing the fd and
> unref-ing the struct udev. IOW the resource allocation/freeing for
> this list and its elements would be handled by the checker alone. The
> checker would not immediately drop internal references to the path;
> this would be handled by the remove uevent handler as usual.
>
> The purge thread would never write to the list, except to the state
> attribute of list members (which we could protect from being freed
> via RCU).
>
> This would make the purge thread almost an independent process.

I'm a little confused about this design. Let's say a path gets removed
while on this list. If the purgeloop doesn't modify the list, then how
does the remove function know that it's safe to remove the item?
The purgeloop could be in the process of notifying sysfs right now,
making use of the udev device. Similarly, multipathd will change a
path's udev device when it gets a new uevent. How does it know if it's
safe to update the pointer on this list? Someone has to unref the
purgeloop's extra ref, and multipathd can't, once it's updated pp->udev
and forgotten the old value. You might be able to solve this with more
purge_states, so that the purgeloop could signal that it's actually
working on the device, but I'm pretty sure you'd need actual locking to
avoid races here.

Aside from the timing issue, we're adding multiple places where
multipathd needs to go checking and modifying this list as part of
other actions, which seems like it gets pretty fiddly.

If we instead say that the purgeloop is in charge of cleaning up the
list, there would still need to be locking, because multipathd might
want to take items off the list if they become reconnected or removed
before the thread gets around to purging them. In theory, you could
just not modify the list in these cases and let the purgeloop run on
those paths, but then you would be removing paths that have been
reconnected, and possibly keeping a number of references to udev
devices and fds for paths that are gone (assuming that deleting
devices can actually block for a chunk of time). Also, multipathd
would need to check the list so paths didn't get added multiple times
if the checker re-ran before the path was purged. Again, unless you
were fine with keeping unnecessary extra references to the device.

My original idea was to try to keep the purgeloop from needing to grab
the vecs lock, but I just kept coming up with extra complications that
this would cause. So I settled on something where the normal multipathd
code just needs to set a path variable to communicate with the
purgeloop, and the purgeloop just needs to hold the vecs lock while
looking through the path list until it finds a device in need of
purging.
My other option makes the purgeloop do much more work. The idea was
that multipathd would just put the pathname on a list when it found a
disconnected path, and the purgeloop would pull items off that list.
This would still need locking, but you could probably do it with rcu
locking. Alternatively, you could use sockets to send the pathnames to
the purgeloop thread, to avoid locking and possibly make the purgeloop
its own process completely. For each path, the purgeloop would:

1. open the path device (to make sure it didn't get deleted and reused)
2. check the path holders to make sure it was part of a multipath
   device
3. run a TUR check on the path to make sure that it was still
   disconnected
4. delete the path, if so
5. close the path device

This is completely disconnected from multipathd, minus the pathname
communication, and should be able to make the right decisions about
paths that were removed or reconnected, without sharing any resources
that need cleaning up. It just makes the purgeloop duplicate a bunch
of work that multipathd already did.

Thoughts? Am I misunderstanding your design?

-Ben

> Opinions?
>
> Martin