From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2F8818C034 for ; Fri, 15 Nov 2024 14:01:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731679304; cv=none; b=e34FufeqHtZlTInS53ZrdiOjdaJkAYBu2grZNSMXUn9E06gn50pE+s2Awa6UnrZJvT8CPiD9K3WkJnrn9Wj8vN5dq4oTkNJ/Haxc/baZr6Xulm7ttIu5BFIYSZWdd9p35b6Y6Agbd+n1O28xO45V0yMi5VxDtJm8NVXtKATjT7A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731679304; c=relaxed/simple; bh=B4KHL+8l4Uin0NSIEfjnQ3Vj5XtM+B15BTpqbYl19Vg=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=YAVeno5VFTDF+iJSVF8neOF/KBV/xaukRdf9HamBu9YptFixxR/btlZfDMPlvTb7/Eehg1jnSHEJMlIJXb2chfDSU77qVai+7uxUwh1+t7DXoWmJeuZ51affGtPwEvcTdQkAm51uQqNv1jicozTFeDtir3mM2luRwCgR+vU8TeE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=MWaNPV8F; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MWaNPV8F" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731679301; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=dmYQVZ33rJyNBXEcWOvkPyU2ctnWfc31NEOf4H61ApE=; b=MWaNPV8FbLPNIIxWOtiUKwyKQ6hE7/G/v7kzOzUJbfPm+nFQpdleMQA29Ruvb73DUSBNcZ XPblacvZZ9xjj24K97DupsHP+EbOcXVodf0Pb9PWqZD63T8o9EJTeHjpdGPjIDFHmYfWmJ UEM5PT4TPEWBrQoli5mQXWn1dO2R1ts= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-489-AGvBuC85MRqwpsu31gl76w-1; Fri, 15 Nov 2024 09:01:34 -0500 X-MC-Unique: AGvBuC85MRqwpsu31gl76w-1 X-Mimecast-MFC-AGG-ID: AGvBuC85MRqwpsu31gl76w Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9C9BD1975AC7; Fri, 15 Nov 2024 14:01:30 +0000 (UTC) Received: from bfoster (unknown [10.22.80.120]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E5B191953880; Fri, 15 Nov 2024 14:01:27 +0000 (UTC) Date: Fri, 15 Nov 2024 09:03:00 -0500 From: Brian Foster To: Dave Chinner Cc: Long Li , brauner@kernel.org, djwong@kernel.org, cem@kernel.org, linux-xfs@vger.kernel.org, yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com Subject: Re: [PATCH v2 1/2] iomap: fix zero padding data issue in concurrent append writes Message-ID: References: <20241113091907.56937-1-leo.lilong@huawei.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 On Fri, Nov 15, 2024 at 07:01:58AM +1100, Dave Chinner wrote: > On Thu, Nov 14, 2024 at 01:04:31PM -0500, Brian Foster wrote: > > On Thu, Nov 14, 2024 at 10:34:26AM +0800, Long Li wrote: > > > On Wed, Nov 13, 2024 at 11:13:49AM -0500, Brian Foster wrote: > > ISTM that for the above merge scenario to happen we'd either need > > writeback of the thread 1 write to race just right with the thread 2 > > write, or have two writeback cycles where the completion of the first is > > still pending by the time the second completes. Either of those seem far > > less likely than either writeback seeing i_size == 8k from the start, or > > the thread 2 write completing sometime after the thread 1 ioend has > > already been completed. Hm? > > I think that this should be fairly easy to trigger with concurrent > sub-block buffered writes to O_DSYNC|O_APPEND opened files. The fact > we drop the IOLOCK before calling generic_write_sync() to flush the > data pretty much guarantees that there will be IO in flight whilst > other write() calls have extended the file in memory and are then > waiting for the current writeback on the folio to complete before > submitting their own writeback IO. > Hmmm.. that kind of sounds like opposite to me. Note that the example given was distinctly not append only, as that allows multiple technically "merge-worthy" ioends to be in completion at the same time. If you're doing O_DSYNC|O_APPEND sub-block buffered writes, then by definition there is going to be folio overlap across writes, and we don't submit a dirty&&writeback folio for writeback until preexisting writeback state clears. So ISTM that creates a situation where even if the append I/O is multithreaded, you'll just end up more likely to lockstep across writebacks and won't ever submit the second ioend before the first completes. Hm? That said, we're kind of getting in the weeds here.. I don't think it matters that much if these ioends merge.. I'm basically throwing out the proposition that maybe it's not worth the additional code to ensure that they always do. I.e., suppose we opted to trim the io_size when appropriate based on i_size so the completion side size update is accurate, but otherwise just left off the rounding helper thing and let the existing code work as it is. Would that lack of guaranteed block alignment have any practical impact on functionality or performance? ISTM the add_to_ioend() part is superfluous so it probably wouldn't have much effect on I/O sizes, if any. The skipped merge example Long Li gives seems technically possible, but I'm not aware of a workload where that would occur frequently enough that failing to merge such an ioend would have a noticeable impact.. hm? Again.. not something I care tremendously about, just trying to keep things simple if possible. If it were me and there's not something we can put to that obviously breaks, I'd start with that and then if that does prove wrong, it's obviously simple to fix by tacking on the rounding thing later. Just my .02. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com >