Date: Wed, 22 Jan 2025 19:04:21 -0800
Message-ID: <85frlafcy2.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: Rodrigo Vivi
Cc: Umesh Nerlige Ramappa
Subject: Re: [PATCH v2 2/2] drm/xe/oa: Fix locking for stream->pollin
References: <20250122040204.3239397-1-ashutosh.dixit@intel.com> <20250122040204.3239397-3-ashutosh.dixit@intel.com>
List-Id: Intel Xe graphics driver

On Wed, 22 Jan 2025 02:23:23 -0800, Rodrigo Vivi wrote:
>

Hi Rodrigo,

Thanks for your input; it helped me look at the code more closely and
clarify some things. I have responded to your comments below, but in case
you don't want to go through that, the summary is that I would like to drop
this patch from consideration for now, because it needs more changes.

> On Tue, Jan 21, 2025 at 08:02:04PM -0800, Ashutosh Dixit wrote:
> > Previously locking was skipped for stream->pollin. This "mostly" worked
> > because pollin is u32/bool, even when stream->pollin is accessed
> > concurrently in multiple threads.
> > However, with stream->pollin moving under
> > stream->oa_buffer.ptr_lock in this series, implement the correct way to
> > access stream->pollin, which is to access it under
> > stream->oa_buffer.ptr_lock.
> >
> > v2: Update commit message to explain the "why" of this change (Rodrigo)
> >     Document the change in scope for stream->oa_buffer.ptr_lock (Rodrigo)
>
> First of all thanks for the rework.
> But I believe I didn't have enough coffee today yet, because
> I'm still failing to understand why...
>
> Breaking your explanation to see if I can understand:
> 'mostly' - Why mostly? Did we face bugs?
>
> 'worked because pollin is u32/bool' - this sounds like 'works by luck'

No, there are no known bugs in the current code in the kernel. That is what
'mostly' is supposed to mean: the requirements on this code are pretty lax,
and even if writes to the boolean stream->pollin from different threads
interleave or overwrite each other, things work out. So it is not 'works by
luck' or 'works most of the time': the code works even in the absence of
this locking.

>
> 'with stream->pollin moving under stream->oa_buffer.ptr_lock' - Why?
>
> I believe this is the main why that I had yesterday and that continues
> today. Why are we using the oa_buffer pointer lock to also protect
> the a stream variable?
>
> Why don't you use the stream_lock? Or why don't you create a dedicated
> polling_lock?
>

So, stream->oa_buffer.ptr_lock _is_ the correct lock to use for
'stream->pollin', because 'stream->pollin' is intimately connected to the
same pointers that 'ptr_lock' is protecting. You can see this in Patch 1.

Also, there is basically one 'struct xe_oa_buffer' per stream (and
stream_lock is a mutex, which can't be used from interrupt context). So
rather than using a stream-level lock, we can just move pollin into
stream->oa_buffer, where it will become stream->oa_buffer.pollin.

> I'm sorry for not been clear yesterday, wasting your time and 1 cycle...

No, thanks for your comments; they made me think about all this.

> > @@ -562,8 +563,10 @@ static ssize_t xe_oa_read(struct file *file, char __user *buf,
> >  	 * Also in case of -EIO, we have already waited for data before returning
> >  	 * -EIO, so need to wait again
> >  	 */
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >  	if (ret != -ENOSPC && ret != -EIO)
> >  		stream->pollin = false;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);

Also, this is the part I don't like: it should be moved into
xe_oa_append_reports(), where stream->oa_buffer.ptr_lock is already
held. Otherwise we are dropping and re-grabbing the lock.

So imo the basic direction of this patch is correct, but I want to make all
these changes and resubmit this as part of a separate series. So please
ignore this patch for now.

I will go ahead and merge Patch 1 since that is ok and was already reviewed
as a separate series:

https://patchwork.freedesktop.org/series/143575/

Thanks.
--
Ashutosh

> >
> >  	/* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, -EINVAL, ... */
> >  	return offset ?: (ret ?: -EAGAIN);
> > @@ -573,6 +576,7 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> >  				      struct file *file, poll_table *wait)
> >  {
> >  	__poll_t events = 0;
> > +	unsigned long flags;
> >
> >  	poll_wait(file, &stream->poll_wq, wait);
> >
> > @@ -582,8 +586,10 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> >  	 * in use.
> >  	 * We rely on hrtimer xe_oa_poll_check_timer_cb to notify us when there
> >  	 * are samples to read
> >  	 */
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >  	if (stream->pollin)
> >  		events |= EPOLLIN;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> >
> >  	return events;
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
> > index 52e33c37d5ee8..5c4ea13f646fc 100644
> > --- a/drivers/gpu/drm/xe/xe_oa_types.h
> > +++ b/drivers/gpu/drm/xe/xe_oa_types.h
> > @@ -159,7 +159,7 @@ struct xe_oa_buffer {
> >  	/** @vaddr: mapped vaddr of the OA buffer */
> >  	u8 *vaddr;
> >
> > -	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers */
> > +	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers and stream->pollin */
> >  	spinlock_t ptr_lock;
> >
> >  	/** @head: Cached head to read from */
> > --
> > 2.47.1
> >
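
For illustration only, the direction described in the reply (keeping pollin
in the same structure as the head/tail pointers, and updating it inside the
critical section that already touches those pointers) might look roughly
like the userspace sketch below. This is not the driver code and not the
follow-up series: a pthread mutex stands in for the kernel spinlock taken
with spin_lock_irqsave(), every sketch_* name and struct oa_buffer_sketch is
hypothetical, and pollin is simplified to "unread reports remain" rather
than the -ENOSPC/-EIO handling in xe_oa_read().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xe_oa_buffer with pollin moved inside. */
struct oa_buffer_sketch {
	pthread_mutex_t ptr_lock; /* protects head, tail and pollin */
	size_t head;              /* next report to read */
	size_t tail;              /* one past the last report written */
	bool pollin;              /* true while unread reports are pending */
};

static void sketch_init(struct oa_buffer_sketch *b)
{
	pthread_mutex_init(&b->ptr_lock, NULL);
	b->head = 0;
	b->tail = 0;
	b->pollin = false;
}

/* Producer side (a timer callback in the real driver): publish n reports. */
static void sketch_produce(struct oa_buffer_sketch *b, size_t n)
{
	pthread_mutex_lock(&b->ptr_lock);
	b->tail += n;
	b->pollin = b->tail != b->head;
	pthread_mutex_unlock(&b->ptr_lock);
}

/*
 * Reader side: consume up to max reports. pollin is updated in the same
 * critical section as head, so the lock is taken once instead of being
 * dropped and re-acquired just to clear the flag.
 */
static size_t sketch_append_reports(struct oa_buffer_sketch *b, size_t max)
{
	size_t n;

	pthread_mutex_lock(&b->ptr_lock);
	n = b->tail - b->head;
	if (n > max)
		n = max;
	b->head += n;
	assert(b->head <= b->tail);
	b->pollin = b->head != b->tail;
	pthread_mutex_unlock(&b->ptr_lock);

	return n;
}

/* Poll side: read the flag under the same lock that guards the pointers. */
static bool sketch_poll(struct oa_buffer_sketch *b)
{
	bool ready;

	pthread_mutex_lock(&b->ptr_lock);
	ready = b->pollin;
	pthread_mutex_unlock(&b->ptr_lock);

	return ready;
}
```

The point of the sketch is only the lock scope: because the flag and the
pointers live in one structure behind one lock, the reader never observes a
pollin value that disagrees with head/tail.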