From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F68F1AC456; Wed, 4 Dec 2024 11:35:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733312108; cv=none; b=LTMZmGywCDz0AcDGQsw07A0Eg0m13RLAbjpZ0dJuIGh34JKSP6iZAcuFuNxkpNTw+rYnTKgZvnCEmgD1TpuVj0n0rd+oKAxceNyoFwnvoqfvKC1QQdJ5+lGh2RD+5HIMm4wjBHQFrUrE25eTJCD0fgre3RQh79wQhl1XeBjf9pU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733312108; c=relaxed/simple; bh=NCPtgkOSRmGk52EeOB2IiMnKVoVlPuUFd28dOdYf67k=; h=From:To:CC:Subject:Date:Message-ID:References:In-Reply-To: Content-Type:MIME-Version; b=rpGWFFB4QUPZD1jDAxXnRwAuedf+NN56ec+RacH7vysZKZJpUnE+ZnWsK5CfDemSqM6WP0vQONJ/GFZXnlCLwmRs7x9jdtusRLF6TipHR3hkxi29xqI23pvX8/wlsI7e0tDlOp2kxpQ7tBJxsxDiUyDhbfwQRgsLrT9mnU6JbQ0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4Y3FmB6yxmz6K9P6; Wed, 4 Dec 2024 19:34:18 +0800 (CST) Received: from frapeml500006.china.huawei.com (unknown [7.182.85.219]) by mail.maildlp.com (Postfix) with ESMTPS id BF8B2140113; Wed, 4 Dec 2024 19:35:02 +0800 (CST) Received: from frapeml500007.china.huawei.com (7.182.85.172) by frapeml500006.china.huawei.com (7.182.85.219) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Wed, 4 Dec 2024 12:35:02 +0100 Received: from frapeml500007.china.huawei.com ([7.182.85.172]) by frapeml500007.china.huawei.com ([7.182.85.172]) with mapi id 15.01.2507.039; Wed, 4 Dec 2024 12:35:02 +0100 From: Shiju Jose To: Shiju Jose , Steven Rostedt CC: "dave.jiang@intel.com" , "dan.j.williams@intel.com" , Jonathan Cameron , "alison.schofield@intel.com" , "nifan.cxl@gmail.com" , "vishal.l.verma@intel.com" , "ira.weiny@intel.com" , "dave@stgolabs.net" , "linux-cxl@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Linuxarm , tanxiaofei , "Zengtao (B)" Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1 Thread-Topic: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1 Thread-Index: AQHbOy/jEKs9N8Ou/EWSGme10Yml3rLJc4IQgABPV4CAAS95QIAATDQAgAA1zlD///prgIABEYyggAHA21CABm0UcIABWkrw Date: Wed, 4 Dec 2024 11:35:02 +0000 Message-ID: References: <20241120093745.1847-1-shiju.jose@huawei.com> <20241120093745.1847-4-shiju.jose@huawei.com> <180fcfd623c64cdb86cdc9059f749af0@huawei.com> <20241126120237.1598854d@gandalf.local.home> <20241127104132.6c1729e1@gandalf.local.home> <53a299d3cca6417d90d553e8399f834b@huawei.com> <20241127133407.7bc1376a@gandalf.local.home> <3c9808a694d242cab35bab67602edebf@huawei.com> In-Reply-To: Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 >-----Original Message----- >From: Shiju Jose >Sent: 03 December 2024 15:22 >To: Steven Rostedt >Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron >; alison.schofield@intel.com; >nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com; >dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org= ; >Linuxarm ; tanxiaofei ; >Zengtao (B) >Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record = to >CXL spec rev 3.1 > >[...] >> >>Hi Steve, >> >>I debug this case and please find the info, 1. rasdaemon : read() from >>format file return around 1/3rd of the >> actual data in the file only when the total size of the format's file >> is above 4K bytes (page size), For example, in this case, the total >>size was 4512 bytes, >> but read only 1674 bytes. >> This partial data resulted tep_parse_event() in libtraceevent >>failed internally in the parse_format() >> and since __parse_event() does not return error when parse_format() = fail, >> thus initialization of the event does not fail. >> >> The following solution in rasdaemon solved the issue, >> (provided if no other fix for the above issue with read()), >> 1. read() and accumulate content of format file until EOF reached. >> 2. Increased the buffer size from 4K bytes to 8K bytes. >> 3. May be __parse_event() in libtraceevent and thus >>tep_parse_event() return error >> when parse_format() fail so that the initialization would fail >>if the input data is >> corrupted or partial? > >Hi Steve, > >Identified the root cause of this issue in the kernel: >The read() function for the event's format file calls seq_read() and >seq_read_iter() in the kernel, which allocates a buffer of PAGE_SIZE (4 KB= ) for >sequential reads. However, it detects an overflow in the following marked = call >(seq_has_overflowed()) and returns with partial data read. > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >File : https://elixir.bootlin.com/linux/v6.13-rc1/source/fs/seq_file.c > >ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter) { > struct seq_file *m =3D iocb->ki_filp->private_data; [...] > /* grab buffer if we didn't have one */ > if (!m->buf) { >---------> m->buf =3D seq_buf_alloc(m->size =3D PAGE_SIZE); > if (!m->buf) > goto Enomem; > } > // something left in the buffer - copy it out first [...] > // get a non-empty record in the buffer > m->from =3D 0; > p =3D m->op->start(m, &m->index); > while (1) { > err =3D PTR_ERR(p); > if (!p || IS_ERR(p)) // EOF or an error > break; >[...] > } > // EOF or an error > m->op->stop(m, p); > m->count =3D 0; > goto Done; >Fill: > // one non-empty record is in the buffer; if they want more, > // try to fit more in, but in any case we need to advance > // the iterator once for every record shown. > while (1) { > size_t offs =3D m->count; > loff_t pos =3D m->index; > >[...] > err =3D m->op->show(m, p); > if (err > 0) { // ->show() says "skip it" > m->count =3D offs; >----------> } else if (err || seq_has_overflowed(m)) { > m->count =3D offs; > break; > } > } > m->op->stop(m, p); > n =3D copy_to_iter(m->buf, m->count, iter); > copied +=3D n; > m->count -=3D n; > m->from =3D n; >Done: >[...] > goto Done; >} >EXPORT_SYMBOL(seq_read_iter); >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >The following fix in the kernel's seq_read_iter() allows userspace to read= the >content of the format file etc in one go if the requested size exceeds PAG= E_SIZE, >and resolves the parsing error caused by the event's format file exceeding >PAGE_SIZE. > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >diff --git a/fs/seq_file.c b/fs/seq_file.c index e676c8b0cf5d..f1f1af18056= 2 >100644 >--- a/fs/seq_file.c >+++ b/fs/seq_file.c >@@ -207,7 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_= iter >*iter) > > /* grab buffer if we didn't have one */ > if (!m->buf) { >- m->buf =3D seq_buf_alloc(m->size =3D PAGE_SIZE); >+ if (iter->count % PAGE_SIZE) >+ m->size =3D ((iter->count / PAGE_SIZE) + 1) * PAGE= _SIZE; >+ else >+ m->size =3D (iter->count / PAGE_SIZE) * PAGE_SIZE; >+ m->buf =3D seq_buf_alloc(m->size); > if (!m->buf) > goto Enomem; > } >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >Can you suggest which fix is the right one, kernel based fix or rasdaemon= based >fix (which mentioned in the previous email)? Following change for reading trace event's format file in the kernel worked= fine to fix the parsing error, which seems better than changing the common seq_read().= =20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 7266ec2a4eea..9cb1c5a1f070 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1674,6 +1674,35 @@ static int trace_format_open(struct inode *inode, st= ruct file *file) return 0; } =20 +/** + * trace_format_read - read() method for format file. + * @file: the file to read from + * @buf: the buffer to read to + * @size: the maximum number of bytes to read + * @ppos: the current position in the file + * + * * Return: + * * %0 - No bytes copied (EOF). + * * %>0 - Number of bytes copied. + * * %<0 - Error code. + */ +static ssize_t trace_format_read(struct file *file, char __user *buf, size= _t size, loff_t *ppos) +{ + ssize_t copied =3D 0; + ssize_t ret; + + do { + ret =3D seq_read(file, buf + copied, size - copied, ppos); + if (ret < 0) + return ret; + copied +=3D ret; + if (copied >=3D size) + break; + } while (ret); + + return copied; +} + #ifdef CONFIG_PERF_EVENTS static ssize_t event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *pp= os) @@ -2157,7 +2186,7 @@ static const struct file_operations ftrace_enable_fop= s =3D { =20 static const struct file_operations ftrace_event_format_fops =3D { .open =3D trace_format_open, - .read =3D seq_read, + .read =3D trace_format_read, .llseek =3D seq_lseek, .release =3D seq_release, }; =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >[...] >>>>-- Steve >>> >> > Thanks, Shiju