linux-fsdevel.vger.kernel.org archive mirror
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Hillf Danton <hdanton@sina.com>,
	"Sapkal, Swapnil" <swapnil.sapkal@amd.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Mateusz Guzik <mjguzik@gmail.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full
Date: Tue, 4 Mar 2025 11:05:57 +0530
Message-ID: <0d17fc70-01a8-43b4-aec6-5cede5c8f7ba@amd.com>
In-Reply-To: <20250304050644.2983-1-hdanton@sina.com>

Hello Hillf,

On 3/4/2025 10:36 AM, Hillf Danton wrote:
> On Mon, 3 Mar 2025 15:16:34 +0530 "Sapkal, Swapnil" <swapnil.sapkal@amd.com>
>> On 2/28/2025 10:03 PM, Oleg Nesterov wrote:
>>> And... I know, I know you already hate me ;)
>>>
>>
>> Not at all :)
>>
>>> but if you have time, could you check if this patch (with or without the
>>> previous debugging patch) makes any difference? Just to be sure.
>>>
>>
>> Sure, I will give this a try.
>>
>> But in the meanwhile me and Prateek tried some of the experiments in the weekend.
>> We were able to reproduce this issue on a third generation EPYC system as well as
>> on an Intel Emerald Rapids (2 X INTEL(R) XEON(R) PLATINUM 8592+).
>>
>> We tried heavy hammered tracing approach over the weekend on top of your debug patch.
>> I have attached the debug patch below. With tracing we found the following case for
>> pipe_writable():
>>
>>     hackbench-118768  [206] .....  1029.550601: pipe_write: 000000005eea28ff: 0: 37 38 16: 1
>>
>> Here,
>>
>> head = 37
>> tail = 38
>> max_usage = 16
>> pipe_full() returns 1.
>>
>> Between reading of head and later the tail, the tail seems to have moved ahead of the
>> head leading to wraparound. Applying the following changes I have not yet run into a
>> hang on the original machine where I first saw it:
>>
>> diff --git a/fs/pipe.c b/fs/pipe.c
>> index ce1af7592780..a1931c817822 100644
>> --- a/fs/pipe.c
>> +++ b/fs/pipe.c
>> @@ -417,9 +417,19 @@ static inline int is_packetized(struct file *file)
>>    /* Done while waiting without holding the pipe lock - thus the READ_ONCE() */
>>    static inline bool pipe_writable(const struct pipe_inode_info *pipe)
>>    {
>> -	unsigned int head = READ_ONCE(pipe->head);
>> -	unsigned int tail = READ_ONCE(pipe->tail);
>>    	unsigned int max_usage = READ_ONCE(pipe->max_usage);
>> +	unsigned int head, tail;
>> +
>> +	tail = READ_ONCE(pipe->tail);
>> +	/*
>> +	 * Since the unsigned arithmetic in this lockless preemptible context
>> +	 * relies on the fact that the tail can never be ahead of head, read
>> +	 * the head after the tail to ensure we've not missed any updates to
>> +	 * the head. Reordering the reads can cause wraparounds and give the
>> +	 * illusion that the pipe is full.
>> +	 */
>> +	smp_rmb();
>> +	head = READ_ONCE(pipe->head);
>>    
>>    	return !pipe_full(head, tail, max_usage) ||
>>    		!READ_ONCE(pipe->readers);
>> ---
>>
>> smp_rmb() on x86 is a nop and even without the barrier we were not able to
>> reproduce the hang even after 10000 iterations.
>>
> My $.02 that changes the wait condition.
> Not sure it makes sense for you.
> 
> --- x/fs/pipe.c
> +++ y/fs/pipe.c
> @@ -430,7 +430,7 @@ pipe_write(struct kiocb *iocb, struct io
>   {
>   	struct file *filp = iocb->ki_filp;
>   	struct pipe_inode_info *pipe = filp->private_data;
> -	unsigned int head;
> +	unsigned int head, tail;
>   	ssize_t ret = 0;
>   	size_t total_len = iov_iter_count(from);
>   	ssize_t chars;
> @@ -573,11 +573,13 @@ pipe_write(struct kiocb *iocb, struct io
>   		 * after waiting we need to re-check whether the pipe
>   		 * become empty while we dropped the lock.
>   		 */
> +		tail = pipe->tail;
>   		mutex_unlock(&pipe->mutex);
>   		if (was_empty)
>   			wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
>   		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> -		wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
> +		wait_event_interruptible_exclusive(pipe->wr_wait,
> +				!READ_ONCE(pipe->readers) || tail != READ_ONCE(pipe->tail));

That could work too for the case highlighted, but if the head has also
moved by the time the writer wakes up, it will lead to an extra
wakeup.

Linus' diff seems cleaner and appears to cover all the racy scenarios.

>   		mutex_lock(&pipe->mutex);
>   		was_empty = pipe_empty(pipe->head, pipe->tail);
>   		wake_next_writer = true;
> --

-- 
Thanks and Regards,
Prateek

