[PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
@ 2025-04-16 18:58 Joe Damato
  2025-04-17  7:56 ` Christian Brauner
  2025-04-26 12:29 ` Christian Brauner
  0 siblings, 2 replies; 11+ messages in thread
From: Joe Damato @ 2025-04-16 18:58 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, brauner, Joe Damato, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list

Avoid an edge case where epoll_wait arms a timer and calls schedule()
even if the timer will expire immediately.

For example: if the user has specified an epoll busy poll usecs which is
equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
consumed the entire timeout duration so it is unnecessary to induce
scheduling latency by calling schedule() (via schedule_hrtimeout_range).

This can be measured using a simple bpftrace script:

tracepoint:sched:sched_switch
/ args->prev_pid == $1 /
{
  print(kstack());
  print(ustack());
}

Before this patch is applied:

  Testing an epoll_wait app with busy poll usecs set to 1000, and
  epoll_wait timeout set to 1ms using the script above shows:

     __traceiter_sched_switch+69
     __schedule+1495
     schedule+32
     schedule_hrtimeout_range+159
     do_epoll_wait+1424
     __x64_sys_epoll_wait+97
     do_syscall_64+95
     entry_SYSCALL_64_after_hwframe+118

     epoll_wait+82

  Which is unexpected; the busy poll usecs should have consumed the
  entire timeout and there should be no reason to arm a timer.

After this patch is applied: the same test scenario does not generate a
call to schedule() in the above edge case. If the busy poll usecs are
reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
armed as expected.

Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 v2: 
   - No longer an RFC and rebased on vfs/vfs.fixes
   - Added Jan's Reviewed-by
   - Added Fixes tag
   - No functional changes from the RFC

 rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/

 fs/eventpoll.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 100376863a44..4bc264b854c4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
 	return res;
 }
 
+static int ep_schedule_timeout(ktime_t *to)
+{
+	if (to)
+		return ktime_after(*to, ktime_get());
+	else
+		return 1;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 		write_unlock_irq(&ep->lock);
 
-		if (!eavail)
+		if (!eavail && ep_schedule_timeout(to))
 			timed_out = !schedule_hrtimeout_range(to, slack,
 							      HRTIMER_MODE_ABS);
 		__set_current_state(TASK_RUNNING);

base-commit: a681b7c17dd21d5aa0da391ceb27a2007ba970a4
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-16 18:58 [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future Joe Damato
@ 2025-04-17  7:56 ` Christian Brauner
  2025-04-26 12:29 ` Christian Brauner
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2025-04-17  7:56 UTC (permalink / raw)
  To: linux-fsdevel, Joe Damato
  Cc: Christian Brauner, jack, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, linux-kernel

On Wed, 16 Apr 2025 18:58:25 +0000, Joe Damato wrote:
> Avoid an edge case where epoll_wait arms a timer and calls schedule()
> even if the timer will expire immediately.
> 
> For example: if the user has specified an epoll busy poll usecs which is
> equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
> unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
> consumed the entire timeout duration so it is unnecessary to induce
> scheduling latency by calling schedule() (via schedule_hrtimeout_range).
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] eventpoll: Set epoll timeout if it's in the future
      https://git.kernel.org/vfs/vfs/c/0a65bc27bd64

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-16 18:58 [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future Joe Damato
  2025-04-17  7:56 ` Christian Brauner
@ 2025-04-26 12:29 ` Christian Brauner
  2025-04-28 12:14   ` Jan Kara
  1 sibling, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2025-04-26 12:29 UTC (permalink / raw)
  To: Joe Damato
  Cc: linux-fsdevel, jack, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list

On Wed, Apr 16, 2025 at 06:58:25PM +0000, Joe Damato wrote:
> Avoid an edge case where epoll_wait arms a timer and calls schedule()
> even if the timer will expire immediately.
> 
> For example: if the user has specified an epoll busy poll usecs which is
> equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
> unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
> consumed the entire timeout duration so it is unnecessary to induce
> scheduling latency by calling schedule() (via schedule_hrtimeout_range).
> 
> This can be measured using a simple bpftrace script:
> 
> tracepoint:sched:sched_switch
> / args->prev_pid == $1 /
> {
>   print(kstack());
>   print(ustack());
> }
> 
> Before this patch is applied:
> 
>   Testing an epoll_wait app with busy poll usecs set to 1000, and
>   epoll_wait timeout set to 1ms using the script above shows:
> 
>      __traceiter_sched_switch+69
>      __schedule+1495
>      schedule+32
>      schedule_hrtimeout_range+159
>      do_epoll_wait+1424
>      __x64_sys_epoll_wait+97
>      do_syscall_64+95
>      entry_SYSCALL_64_after_hwframe+118
> 
>      epoll_wait+82
> 
>   Which is unexpected; the busy poll usecs should have consumed the
>   entire timeout and there should be no reason to arm a timer.
> 
> After this patch is applied: the same test scenario does not generate a
> call to schedule() in the above edge case. If the busy poll usecs are
> reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
> armed as expected.
> 
> Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
>  v2: 
>    - No longer an RFC and rebased on vfs/vfs.fixes
>    - Added Jan's Reviewed-by
>    - Added Fixes tag
>    - No functional changes from the RFC
> 
>  rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/
> 
>  fs/eventpoll.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 100376863a44..4bc264b854c4 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
>  	return res;
>  }
>  
> +static int ep_schedule_timeout(ktime_t *to)
> +{
> +	if (to)
> +		return ktime_after(*to, ktime_get());
> +	else
> +		return 1;
> +}
> +
>  /**
>   * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
>   *           event buffer.
> @@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  
>  		write_unlock_irq(&ep->lock);
>  
> -		if (!eavail)
> +		if (!eavail && ep_schedule_timeout(to))
>  			timed_out = !schedule_hrtimeout_range(to, slack,
>  							      HRTIMER_MODE_ABS);

Isn't this buggy? If @to is non-NULL and ep_schedule_timeout() returns
false you want to set timed_out to 1 to break the wait. Otherwise you
hang, no?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-26 12:29 ` Christian Brauner
@ 2025-04-28 12:14   ` Jan Kara
  2025-04-28 13:18     ` Tudor Ambarus
  2025-04-28 16:50     ` Joe Damato
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2025-04-28 12:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Joe Damato, linux-fsdevel, jack, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list

On Sat 26-04-25 14:29:15, Christian Brauner wrote:
> On Wed, Apr 16, 2025 at 06:58:25PM +0000, Joe Damato wrote:
> > Avoid an edge case where epoll_wait arms a timer and calls schedule()
> > even if the timer will expire immediately.
> > 
> > For example: if the user has specified an epoll busy poll usecs which is
> > equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
> > unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
> > consumed the entire timeout duration so it is unnecessary to induce
> > scheduling latency by calling schedule() (via schedule_hrtimeout_range).
> > 
> > This can be measured using a simple bpftrace script:
> > 
> > tracepoint:sched:sched_switch
> > / args->prev_pid == $1 /
> > {
> >   print(kstack());
> >   print(ustack());
> > }
> > 
> > Before this patch is applied:
> > 
> >   Testing an epoll_wait app with busy poll usecs set to 1000, and
> >   epoll_wait timeout set to 1ms using the script above shows:
> > 
> >      __traceiter_sched_switch+69
> >      __schedule+1495
> >      schedule+32
> >      schedule_hrtimeout_range+159
> >      do_epoll_wait+1424
> >      __x64_sys_epoll_wait+97
> >      do_syscall_64+95
> >      entry_SYSCALL_64_after_hwframe+118
> > 
> >      epoll_wait+82
> > 
> >   Which is unexpected; the busy poll usecs should have consumed the
> >   entire timeout and there should be no reason to arm a timer.
> > 
> > After this patch is applied: the same test scenario does not generate a
> > call to schedule() in the above edge case. If the busy poll usecs are
> > reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
> > armed as expected.
> > 
> > Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > ---
> >  v2: 
> >    - No longer an RFC and rebased on vfs/vfs.fixes
> >    - Added Jan's Reviewed-by
> >    - Added Fixes tag
> >    - No functional changes from the RFC
> > 
> >  rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/
> > 
> >  fs/eventpoll.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 100376863a44..4bc264b854c4 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
> >  	return res;
> >  }
> >  
> > +static int ep_schedule_timeout(ktime_t *to)
> > +{
> > +	if (to)
> > +		return ktime_after(*to, ktime_get());
> > +	else
> > +		return 1;
> > +}
> > +
> >  /**
> >   * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
> >   *           event buffer.
> > @@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> >  
> >  		write_unlock_irq(&ep->lock);
> >  
> > -		if (!eavail)
> > +		if (!eavail && ep_schedule_timeout(to))
> >  			timed_out = !schedule_hrtimeout_range(to, slack,
> >  							      HRTIMER_MODE_ABS);
> 
> Isn't this buggy? If @to is non-NULL and ep_schedule_timeout() returns
> false you want to set timed_out to 1 to break the wait. Otherwise you
> hang, no?

Yep, looks like that. Good spotting!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 12:14   ` Jan Kara
@ 2025-04-28 13:18     ` Tudor Ambarus
  2025-04-28 13:32       ` Tudor Ambarus
  2025-04-28 16:50     ` Joe Damato
  1 sibling, 1 reply; 11+ messages in thread
From: Tudor Ambarus @ 2025-04-28 13:18 UTC (permalink / raw)
  To: Jan Kara, Christian Brauner
  Cc: Joe Damato, linux-fsdevel, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list



On 4/28/25 1:14 PM, Jan Kara wrote:
> On Sat 26-04-25 14:29:15, Christian Brauner wrote:
>> On Wed, Apr 16, 2025 at 06:58:25PM +0000, Joe Damato wrote:
>>> Avoid an edge case where epoll_wait arms a timer and calls schedule()
>>> even if the timer will expire immediately.
>>>
>>> For example: if the user has specified an epoll busy poll usecs which is
>>> equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
>>> unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
>>> consumed the entire timeout duration so it is unnecessary to induce
>>> scheduling latency by calling schedule() (via schedule_hrtimeout_range).
>>>
>>> This can be measured using a simple bpftrace script:
>>>
>>> tracepoint:sched:sched_switch
>>> / args->prev_pid == $1 /
>>> {
>>>   print(kstack());
>>>   print(ustack());
>>> }
>>>
>>> Before this patch is applied:
>>>
>>>   Testing an epoll_wait app with busy poll usecs set to 1000, and
>>>   epoll_wait timeout set to 1ms using the script above shows:
>>>
>>>      __traceiter_sched_switch+69
>>>      __schedule+1495
>>>      schedule+32
>>>      schedule_hrtimeout_range+159
>>>      do_epoll_wait+1424
>>>      __x64_sys_epoll_wait+97
>>>      do_syscall_64+95
>>>      entry_SYSCALL_64_after_hwframe+118
>>>
>>>      epoll_wait+82
>>>
>>>   Which is unexpected; the busy poll usecs should have consumed the
>>>   entire timeout and there should be no reason to arm a timer.
>>>
>>> After this patch is applied: the same test scenario does not generate a
>>> call to schedule() in the above edge case. If the busy poll usecs are
>>> reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
>>> armed as expected.
>>>
>>> Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
>>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>> ---
>>>  v2: 
>>>    - No longer an RFC and rebased on vfs/vfs.fixes
>>>    - Added Jan's Reviewed-by
>>>    - Added Fixes tag
>>>    - No functional changes from the RFC
>>>
>>>  rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/
>>>
>>>  fs/eventpoll.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>>> index 100376863a44..4bc264b854c4 100644
>>> --- a/fs/eventpoll.c
>>> +++ b/fs/eventpoll.c
>>> @@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
>>>  	return res;
>>>  }
>>>  
>>> +static int ep_schedule_timeout(ktime_t *to)
>>> +{
>>> +	if (to)
>>> +		return ktime_after(*to, ktime_get());
>>> +	else
>>> +		return 1;
>>> +}
>>> +
>>>  /**
>>>   * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
>>>   *           event buffer.
>>> @@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>>>  
>>>  		write_unlock_irq(&ep->lock);
>>>  
>>> -		if (!eavail)
>>> +		if (!eavail && ep_schedule_timeout(to))
>>>  			timed_out = !schedule_hrtimeout_range(to, slack,
>>>  							      HRTIMER_MODE_ABS);
>>
>> Isn't this buggy? If @to is non-NULL and ep_schedule_timeout() returns
>> false you want to set timed_out to 1 to break the wait. Otherwise you
>> hang, no?
> 
> Yep, looks like that. Good spotting!
> 

I second that. Also, isn't ep_schedule_timeout buggy too? It compares a
timeout value with the current time, that has to be reworked as well.

I see this patch already queued for stable/linux-6.14.y. What's the plan
with it?

Thanks,
ta

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 13:18     ` Tudor Ambarus
@ 2025-04-28 13:32       ` Tudor Ambarus
  0 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2025-04-28 13:32 UTC (permalink / raw)
  To: Jan Kara, Christian Brauner
  Cc: Joe Damato, linux-fsdevel, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list



On 4/28/25 2:18 PM, Tudor Ambarus wrote:
> isn't ep_schedule_timeout buggy too? It compares a
> timeout value with the current time, that has to be reworked as well.

Ah, I see the timeout is relative to ktime_get_ts64() in
SYSCALL_DEFINE6(epoll_pwait2, ...), please ignore.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 12:14   ` Jan Kara
  2025-04-28 13:18     ` Tudor Ambarus
@ 2025-04-28 16:50     ` Joe Damato
  2025-04-28 22:32       ` Carlos Llamas
  1 sibling, 1 reply; 11+ messages in thread
From: Joe Damato @ 2025-04-28 16:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, linux-fsdevel, Alexander Viro, David S. Miller,
	Eric Dumazet, Sridhar Samudrala, Alexander Duyck, open list

On Mon, Apr 28, 2025 at 02:14:45PM +0200, Jan Kara wrote:
> On Sat 26-04-25 14:29:15, Christian Brauner wrote:
> > On Wed, Apr 16, 2025 at 06:58:25PM +0000, Joe Damato wrote:
> > > Avoid an edge case where epoll_wait arms a timer and calls schedule()
> > > even if the timer will expire immediately.
> > > 
> > > For example: if the user has specified an epoll busy poll usecs which is
> > > equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
> > > unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
> > > consumed the entire timeout duration so it is unnecessary to induce
> > > scheduling latency by calling schedule() (via schedule_hrtimeout_range).
> > > 
> > > This can be measured using a simple bpftrace script:
> > > 
> > > tracepoint:sched:sched_switch
> > > / args->prev_pid == $1 /
> > > {
> > >   print(kstack());
> > >   print(ustack());
> > > }
> > > 
> > > Before this patch is applied:
> > > 
> > >   Testing an epoll_wait app with busy poll usecs set to 1000, and
> > >   epoll_wait timeout set to 1ms using the script above shows:
> > > 
> > >      __traceiter_sched_switch+69
> > >      __schedule+1495
> > >      schedule+32
> > >      schedule_hrtimeout_range+159
> > >      do_epoll_wait+1424
> > >      __x64_sys_epoll_wait+97
> > >      do_syscall_64+95
> > >      entry_SYSCALL_64_after_hwframe+118
> > > 
> > >      epoll_wait+82
> > > 
> > >   Which is unexpected; the busy poll usecs should have consumed the
> > >   entire timeout and there should be no reason to arm a timer.
> > > 
> > > After this patch is applied: the same test scenario does not generate a
> > > call to schedule() in the above edge case. If the busy poll usecs are
> > > reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
> > > armed as expected.
> > > 
> > > Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
> > > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  v2: 
> > >    - No longer an RFC and rebased on vfs/vfs.fixes
> > >    - Added Jan's Reviewed-by
> > >    - Added Fixes tag
> > >    - No functional changes from the RFC
> > > 
> > >  rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/
> > > 
> > >  fs/eventpoll.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > > index 100376863a44..4bc264b854c4 100644
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
> > >  	return res;
> > >  }
> > >  
> > > +static int ep_schedule_timeout(ktime_t *to)
> > > +{
> > > +	if (to)
> > > +		return ktime_after(*to, ktime_get());
> > > +	else
> > > +		return 1;
> > > +}
> > > +
> > >  /**
> > >   * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
> > >   *           event buffer.
> > > @@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > >  
> > >  		write_unlock_irq(&ep->lock);
> > >  
> > > -		if (!eavail)
> > > +		if (!eavail && ep_schedule_timeout(to))
> > >  			timed_out = !schedule_hrtimeout_range(to, slack,
> > >  							      HRTIMER_MODE_ABS);
> > 
> > Isn't this buggy? If @to is non-NULL and ep_schedule_timeout() returns
> > false you want to set timed_out to 1 to break the wait. Otherwise you
> > hang, no?
> 
> Yep, looks like that. Good spotting!

Thank you for spotting that and sorry for the trouble.

Christian / Jan what would be the correct way for me to deal with
this? Would it be to post a v3 (re-submitting the patch in its
entirety) or to post a new patch that fixes the original and lists
the commit sha from vfs.fixes with a Fixes tag ?

I was going to propose the following, which if I understand
correctly, should fix the issue Christian identified:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4bc264b854c4..1a5d1147f082 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,

                write_unlock_irq(&ep->lock);

-               if (!eavail && ep_schedule_timeout(to))
+               if (!ep_schedule_timeout(to))
+                       timed_out = 1;
+               else if (!eavail)
                        timed_out = !schedule_hrtimeout_range(to, slack,
                                                              HRTIMER_MODE_ABS);
                __set_current_state(TASK_RUNNING);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 16:50     ` Joe Damato
@ 2025-04-28 22:32       ` Carlos Llamas
  2025-04-28 22:41         ` Joe Damato
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos Llamas @ 2025-04-28 22:32 UTC (permalink / raw)
  To: Joe Damato
  Cc: Jan Kara, Christian Brauner, linux-fsdevel, Alexander Viro,
	David S. Miller, Eric Dumazet, Sridhar Samudrala, Alexander Duyck,
	open list, Tudor Ambarus, William McVicker

On Mon, Apr 28, 2025 at 09:50:02AM -0700, Joe Damato wrote:
> Thank you for spotting that and sorry for the trouble.

This was also flagged by our Android's epoll_pwait2 tests here:
https://android.googlesource.com/platform/bionic/+/refs/heads/main/tests/sys_epoll_test.cpp
They would all timeout, so the hang reported by Christian fits.


> Christian / Jan what would be the correct way for me to deal with
> this? Would it be to post a v3 (re-submitting the patch in its
> entirety) or to post a new patch that fixes the original and lists
> the commit sha from vfs.fixes with a Fixes tag ?

The original commit has landed in mainline already, so it needs to be
new patch at this point. If if helps, here is the tag:
Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")

> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4bc264b854c4..1a5d1147f082 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> 
>                 write_unlock_irq(&ep->lock);
> 
> -               if (!eavail && ep_schedule_timeout(to))
> +               if (!ep_schedule_timeout(to))
> +                       timed_out = 1;
> +               else if (!eavail)
>                         timed_out = !schedule_hrtimeout_range(to, slack,
>                                                               HRTIMER_MODE_ABS);
>                 __set_current_state(TASK_RUNNING);

I've ran your change through our internal CI and I confirm it fixes the
hangs seen on our end. If you send the fix feel free to add:

Tested-by: Carlos Llamas <cmllamas@google.com>

Cheers,
Carlos Llamas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 22:32       ` Carlos Llamas
@ 2025-04-28 22:41         ` Joe Damato
  2025-04-29 10:19           ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2025-04-28 22:41 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Jan Kara, Christian Brauner, linux-fsdevel, Alexander Viro,
	David S. Miller, Eric Dumazet, Sridhar Samudrala, Alexander Duyck,
	open list, Tudor Ambarus, William McVicker

On Mon, Apr 28, 2025 at 10:32:31PM +0000, Carlos Llamas wrote:
> On Mon, Apr 28, 2025 at 09:50:02AM -0700, Joe Damato wrote:
> > Thank you for spotting that and sorry for the trouble.
> 
> This was also flagged by our Android's epoll_pwait2 tests here:
> https://android.googlesource.com/platform/bionic/+/refs/heads/main/tests/sys_epoll_test.cpp
> They would all timeout, so the hang reported by Christian fits.
> 
> 
> > Christian / Jan what would be the correct way for me to deal with
> > this? Would it be to post a v3 (re-submitting the patch in its
> > entirety) or to post a new patch that fixes the original and lists
> > the commit sha from vfs.fixes with a Fixes tag ?
> 
> The original commit has landed in mainline already, so it needs to be
> new patch at this point. If if helps, here is the tag:
> Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
> 
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 4bc264b854c4..1a5d1147f082 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > 
> >                 write_unlock_irq(&ep->lock);
> > 
> > -               if (!eavail && ep_schedule_timeout(to))
> > +               if (!ep_schedule_timeout(to))
> > +                       timed_out = 1;
> > +               else if (!eavail)
> >                         timed_out = !schedule_hrtimeout_range(to, slack,
> >                                                               HRTIMER_MODE_ABS);
> >                 __set_current_state(TASK_RUNNING);
> 
> I've ran your change through our internal CI and I confirm it fixes the
> hangs seen on our end. If you send the fix feel free to add:
> 
> Tested-by: Carlos Llamas <cmllamas@google.com>

Thanks, will do.

I was waiting to hear back from Christian / Jan if they are OK with
the proposed fix before submitting something, but glad to hear it
fixes the issue for you. Sorry for the trouble.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-28 22:41         ` Joe Damato
@ 2025-04-29 10:19           ` Jan Kara
  2025-04-29 11:08             ` Christian Brauner
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2025-04-29 10:19 UTC (permalink / raw)
  To: Joe Damato
  Cc: Carlos Llamas, Jan Kara, Christian Brauner, linux-fsdevel,
	Alexander Viro, David S. Miller, Eric Dumazet, Sridhar Samudrala,
	Alexander Duyck, open list, Tudor Ambarus, William McVicker

On Mon 28-04-25 15:41:17, Joe Damato wrote:
> On Mon, Apr 28, 2025 at 10:32:31PM +0000, Carlos Llamas wrote:
> > On Mon, Apr 28, 2025 at 09:50:02AM -0700, Joe Damato wrote:
> > > Thank you for spotting that and sorry for the trouble.
> > 
> > This was also flagged by our Android's epoll_pwait2 tests here:
> > https://android.googlesource.com/platform/bionic/+/refs/heads/main/tests/sys_epoll_test.cpp
> > They would all timeout, so the hang reported by Christian fits.
> > 
> > 
> > > Christian / Jan what would be the correct way for me to deal with
> > > this? Would it be to post a v3 (re-submitting the patch in its
> > > entirety) or to post a new patch that fixes the original and lists
> > > the commit sha from vfs.fixes with a Fixes tag ?
> > 
> > The original commit has landed in mainline already, so it needs to be
> > new patch at this point. If if helps, here is the tag:
> > Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
> > 
> > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > > index 4bc264b854c4..1a5d1147f082 100644
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > > 
> > >                 write_unlock_irq(&ep->lock);
> > > 
> > > -               if (!eavail && ep_schedule_timeout(to))
> > > +               if (!ep_schedule_timeout(to))
> > > +                       timed_out = 1;
> > > +               else if (!eavail)
> > >                         timed_out = !schedule_hrtimeout_range(to, slack,
> > >                                                               HRTIMER_MODE_ABS);
> > >                 __set_current_state(TASK_RUNNING);
> > 
> > I've ran your change through our internal CI and I confirm it fixes the
> > hangs seen on our end. If you send the fix feel free to add:
> > 
> > Tested-by: Carlos Llamas <cmllamas@google.com>
> 
> Thanks, will do.
> 
> I was waiting to hear back from Christian / Jan if they are OK with
> the proposed fix before submitting something, but glad to hear it
> fixes the issue for you. Sorry for the trouble.

Yep, a new patch submission with proper Fixes tag is needed at this point.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
  2025-04-29 10:19           ` Jan Kara
@ 2025-04-29 11:08             ` Christian Brauner
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2025-04-29 11:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Joe Damato, Carlos Llamas, linux-fsdevel, Alexander Viro,
	David S. Miller, Eric Dumazet, Sridhar Samudrala, Alexander Duyck,
	open list, Tudor Ambarus, William McVicker

On Tue, Apr 29, 2025 at 12:19:15PM +0200, Jan Kara wrote:
> On Mon 28-04-25 15:41:17, Joe Damato wrote:
> > On Mon, Apr 28, 2025 at 10:32:31PM +0000, Carlos Llamas wrote:
> > > On Mon, Apr 28, 2025 at 09:50:02AM -0700, Joe Damato wrote:
> > > > Thank you for spotting that and sorry for the trouble.
> > > 
> > > This was also flagged by our Android's epoll_pwait2 tests here:
> > > https://android.googlesource.com/platform/bionic/+/refs/heads/main/tests/sys_epoll_test.cpp
> > > They would all timeout, so the hang reported by Christian fits.
> > > 
> > > 
> > > > Christian / Jan what would be the correct way for me to deal with
> > > > this? Would it be to post a v3 (re-submitting the patch in its
> > > > entirety) or to post a new patch that fixes the original and lists
> > > > the commit sha from vfs.fixes with a Fixes tag ?
> > > 
> > > The original commit has landed in mainline already, so it needs to be
> > > new patch at this point. If if helps, here is the tag:
> > > Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
> > > 
> > > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > > > index 4bc264b854c4..1a5d1147f082 100644
> > > > --- a/fs/eventpoll.c
> > > > +++ b/fs/eventpoll.c
> > > > @@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> > > > 
> > > >                 write_unlock_irq(&ep->lock);
> > > > 
> > > > -               if (!eavail && ep_schedule_timeout(to))
> > > > +               if (!ep_schedule_timeout(to))
> > > > +                       timed_out = 1;
> > > > +               else if (!eavail)
> > > >                         timed_out = !schedule_hrtimeout_range(to, slack,
> > > >                                                               HRTIMER_MODE_ABS);
> > > >                 __set_current_state(TASK_RUNNING);
> > > 
> > > I've ran your change through our internal CI and I confirm it fixes the
> > > hangs seen on our end. If you send the fix feel free to add:
> > > 
> > > Tested-by: Carlos Llamas <cmllamas@google.com>
> > 
> > Thanks, will do.
> > 
> > I was waiting to hear back from Christian / Jan if they are OK with
> > the proposed fix before submitting something, but glad to hear it
> > fixes the issue for you. Sorry for the trouble.
> 
> Yep, a new patch submission with proper Fixes tag is needed at this point.

Yes, please send a new fixes patch that I can pick up!

Thanks!
Christian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-04-29 11:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-16 18:58 [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future Joe Damato
2025-04-17  7:56 ` Christian Brauner
2025-04-26 12:29 ` Christian Brauner
2025-04-28 12:14   ` Jan Kara
2025-04-28 13:18     ` Tudor Ambarus
2025-04-28 13:32       ` Tudor Ambarus
2025-04-28 16:50     ` Joe Damato
2025-04-28 22:32       ` Carlos Llamas
2025-04-28 22:41         ` Joe Damato
2025-04-29 10:19           ` Jan Kara
2025-04-29 11:08             ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).