public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* lots of calls to __write/read_lock_failed
@ 2003-01-16  4:18 Dave Hansen
  2003-01-16  4:46 ` William Lee Irwin III
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2003-01-16  4:18 UTC (permalink / raw)
  To: linux-kernel

Running a webserver benchmark on 2.5.57, I saw some things in the 
profiles that I wasn't familliar with.  Here's readprofile | sort | tail:

    717 handle_IRQ_event                           7.7935
    732 do_gettimeofday                            5.4627
    755 __wake_up                                 10.7857
   1089 del_timer                                 11.8370
   1112 add_timer                                  7.8310
   1500 system_call                               34.0909
   1847 schedule                                   2.4759
   1873 kmap_atomic                               12.8288
267475 __write_lock_failed                     8358.5938
379837 __read_lock_failed                     18991.8500
1044613 poll_idle                             18010.5690
1704762 total                                    10.9631

I added a section at the ends of _raw_write_lock and _raw_read_lock 
that look like this:

{
         static int count = 0;
         if( ++count%50000 == 0 ) {
                 printk("%s:%s() %d\n", __stringify(KBUILD_BASENAME),
				__FUNCTION__, count );
                 dump_stack();
         }
}


  file_table:_raw_read_lock() 3300000
  Call Trace:
   [<c0152469>] fget+0x9d/0xa0
   [<c0152b27>] sys_fsync+0x21/0xbe
   [<c0151b53>] sys_writev+0x47/0x56
   [<c010931f>] syscall_call+0x7/0xb

filemap:_raw_read_lock() 1450000
  Call Trace:
   [<c0136937>] do_generic_mapping_read+0x411/0x43e
   [<c0136d98>] file_send_actor+0x0/0x74
   [<c0136e74>] generic_file_sendfile+0x68/0x76
   [<c0136d98>] file_send_actor+0x0/0x74
   [<c0151d48>] do_sendfile+0x1e6/0x28a
   [<c0136d98>] file_send_actor+0x0/0x74
   [<c0151e50>] sys_sendfile+0x64/0xcc
   [<c010931f>] syscall_call+0x7/0xb

  ip_output:_raw_read_lock() 2000000
  Call Trace:
   [<c02c90b2>] ip_finish_output2+0x154/0x226
   [<c02c7466>] ip_queue_xmit+0x3dc/0x4ce
   [<c011c71a>] default_wake_function+0x32/0x3e
   [<c011c75e>] __wake_up_common+0x38/0x58
   [<c02ddf24>] tcp_v4_send_check+0x54/0xe2
   [<c02d81b6>] tcp_transmit_skb+0x2be/0x448
   [<c02d54ca>] tcp_data_queue+0x23a/0x830
   [<c02da693>] tcp_send_ack+0x81/0xb2
   [<c02d68d1>] tcp_rcv_established+0x249/0x70e
   [<c02defd1>] tcp_v4_do_rcv+0x12d/0x132
   [<c02df452>] tcp_v4_rcv+0x47c/0x50c
   [<c02c4363>] ip_local_deliver_finish+0x9f/0x19e
   [<c02c4674>] ip_rcv_finish+0x212/0x29f
   [<c02b410e>] netif_receive_skb+0xc2/0x17c
   [<c02b4245>] process_backlog+0x7d/0x10c
   [<c02b4395>] net_rx_action+0xc1/0x178
   [<c01248e7>] do_softirq+0xb7/0xba
   [<c010b390>] do_IRQ+0xec/0xf8
   [<c0106eca>] default_idle+0x0/0x2e

time:_raw_write_lock() 1350000
Call Trace:
  [<c010f321>] timer_interrupt+0x99/0x9c
  [<c010b150>] handle_IRQ_event+0x38/0x5c
  [<c010b330>] do_IRQ+0x8c/0xf8
  [<c0106eca>] default_idle+0x0/0x2e
  [<c0106eca>] default_idle+0x0/0x2e
  [<c0109c8c>] common_interrupt+0x18/0x20
  [<c0106eca>] default_idle+0x0/0x2e
  [<c0106eca>] default_idle+0x0/0x2e
  [<c0106ef4>] default_idle+0x2a/0x2e
  [<c0106f6b>] cpu_idle+0x39/0x42
  [<c01212a5>] printk+0x15d/0x190

-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lots of calls to __write/read_lock_failed
  2003-01-16  4:18 lots of calls to __write/read_lock_failed Dave Hansen
@ 2003-01-16  4:46 ` William Lee Irwin III
  2003-01-16  6:59   ` Dipankar Sarma
  0 siblings, 1 reply; 6+ messages in thread
From: William Lee Irwin III @ 2003-01-16  4:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel

On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
>  file_table:_raw_read_lock() 3300000
>  Call Trace:
>   [<c0152469>] fget+0x9d/0xa0
>   [<c0152b27>] sys_fsync+0x21/0xbe
>   [<c0151b53>] sys_writev+0x47/0x56
>   [<c010931f>] syscall_call+0x7/0xb

read_lock(&file->files_lock);


On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> filemap:_raw_read_lock() 1450000
>  Call Trace:
>   [<c0136937>] do_generic_mapping_read+0x411/0x43e
>   [<c0136d98>] file_send_actor+0x0/0x74
>   [<c0136e74>] generic_file_sendfile+0x68/0x76
>   [<c0136d98>] file_send_actor+0x0/0x74
>   [<c0151d48>] do_sendfile+0x1e6/0x28a
>   [<c0136d98>] file_send_actor+0x0/0x74
>   [<c0151e50>] sys_sendfile+0x64/0xcc
>   [<c010931f>] syscall_call+0x7/0xb

read_lock(&mapping->page_lock);

On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
>  ip_output:_raw_read_lock() 2000000
>  Call Trace:
>   [<c02c90b2>] ip_finish_output2+0x154/0x226
>   [<c02c7466>] ip_queue_xmit+0x3dc/0x4ce
>   [<c011c71a>] default_wake_function+0x32/0x3e
>   [<c011c75e>] __wake_up_common+0x38/0x58
>   [<c02ddf24>] tcp_v4_send_check+0x54/0xe2
>   [<c02d81b6>] tcp_transmit_skb+0x2be/0x448
>   [<c02d54ca>] tcp_data_queue+0x23a/0x830
>   [<c02da693>] tcp_send_ack+0x81/0xb2
>   [<c02d68d1>] tcp_rcv_established+0x249/0x70e
>   [<c02defd1>] tcp_v4_do_rcv+0x12d/0x132
>   [<c02df452>] tcp_v4_rcv+0x47c/0x50c
>   [<c02c4363>] ip_local_deliver_finish+0x9f/0x19e
>   [<c02c4674>] ip_rcv_finish+0x212/0x29f
>   [<c02b410e>] netif_receive_skb+0xc2/0x17c
>   [<c02b4245>] process_backlog+0x7d/0x10c
>   [<c02b4395>] net_rx_action+0xc1/0x178
>   [<c01248e7>] do_softirq+0xb7/0xba
>   [<c010b390>] do_IRQ+0xec/0xf8
>   [<c0106eca>] default_idle+0x0/0x2e

read_lock_bh(&hh->hh_lock);

On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> time:_raw_write_lock() 1350000
> Call Trace:
>  [<c010f321>] timer_interrupt+0x99/0x9c
>  [<c010b150>] handle_IRQ_event+0x38/0x5c
>  [<c010b330>] do_IRQ+0x8c/0xf8
>  [<c0106eca>] default_idle+0x0/0x2e
>  [<c0106eca>] default_idle+0x0/0x2e
>  [<c0109c8c>] common_interrupt+0x18/0x20
>  [<c0106eca>] default_idle+0x0/0x2e
>  [<c0106eca>] default_idle+0x0/0x2e
>  [<c0106ef4>] default_idle+0x2a/0x2e
>  [<c0106f6b>] cpu_idle+0x39/0x42
>  [<c01212a5>] printk+0x15d/0x190

read_lock_irqsave(&xtime_lock, flags)
or
write_lock_irq(&xtime_lock);


Bill

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lots of calls to __write/read_lock_failed
  2003-01-16  4:46 ` William Lee Irwin III
@ 2003-01-16  6:59   ` Dipankar Sarma
  2003-01-16  7:13     ` Martin J. Bligh
  2003-01-17 16:17     ` Andrea Arcangeli
  0 siblings, 2 replies; 6+ messages in thread
From: Dipankar Sarma @ 2003-01-16  6:59 UTC (permalink / raw)
  To: William Lee Irwin III, Dave Hansen, linux-kernel

On Thu, Jan 16, 2003 at 04:47:30AM +0000, William Lee Irwin III wrote:
> On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> >  file_table:_raw_read_lock() 3300000
> >  Call Trace:
> >   [<c0152469>] fget+0x9d/0xa0
> >   [<c0152b27>] sys_fsync+0x21/0xbe
> >   [<c0151b53>] sys_writev+0x47/0x56
> >   [<c010931f>] syscall_call+0x7/0xb
> 
> read_lock(&file->files_lock);

You mean read_lock(&files->file_lock); :)

Dave, does your webserver benchmark clone() tasks with CLONE_FILES ? Unless
the fd table is shared, can't see why there would be contention on this.
If it is indeed necessary to share fd table, then there is a somewhat
unmaintained lockfree fget() patch (files_struct_rcu) that you might want
to try.

> 
> On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> > time:_raw_write_lock() 1350000
> > Call Trace:
> >  [<c010f321>] timer_interrupt+0x99/0x9c
> >  [<c010b150>] handle_IRQ_event+0x38/0x5c
> 
> read_lock_irqsave(&xtime_lock, flags)
> or
> write_lock_irq(&xtime_lock);

ISTR a patch from Stephen Hemminger at OSDL that used Andrea's
sequence number trick based rwlock (frlock) to implement do_gettimeofday.
It might be relevant here.

Thanks
Dipankar

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lots of calls to __write/read_lock_failed
  2003-01-16  6:59   ` Dipankar Sarma
@ 2003-01-16  7:13     ` Martin J. Bligh
  2003-01-17 16:17     ` Andrea Arcangeli
  1 sibling, 0 replies; 6+ messages in thread
From: Martin J. Bligh @ 2003-01-16  7:13 UTC (permalink / raw)
  To: dipankar, William Lee Irwin III, Dave Hansen, linux-kernel

>> On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
>> > time:_raw_write_lock() 1350000
>> > Call Trace:
>> >  [<c010f321>] timer_interrupt+0x99/0x9c
>> >  [<c010b150>] handle_IRQ_event+0x38/0x5c
>> 
>> read_lock_irqsave(&xtime_lock, flags)
>> or
>> write_lock_irq(&xtime_lock);
> 
> ISTR a patch from Stephen Hemminger at OSDL that used Andrea's
> sequence number trick based rwlock (frlock) to implement do_gettimeofday.
> It might be relevant here.

I thought that was Andi. If it's the patch I'm thinking of, it's sitting
in the -mm tree.

M.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lots of calls to __write/read_lock_failed
  2003-01-16  6:59   ` Dipankar Sarma
  2003-01-16  7:13     ` Martin J. Bligh
@ 2003-01-17 16:17     ` Andrea Arcangeli
  2003-01-17 16:24       ` Stephen Hemminger
  1 sibling, 1 reply; 6+ messages in thread
From: Andrea Arcangeli @ 2003-01-17 16:17 UTC (permalink / raw)
  To: Dipankar Sarma
  Cc: William Lee Irwin III, Dave Hansen, linux-kernel,
	Stephen Hemminger, Andreas Schwab

Hello,

On Thu, Jan 16, 2003 at 12:29:41PM +0530, Dipankar Sarma wrote:
> On Thu, Jan 16, 2003 at 04:47:30AM +0000, William Lee Irwin III wrote:
> > On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> > >  file_table:_raw_read_lock() 3300000
> > >  Call Trace:
> > >   [<c0152469>] fget+0x9d/0xa0
> > >   [<c0152b27>] sys_fsync+0x21/0xbe
> > >   [<c0151b53>] sys_writev+0x47/0x56
> > >   [<c010931f>] syscall_call+0x7/0xb
> > 
> > read_lock(&file->files_lock);
> 
> You mean read_lock(&files->file_lock); :)
> 
> Dave, does your webserver benchmark clone() tasks with CLONE_FILES ? Unless
> the fd table is shared, can't see why there would be contention on this.
> If it is indeed necessary to share fd table, then there is a somewhat
> unmaintained lockfree fget() patch (files_struct_rcu) that you might want
> to try.
> 
> > 
> > On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> > > time:_raw_write_lock() 1350000
> > > Call Trace:
> > >  [<c010f321>] timer_interrupt+0x99/0x9c
> > >  [<c010b150>] handle_IRQ_event+0x38/0x5c
> > 
> > read_lock_irqsave(&xtime_lock, flags)
> > or
> > write_lock_irq(&xtime_lock);
> 
> ISTR a patch from Stephen Hemminger at OSDL that used Andrea's
> sequence number trick based rwlock (frlock) to implement do_gettimeofday.

I'm merging a version of Stephen's patch right now (the starvation of
the read locks that we rely when we don't clear irqs in read_locks that
can run from irqs too seems to hurt too much on some hardware/workload
combination to a point that it even lose ticks with the irq stuck, and
the frlock is the most efficient and scalable possible locking design
for gettimeofday and it will solve the starvation too, very good patch
Stephen).

While merging it I found a problem, not sure if it helps for your crash
but while checking it I found a quite fatal bug here:

+static inline unsigned fr_read_begin(frlock_t *rw)
+{
+       rmb();
+       return rw->post_sequence;
+
+}

the rmb() must be placed after (not before) reading the post_sequence.
The above bug could trigger on x86 too because it should even allow the
compiler to reorder stuff and the x86 can read speculative even if the
compiler doesn't reorder. This at the very least can explain screwed
timing results with such patch applied.

Andrea

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lots of calls to __write/read_lock_failed
  2003-01-17 16:17     ` Andrea Arcangeli
@ 2003-01-17 16:24       ` Stephen Hemminger
  0 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2003-01-17 16:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Dipankar Sarma, William Lee Irwin III, Dave Hansen,
	Linux Kernel Mailing List, Andreas Schwab

Your analysis looks right, also, look at the ordering of the updates of
pre/post.  They should match the vsyscall version on x86.

On Fri, 2003-01-17 at 08:17, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Jan 16, 2003 at 12:29:41PM +0530, Dipankar Sarma wrote:
> > On Thu, Jan 16, 2003 at 04:47:30AM +0000, William Lee Irwin III wrote:
> > > On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> > > >  file_table:_raw_read_lock() 3300000
> > > >  Call Trace:
> > > >   [<c0152469>] fget+0x9d/0xa0
> > > >   [<c0152b27>] sys_fsync+0x21/0xbe
> > > >   [<c0151b53>] sys_writev+0x47/0x56
> > > >   [<c010931f>] syscall_call+0x7/0xb
> > > 
> > > read_lock(&file->files_lock);
> > 
> > You mean read_lock(&files->file_lock); :)
> > 
> > Dave, does your webserver benchmark clone() tasks with CLONE_FILES ? Unless
> > the fd table is shared, can't see why there would be contention on this.
> > If it is indeed necessary to share fd table, then there is a somewhat
> > unmaintained lockfree fget() patch (files_struct_rcu) that you might want
> > to try.
> > 
> > > 
> > > On Wed, Jan 15, 2003 at 08:18:13PM -0800, Dave Hansen wrote:
> > > > time:_raw_write_lock() 1350000
> > > > Call Trace:
> > > >  [<c010f321>] timer_interrupt+0x99/0x9c
> > > >  [<c010b150>] handle_IRQ_event+0x38/0x5c
> > > 
> > > read_lock_irqsave(&xtime_lock, flags)
> > > or
> > > write_lock_irq(&xtime_lock);
> > 
> > ISTR a patch from Stephen Hemminger at OSDL that used Andrea's
> > sequence number trick based rwlock (frlock) to implement do_gettimeofday.
> 
> I'm merging a version of Stephen's patch right now (the starvation of
> the read locks that we rely when we don't clear irqs in read_locks that
> can run from irqs too seems to hurt too much on some hardware/workload
> combination to a point that it even lose ticks with the irq stuck, and
> the frlock is the most efficient and scalable possible locking design
> for gettimeofday and it will solve the starvation too, very good patch
> Stephen).
> 
> While merging it I found a problem, not sure if it helps for your crash
> but while checking it I found a quite fatal bug here:
> 
> +static inline unsigned fr_read_begin(frlock_t *rw)
> +{
> +       rmb();
> +       return rw->post_sequence;
> +
> +}
> 
> the rmb() must be placed after (not before) reading the post_sequence.
> The above bug could trigger on x86 too because it should even allow the
> compiler to reorder stuff and the x86 can read speculative even if the
> compiler doesn't reorder. This at the very least can explain screwed
> timing results with such patch applied.
> 
> Andrea
-- 
Stephen Hemminger <shemminger@osdl.org>
Open Source Devlopment Lab


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-01-17 16:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-16  4:18 lots of calls to __write/read_lock_failed Dave Hansen
2003-01-16  4:46 ` William Lee Irwin III
2003-01-16  6:59   ` Dipankar Sarma
2003-01-16  7:13     ` Martin J. Bligh
2003-01-17 16:17     ` Andrea Arcangeli
2003-01-17 16:24       ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox