linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/10] V3: rwsem changes + down_read_critical() proposal
From: Michel Lespinasse @ 2010-05-17 22:25 UTC
  To: Linus Torvalds, David Howells, Ingo Molnar, Thomas Gleixner
  Cc: LKML, Andrew Morton, Mike Waychison, Suleiman Souhlal, Ying Han,
	Michel Lespinasse

This is version 3 of my rwsem changes. Patches 7 and 10 were modified to
address Linus's comments about the API. Please consider for merging.

Changes since V2:

- Rebased to 2.6.34

- Changed patch 07 to address Linus's comments about the API.
  down_read_critical() and up_read_critical() now work as a pair; threads
  using this are allowed to skip over blocked threads when acquiring the
  read lock; however they must make sure to quickly release that lock and
  in particular, they are forbidden to block.

- Changed patch 10 to make use of the down_read_critical()/up_read_critical()
  API when accessing /proc/<pid>/exe and /proc/<pid>/maps files.
  I excluded the smaps and numa_maps files, which can actually block while
  being generated: smaps blocks in smaps_pte_range() doing a cond_resched(),
  which seems legitimate as it's a potentially heavy operation; numa_maps
  blocks in show_numa_map() doing a kzalloc of struct numa_maps, which
  should probably get done in do_maps_open() instead.

The motivation for this change was some cluster monitoring software we
use at Google, which reads /proc/<pid>/maps files for all running
processes. When the machines are under load, the mmap_sem is often
acquired for reads for long periods of time since do_page_fault() holds
it while doing disk accesses, and fair queueing behavior often results
in the monitoring software making little progress. By introducing
unfair behavior in a few selected places, we are able to let the
monitoring software make progress without impacting performance for
the rest of the system. I've made sure not to change the rwsem fast
paths in implementing this proposal.
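
To make the fair versus unfair distinction concrete, here is a toy,
single-threaded model of the admission decision only. It is not the kernel
implementation: struct toy_rwsem and the toy_* functions are hypothetical,
and the model assumes that an unfair reader is still held back by an active
writer and merely skips threads that are already queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of rwsem state; hypothetical, not the kernel's layout. */
struct toy_rwsem {
	int active_readers;	/* readers currently holding the lock */
	bool writer_active;	/* true while a writer holds the lock */
	int waiters;		/* threads queued behind the lock */
};

/* Fair read acquisition: queue behind any thread that is already waiting. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->waiters > 0) {
		sem->waiters++;	/* the caller would block here */
		return false;
	}
	sem->active_readers++;
	return true;
}

/* Unfair read acquisition: skip queued waiters, only an active writer blocks. */
static bool toy_down_read_critical(struct toy_rwsem *sem)
{
	if (sem->writer_active) {
		sem->waiters++;	/* the caller would block here */
		return false;
	}
	sem->active_readers++;
	return true;
}

/* Release a read lock taken by either acquisition function. */
static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->active_readers > 0);
	sem->active_readers--;
}
```

With one page-faulting reader holding the lock and one writer queued, the
fair call fails and queues behind the writer, while the unfair call is
admitted immediately. That corresponds to the monitoring case described
above.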

Michel Lespinasse (10):
  x86 rwsem: minor cleanups
  rwsem: fully separate code pathes to wake writers vs readers
  rwsem: lighter active count checks when waking up readers
  rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads
  rwsem: wake queued readers when writer blocks on active read lock
  rwsem: smaller wrappers around rwsem_down_failed_common
  generic rwsem: implement down_read_critical() / up_read_critical()
  rwsem: down_read_critical infrastructure support
  x86 rwsem: down_read_critical implementation
  Use down_read_critical() for /sys/<pid>/exe and /sys/<pid>/maps files

 arch/x86/include/asm/rwsem.h   |   70 ++++++++++++-----
 arch/x86/lib/rwsem_64.S        |   14 +++-
 arch/x86/lib/semaphore_32.S    |   21 +++++-
 fs/proc/base.c                 |    4 +-
 fs/proc/task_mmu.c             |   24 ++++--
 include/linux/proc_fs.h        |    1 +
 include/linux/rwsem-spinlock.h |   10 ++-
 include/linux/rwsem.h          |   12 +++
 kernel/rwsem.c                 |   35 +++++++++
 lib/rwsem-spinlock.c           |   10 ++-
 lib/rwsem.c                    |  160 ++++++++++++++++++++++++++--------------
 11 files changed, 266 insertions(+), 95 deletions(-)

* [PATCH 00/10] V2: rwsem changes + down_read_unfair() proposal
From: Michel Lespinasse @ 2010-05-14 12:39 UTC
  To: Linus Torvalds, David Howells, Ingo Molnar, Thomas Gleixner
  Cc: LKML, Andrew Morton, Mike Waychison, Suleiman Souhlal, Ying Han,
	Michel Lespinasse

I would like to solicit comments regarding the following changes
against 2.6.34-rc7 with 91af708 (from the V1 proposal) already applied.

The motivation for this change was some cluster monitoring software we
use at Google, which reads /proc/<pid>/maps files for all running
processes. When the machines are under load, the mmap_sem is often
acquired for reads for long periods of time since do_page_fault() holds
it while doing disk accesses, and fair queueing behavior often results
in the monitoring software making little progress. By introducing
unfair behavior in a few selected places, we are able to let the
monitoring software make progress without impacting performance for
the rest of the system.

In general, I've made sure to implement this proposal without touching
the rwsem fast paths. Also, the first 8 patches of this series should
be of general applicability even if not taking the down_read_unfair()
changes, addressing minor issues such as situations where reader
threads can get blocked at the head of the waiting list even though
the rwsem is currently owned for reads.

Changes since v1:
- Keep the active count check when trying to wake readers in the up_xxxx()
  slow path (I had suppressed it in v1). However, I did try to lighten the
  check (this is patch 3 of the series).
- Added privilege check before making use of unfair behavior in
  /proc/<pid>/exe and /proc/<pid>/maps files.
- Applied David Howells's many small suggestions (I hope I did not miss any).

Michel Lespinasse (10):
  x86 rwsem: minor cleanups
  rwsem: fully separate code pathes to wake writers vs readers
  rwsem: lighter active count checks when waking up readers
  rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads
  rwsem: wake queued readers when writer blocks on active read lock
  rwsem: smaller wrappers around rwsem_down_failed_common
  generic rwsem: implement down_read_unfair
  rwsem: down_read_unfair infrastructure support
  x86 rwsem: down_read_unfair implementation
  Use down_read_unfair() for /sys/<pid>/exe and /sys/<pid>/maps files

 arch/x86/include/asm/rwsem.h   |   70 ++++++++++++-----
 arch/x86/lib/rwsem_64.S        |   14 +++-
 arch/x86/lib/semaphore_32.S    |   21 +++++-
 fs/proc/base.c                 |    2 +-
 fs/proc/task_mmu.c             |    2 +-
 fs/proc/task_nommu.c           |    2 +-
 include/linux/capability.h     |    1 +
 include/linux/rwsem-spinlock.h |   10 ++-
 include/linux/rwsem.h          |   13 +++
 kernel/rwsem.c                 |   31 ++++++++
 lib/rwsem-spinlock.c           |   10 ++-
 lib/rwsem.c                    |  160 ++++++++++++++++++++++++++--------------
 12 files changed, 247 insertions(+), 89 deletions(-)


Thread overview: 36+ messages
2010-05-17 22:25 [PATCH 00/10] V3: rwsem changes + down_read_critical() proposal Michel Lespinasse
2010-05-17 22:25 ` [PATCH 01/10] x86 rwsem: minor cleanups Michel Lespinasse
2010-05-17 22:25 ` [PATCH 02/10] rwsem: fully separate code pathes to wake writers vs readers Michel Lespinasse
2010-05-17 22:25 ` [PATCH 03/10] rwsem: lighter active count checks when waking up readers Michel Lespinasse
2010-05-17 22:25 ` [PATCH 04/10] rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads Michel Lespinasse
2010-05-17 22:25 ` [PATCH 05/10] rwsem: wake queued readers when writer blocks on active read lock Michel Lespinasse
2010-05-17 22:25 ` [PATCH 06/10] rwsem: smaller wrappers around rwsem_down_failed_common Michel Lespinasse
2010-05-17 22:25 ` [PATCH 07/10] generic rwsem: implement down_read_critical() / up_read_critical() Michel Lespinasse
2010-05-17 22:44   ` Linus Torvalds
2010-05-17 23:13     ` Michel Lespinasse
2010-05-17 23:20       ` Michel Lespinasse
2010-05-19 13:21       ` David Howells
2010-05-19 23:47         ` Michel Lespinasse
2010-05-21  3:35         ` Michel Lespinasse
2010-05-17 22:25 ` [PATCH 08/10] rwsem: down_read_critical infrastructure support Michel Lespinasse
2010-05-17 22:25 ` [PATCH 09/10] x86 rwsem: down_read_critical implementation Michel Lespinasse
2010-05-17 22:25 ` [PATCH 10/10] Use down_read_critical() for /sys/<pid>/exe and /sys/<pid>/maps files Michel Lespinasse
2010-05-19 11:47 ` [PATCH 01/10] x86 rwsem: minor cleanups David Howells
2010-05-20 21:37   ` Michel Lespinasse
2010-05-19 12:04 ` [PATCH 02/10] rwsem: fully separate code pathes to wake writers vs readers David Howells
2010-05-20 21:48   ` Michel Lespinasse
2010-05-19 12:25 ` [PATCH 03/10] rwsem: lighter active count checks when waking up readers David Howells
2010-05-20 22:33   ` Michel Lespinasse
2010-05-21  8:06   ` David Howells
2010-05-19 12:33 ` [PATCH 04/10] rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads David Howells
2010-05-19 12:44 ` [PATCH 05/10] rwsem: wake queued readers when writer blocks on active read lock David Howells
2010-05-19 12:51 ` [PATCH 06/10] rwsem: smaller wrappers around rwsem_down_failed_common David Howells
2010-05-19 13:34 ` [PATCH 08/10] rwsem: down_read_critical infrastructure support David Howells
2010-05-20 23:30   ` Michel Lespinasse
2010-05-21  8:03   ` David Howells
2010-05-19 14:36 ` [PATCH 09/10] x86 rwsem: down_read_critical implementation David Howells
2010-05-19 15:21 ` [PATCH 10/10] Use down_read_critical() for /sys/<pid>/exe and /sys/<pid>/maps files David Howells
2010-05-21  2:44   ` Michel Lespinasse
2010-05-22  1:49   ` Michel Lespinasse
2010-05-25  9:42   ` David Howells
  -- strict thread matches above, loose matches on Subject: below --
2010-05-14 12:39 [PATCH 00/10] V2: rwsem changes + down_read_unfair() proposal Michel Lespinasse
2010-05-14 12:39 ` [PATCH 04/10] rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads Michel Lespinasse
