All of lore.kernel.org

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [Qemu-devel] [PULL 00/19] Block patches
From: Anthony Liguori @ 2011-10-24 16:19 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel
In-Reply-To: <1319217556-28273-1-git-send-email-kwolf@redhat.com>

On 10/21/2011 12:18 PM, Kevin Wolf wrote:
> The following changes since commit c2e2343e1faae7bbc77574c12a25881b1b696808:
>
>    hw/arm_gic.c: Fix save/load of irq_target array (2011-10-21 17:19:56 +0200)
>
> are available in the git repository at:
>    git://repo.or.cz/qemu/kevin.git for-anthony

Pulled.  Thanks.

Regards,

Anthony Liguori

>
> Alex Jia (1):
>        fix memory leak in aio_write_f
>
> Kevin Wolf (5):
>        xen_disk: Always set feature-barrier = 1
>        fdc: Fix floppy port I/O
>        qemu-img: Don't allow preallocation and compression at the same time
>        qcow2: Fix bdrv_write_compressed error handling
>        pc: Fix floppy drives with if=none
>
> Paolo Bonzini (12):
>        sheepdog: add coroutine_fn markers
>        add socket_set_block
>        block: rename bdrv_co_rw_bh
>        block: unify flush implementations
>        block: add bdrv_co_discard and bdrv_aio_discard support
>        vmdk: fix return values of vmdk_parent_open
>        vmdk: clean up open
>        block: add a CoMutex to synchronous read drivers
>        block: take lock around bdrv_read implementations
>        block: take lock around bdrv_write implementations
>        block: change flush to co_flush
>        block: change discard to co_discard
>
> Stefan Hajnoczi (1):
>        block: drop redundant bdrv_flush implementation
>
>   block.c               |  258 ++++++++++++++++++++++++++++++-------------------
>   block.h               |    5 +
>   block/blkdebug.c      |    6 -
>   block/blkverify.c     |    9 --
>   block/bochs.c         |   15 +++-
>   block/cloop.c         |   15 +++-
>   block/cow.c           |   34 ++++++-
>   block/dmg.c           |   15 +++-
>   block/nbd.c           |   28 +++++-
>   block/parallels.c     |   15 +++-
>   block/qcow.c          |   17 +---
>   block/qcow2-cluster.c |    6 +-
>   block/qcow2.c         |   72 ++++++--------
>   block/qed.c           |    6 -
>   block/raw-posix.c     |   23 +----
>   block/raw-win32.c     |    4 +-
>   block/raw.c           |   23 ++---
>   block/rbd.c           |    4 +-
>   block/sheepdog.c      |   14 ++--
>   block/vdi.c           |    6 +-
>   block/vmdk.c          |   82 ++++++++++------
>   block/vpc.c           |   34 ++++++-
>   block/vvfat.c         |   28 +++++-
>   block_int.h           |    9 +-
>   hw/fdc.c              |   14 +++
>   hw/fdc.h              |    9 ++-
>   hw/pc.c               |   25 +++--
>   hw/pc.h               |    3 +-
>   hw/pc_piix.c          |    5 +-
>   hw/xen_disk.c         |    5 +-
>   oslib-posix.c         |    7 ++
>   oslib-win32.c         |    6 +
>   qemu-img.c            |   11 ++
>   qemu-io.c             |    1 +
>   qemu_socket.h         |    1 +
>   trace-events          |    1 +
>   36 files changed, 524 insertions(+), 292 deletions(-)
>
>

^ permalink raw reply

* Re: [Qemu-devel] [PULL v3 00/13] allow tools to use the QEMU main loop
From: Anthony Liguori @ 2011-10-24 16:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel
In-Reply-To: <1319214405-20388-1-git-send-email-pbonzini@redhat.com>

On 10/21/2011 11:26 AM, Paolo Bonzini wrote:
> The following changes since commit c76eaf13975130768070ecd2d4f3107eb69ab757:
>
>    hw/9pfs: Fix broken compilation caused by wrong trace events (2011-10-20 15:30:59 -0500)
>
> are available in the git repository at:
>    git://github.com/bonzini/qemu.git split-main-loop-for-anthony

Pulled.  Thanks.

Regards,

Anthony Liguori

> This patch series makes the QEMU main loop usable out of the executable,
> and especially in tools and possibly unit tests.  This is cleaner because
> it avoids introducing partial transitions to GIOChannel.  Interfacing with
> the glib main loop is still possible.
>
> The main loop code is currently split in cpus.c and vl.c.  Moving it
> to a new file is easy; the problem is that the main loop depends on the
> timer infrastructure in qemu-timer.c, and that file currently contains
> the implementation of icount and the vm_clock.  This is bad for the
> perspective of linking qemu-timer.c into the tools.  Luckily, it is
> relatively easy to untie them and move them out of the way.  This is
> what the largest part of the series does (patches 1-9).
>
> Patches 10-13 complete the refactoring and cleanup some surrounding
> code.
>
> v2->v3
> 	Rebased, added documentation
>
> v1->v2
> 	Rebased
>
> Paolo Bonzini (13):
>    remove unused function
>    qemu-timer: remove active_timers array
>    qemu-timer: move common code to qemu_rearm_alarm_timer
>    qemu-timer: more clock functions
>    qemu-timer: move icount to cpus.c
>    qemu-timer: do not refer to runstate_is_running()
>    qemu-timer: use atexit for quit_timers
>    qemu-timer: move more stuff out of qemu-timer.c
>    qemu-timer: do not use RunState change handlers
>    main-loop: create main-loop.h
>    main-loop: create main-loop.c
>    Revert to a hand-made select loop
>    simplify main loop functions
>
>   Makefile.objs         |    2 +-
>   async.c               |    1 +
>   cpus.c                |  497 ++++++++++++++++++++++++++++---------------------
>   cpus.h                |    3 +-
>   exec-all.h            |   14 ++
>   exec.c                |    3 -
>   hw/mac_dbdma.c        |    5 -
>   hw/mac_dbdma.h        |    1 -
>   iohandler.c           |   55 +------
>   main-loop.c           |  495 ++++++++++++++++++++++++++++++++++++++++++++++++
>   main-loop.h           |  351 ++++++++++++++++++++++++++++++++++
>   os-win32.c            |  123 ------------
>   qemu-char.h           |   12 +-
>   qemu-common.h         |   37 +----
>   qemu-coroutine-lock.c |    1 +
>   qemu-os-posix.h       |    4 -
>   qemu-os-win32.h       |   17 +--
>   qemu-timer.c          |  489 +++++++++---------------------------------------
>   qemu-timer.h          |   31 +---
>   savevm.c              |   25 +++
>   slirp/libslirp.h      |   11 -
>   sysemu.h              |    3 +-
>   vl.c                  |  189 ++++---------------
>   23 files changed, 1309 insertions(+), 1060 deletions(-)
>   create mode 100644 main-loop.c
>   create mode 100644 main-loop.h
>

^ permalink raw reply

* Confidential/How are you
From: Barrister  Jacque Charles @ 2011-10-24 15:21 UTC (permalink / raw)





Dearest,


My name is Barrister Jacque Charles, a personal Attorney to a late client who died in car crash without a will.
For more information please contact via email: (jcchamber@rocketmail.com) upon your response, I shall then provide you with more details and relevant documents that will help you understand this transaction well.


Kindest Regards 
Barrister Jacque Charles,

^ permalink raw reply

* Re: [Qemu-devel] gcc auto-omit-frame-pointer vs msvc longjmp
From: Kai Tietz @ 2011-10-24 16:18 UTC (permalink / raw)
  To: Bob Breuer
  Cc: xunxun, Richard Henderson, qemu-devel, Mark Cave-Ayland,
	gcc@gcc.gnu.org
In-Reply-To: <4EA57A26.1050806@mc.net>

2011/10/24 Bob Breuer <breuerr@mc.net>:
> Kai Tietz wrote:
>> Hi,
>>
>> For trunk-version I have a tentative patch for this issue.  On 4.6.x
>> and older branches this doesn't work, as here we can't differenciate
>> that easy between ms- and sysv-abi.
>>
>> But could somebody give this patch a try?
>>
>> Regards,
>> Kai
>>
>> ChangeLog
>>
>>         * config/i386/i386.c (ix86_frame_pointer_required): Enforce use of
>>         frame-pointer for 32-bit ms-abi, if setjmp is used.
>>
>> Index: i386.c
>> ===================================================================
>> --- i386.c      (revision 180099)
>> +++ i386.c      (working copy)
>> @@ -8391,6 +8391,10 @@
>>    if (SUBTARGET_FRAME_POINTER_REQUIRED)
>>      return true;
>>
>> +  /* For older 32-bit runtimes setjmp requires valid frame-pointer.  */
>> +  if (TARGET_32BIT_MS_ABI && cfun->calls_setjmp)
>> +    return true;
>> +
>>    /* In ix86_option_override_internal, TARGET_OMIT_LEAF_FRAME_POINTER
>>       turns off the frame pointer by default.  Turn it back on now if
>>       we've not got a leaf function.  */
>>
>
> For a gcc 4.7 snapshot, this does fix the longjmp problem that I
> encountered.  So aside from specifying -fno-omit-frame-pointer for
> affected files, what can be done for 4.6?
>
> Bob

Well, for 4.6.x (or older) we just can use the mingw32.h header in
gcc/config/i386/ and define here a subtarget-macro to indicate that.
The only incompatible point here might be for Wine using the
linux-compiler to build Windows related code.

A possible patch for 4.6 gcc versions I attached to this mail.

Regards,
Kai

Index: mingw32.h
===================================================================
--- mingw32.h   (revision 180393)
+++ mingw32.h   (working copy)
@@ -239,3 +239,8 @@
 /* We should find a way to not have to update this manually.  */
 #define LIBGCJ_SONAME "libgcj" /*LIBGCC_EH_EXTN*/ "-12.dll"

+/* For 32-bit Windows we need valid frame-pointer for function using
+   setjmp.  */
+#define SUBTARGET_SETJMP_NEED_FRAME_POINTER \
+  (!TARGET_64BIT && cfun->calls_setjmp)
+
Index: i386.c
===================================================================
--- i386.c      (revision 180393)
+++ i386.c      (working copy)
@@ -8741,6 +8741,12 @@
   if (SUBTARGET_FRAME_POINTER_REQUIRED)
     return true;

+#ifdef SUBTARGET_SETJMP_NEED_FRAME_POINTER
+  /* For older 32-bit runtimes setjmp requires valid frame-pointer.  */
+  if (SUBTARGET_SETJMP_NEED_FRAME_POINTER)
+    return true;
+#endif
+
   /* In ix86_option_override_internal, TARGET_OMIT_LEAF_FRAME_POINTER
      turns off the frame pointer by default.  Turn it back on now if
      we've not got a leaf function.  */

^ permalink raw reply

* Re: NFS4 client blocked (kernel 3.0.7)
From: Dilip Daya @ 2011-10-24 16:18 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org, David Flynn
In-Reply-To: <1319449226.2785.7.camel@lade.trondhjem.org>

Embedded comments...below...

On Mon, 2011-10-24 at 09:40 +0000, Trond Myklebust wrote:
> On Sat, 2011-10-22 at 12:00 -0400, Dilip Daya wrote: 
> > See below...
> > 
> > 0n Sat, 2011-10-22 at 08:28 +0000, David Flynn wrote: 
> > > Dear all,
> > > 
> > > When mounting a solaris NFS4 export on a v3.0.4 client, we've experienced
> > > processes becoming blocked.  Any further attempt to access the mountpoint
> > > from another process also blocks.  Other mountpoints are unaffected.
> > > I have not identified a test case to reproduce the behaviour.
> > > 
> > > Any thoughts on the matter would be most welcome,
> > > 
> > > Kind regards,
> > > 
> > > ..david
> > > 
> > > from /proc/mounts:
> > > home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.20,minorversion=0,local_lock=none,addr=172.29.120.140 0 0
> > > 
> > > [105121.204200] INFO: task bash:4457 blocked for more than 120 seconds.
> > > [105121.247424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [105121.299955] bash            D ffffffff818050a0     0  4457      1 0x00000000
> > > [105121.347840]  ffff8802954b5c28 0000000000000082 ffff8802954b5db8 0000000000012a40
> > > [105121.397793]  ffff8802954b5fd8 0000000000012a40 ffff8802954b4000 0000000000012a40
> > > [105121.441724]  0000000000012a40 0000000000012a40 ffff8802954b5fd8 0000000000012a40
> > > [105121.441728] Call Trace:
> > > [105121.441740]  [<ffffffff81110030>] ? __lock_page+0x70/0x70
> > > [105121.441744]  [<ffffffff8160007c>] io_schedule+0x8c/0xd0
> > > [105121.441746]  [<ffffffff8111003e>] sleep_on_page+0xe/0x20
> > > [105121.441749]  [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
> > > [105121.441751]  [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
> > > [105121.441756]  [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
> > > [105121.441759]  [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
> > > [105121.441761]  [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
> > > [105121.441786]  [<ffffffffa023a7d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> > > [105121.441789]  [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
> > > [105121.441791]  [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
> > > [105121.441793]  [<ffffffff8111050b>] filemap_fdatawait+0x2b/0x30
> > > [105121.441795]  [<ffffffff81112124>] filemap_write_and_wait+0x44/0x60
> > > [105121.441803]  [<ffffffffa0232805>] nfs_getattr+0x105/0x120 [nfs]
> > > [105121.441806]  [<ffffffff81605e88>] ? do_page_fault+0x258/0x550
> > > [105121.441810]  [<ffffffff81175b31>] vfs_getattr+0x51/0x120
> > > [105121.441812]  [<ffffffff81175c70>] vfs_fstatat+0x70/0x90
> > > [105121.441814]  [<ffffffff81175ccb>] vfs_stat+0x1b/0x20
> > > [105121.441816]  [<ffffffff81175f14>] sys_newstat+0x24/0x40
> > > [105121.441820]  [<ffffffff8101449a>] ? init_fpu+0x4a/0x150
> > > [105121.441822]  [<ffffffff81602955>] ? page_fault+0x25/0x30
> > > [105121.441825]  [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b
> > > [105121.441837] INFO: task bash:5612 blocked for more than 120 seconds.
> > > [105121.441838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [105121.441840] bash            D 0000000000000005     0  5612      1 0x00000000
> > > [105121.441843]  ffff8801f25d5ca8 0000000000000086 ffff8800163e9b08 0000000000012a40
> > > [105121.441845]  ffff8801f25d5fd8 0000000000012a40 ffff8801f25d4000 0000000000012a40
> > > [105121.441848]  0000000000012a40 0000000000012a40 ffff8801f25d5fd8 0000000000012a40
> > > [105121.441850] Call Trace:
> > > [105121.441853]  [<ffffffff81110030>] ? __lock_page+0x70/0x70
> > > [105121.441855]  [<ffffffff8160007c>] io_schedule+0x8c/0xd0
> > > [105121.441857]  [<ffffffff8111003e>] sleep_on_page+0xe/0x20
> > > [105121.441859]  [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
> > > [105121.441861]  [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
> > > [105121.441863]  [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
> > > [105121.441866]  [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
> > > [105121.441868]  [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
> > > [105121.441876]  [<ffffffffa023a7d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> > > [105121.441878]  [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
> > > [105121.441880]  [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
> > > [105121.441882]  [<ffffffff81111730>] filemap_write_and_wait_range+0x70/0x80
> > > [105121.441886]  [<ffffffff8119cc6a>] vfs_fsync_range+0x5a/0x90
> > > [105121.441888]  [<ffffffff8119cd0c>] vfs_fsync+0x1c/0x20
> > > [105121.441894]  [<ffffffffa022ec74>] nfs_file_flush+0x54/0x80 [nfs]
> > > [105121.441898]  [<ffffffff8116ee7f>] filp_close+0x3f/0x90
> > > [105121.441900]  [<ffffffff8116f8a7>] sys_close+0xb7/0x120
> > > [105121.441902]  [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b
> > > --
> > 
> > Same issue!
> > 
> > In my case I have NFS client & server  both with Linux kernel
> > v3.0.7-stable.
> > 
> > 
> > Kernel: v3.0.7-stable (amd64)
> > 
> > # nfsstat -m
> > /opt/xorsyst/nfs_test from 192.168.1.53:/opt/xorsyst/nfs_test
> > Flags:
> > rw,relatime,vers=4,rsize=32768,wsize=32768,namlen=255,hard,proto=udp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.1.52,minorversion=0,local_lock=none,addr=192.168.1.53
> 
> Sigh... Why are you using udp with timeo!=default? You do realise that
> unlike tcp, udp is a lossy protocol with no guarantee that messages will
> actually be delivered to the server?
> 
> Trond

Hi Trond,

Thank you for your response. I should have provided you additional
details surrounding this issue:

Yes, I truly understand not using UDP, sorry for not providing you
additional background information earlier:

We use an in-house test-suite-tool much like LTP to test newer kernels
(v3.0.x kernel) _before_ we release them in production. We run various
tests for 96 CHO (Continuous-Hours-of-Operation). This issue was
reported in one such test using:

v3.0.7-stable kernel on both NFS client/server (x86_64) systems:

# nfsstat -m
/opt/xorsyst/nfs_test from 192.168.1.53:/opt/xorsyst/nfs_test
 Flags:
rw,relatime,vers=4,rsize=32768,wsize=32768,namlen=255,hard,proto=udp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.1.52,minorversion=0,local_lock=none,addr=192.168.1.53

...BUT we've had successful completion of 96 CHO using NFS/TCP (rather
than NFS/UDP) with no issues.  (Not even any task "blocked for 120
seconds" nor any NFS "server not responding" messages.)

# mount -o rw,sync,proto=tcp,timeo=600,retrans=6 192.168.1.xx:/opt/xorsyst/nfs_test /opt/xorsyst/nfs_test

Note:
We've had no testing issues (NFS/UDP) with 2.6.32 based kernels which are in production at this time.

Status update:
At this time I have a system in this state, i.e. 
- "df" command hangs. Show local filesystems but hangs at showing NFS mounts.
- Our tests continue now at 48 hours with only one NFS/UDP issue as reported above.
- I issued "echo 0 >/proc/sys/sunrpc/rpc_debug", unfortunately all the PID involved
  in the reported backtraces no longer exist and so will have to wait for another occurrence.

=> Is there other data that I should collect?
=> Any patches patches that I could apply to v3.0.7 and retry my test?

Thanking you in advance.
-DilipD.





^ permalink raw reply

* Re: [PATCH 13/X] uprobes: introduce UTASK_SSTEP_TRAPPED logic
From: Oleg Nesterov @ 2011-10-24 16:13 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli
  Cc: Srikar Dronamraju, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Linux-mm, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jonathan Corbet, Masami Hiramatsu, Hugh Dickins,
	Christoph Hellwig, Thomas Gleixner, Andi Kleen, Andrew Morton,
	Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024151614.GA6034@in.ibm.com>

On 10/24, Ananth N Mavinakayanahalli wrote:
>
> On Mon, Oct 24, 2011 at 04:41:27PM +0200, Oleg Nesterov wrote:
> >
> > Agreed! it would be nice to "hide" these int3's if we dump the core, but
> > I think this is a bit off-topic. It makes sense to do this in any case,
> > even if the core-dumping was triggered by another thread/insn. It makes
> > sense to remove all int3's, not only at regs->ip location. But how can
> > we do this? This is nontrivial.
>
> I don't think that is a problem.. see below...
>
> > And. Even worse. Suppose that you do "gdb probed_application". Now you
> > see int3's in the disassemble output. What can we do?
>
> In this case, nothing.
>
> > I think we can do nothing, at least currently. This just reflects the
> > fact that uprobe connects to inode, not to process/mm/etc.
> >
> > What do you think?
>
> Thinking further on this, in the normal 'running gdb on a core' case, we
> won't have this problem, as the binary that we point gdb to, will be a
> pristine one, without the uprobe int3s, right?

Not sure I understand.

I meant, if we have a binary with uprobes (iow, register_uprobe() installed
uprobes into that file), then gdb will see int3's with or without the core.
Or you can add uprobe into glibc, say you can probe getpid(). Now (again,
with or without the core) disassemble shows that getpid() starts with int3.

But I guess you meant something else...

Oleg.


^ permalink raw reply

* Re: [PATCH 13/X] uprobes: introduce UTASK_SSTEP_TRAPPED logic
From: Oleg Nesterov @ 2011-10-24 16:13 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli
  Cc: Srikar Dronamraju, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Linux-mm, Arnaldo Carvalho de Melo, Linus Torvalds,
	Jonathan Corbet, Masami Hiramatsu, Hugh Dickins,
	Christoph Hellwig, Thomas Gleixner, Andi Kleen, Andrew Morton,
	Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024151614.GA6034@in.ibm.com>

On 10/24, Ananth N Mavinakayanahalli wrote:
>
> On Mon, Oct 24, 2011 at 04:41:27PM +0200, Oleg Nesterov wrote:
> >
> > Agreed! it would be nice to "hide" these int3's if we dump the core, but
> > I think this is a bit off-topic. It makes sense to do this in any case,
> > even if the core-dumping was triggered by another thread/insn. It makes
> > sense to remove all int3's, not only at regs->ip location. But how can
> > we do this? This is nontrivial.
>
> I don't think that is a problem.. see below...
>
> > And. Even worse. Suppose that you do "gdb probed_application". Now you
> > see int3's in the disassemble output. What can we do?
>
> In this case, nothing.
>
> > I think we can do nothing, at least currently. This just reflects the
> > fact that uprobe connects to inode, not to process/mm/etc.
> >
> > What do you think?
>
> Thinking further on this, in the normal 'running gdb on a core' case, we
> won't have this problem, as the binary that we point gdb to, will be a
> pristine one, without the uprobe int3s, right?

Not sure I understand.

I meant, if we have a binary with uprobes (iow, register_uprobe() installed
uprobes into that file), then gdb will see int3's with or without the core.
Or you can add uprobe into glibc, say you can probe getpid(). Now (again,
with or without the core) disassemble shows that getpid() starts with int3.

But I guess you meant something else...

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Congratulation!!! Money Is Awarded To You
From: Western Union© @ 2011-10-24 15:49 UTC (permalink / raw)




-- 
 Dear Beneficiary,

 The sum of $900,000.00USD has been deposited in your name here in the 
 western union office by Ecowas Organisation, you are to contact Mr. Tom 
 Carlson to collect your money transfer control number (M.T.C.N).
 
 Contact email: customer.service.western.un@hotmail.com
 
 Western Union©
 Customer Service

^ permalink raw reply

* Re: [PATCH] scheduler rate controller
From: George Dunlap @ 2011-10-24 16:17 UTC (permalink / raw)
  To: Lv, Hui
  Cc: Duan, Jiangang, Tian, Kevin, xen-devel@lists.xensource.com,
	keir@xen.org, Dong, Eddie
In-Reply-To: <C10D3FB0CD45994C8A51FEC1227CE22F340768D793@shsmsx502.ccr.corp.intel.com>

On Mon, Oct 24, 2011 at 4:36 AM, Lv, Hui <hui.lv@intel.com> wrote:
>
> As one of the topics presented in Xen summit2011 in SC, we proposed one method scheduler rate controller (SRC) to control high frequency of scheduling under some conditions. You can find the slides at
> http://www.slideshare.net/xen_com_mgr/9-hui-lvtacklingthemanagementchallengesofserverconsolidationonmulticoresystems
>
> In the followings, we have tested it with 2-socket multi-core system with many rounds and got the positive results and improve the performance greatly either with the consolidation workload SPECvirt_2010 or some small workloads such as sysbench and SPECjbb. So I posted it here for review.
>
> >From Xen scheduling mechanism, hypervisor kicks related VCPUs by raising schedule softirq during processing external interrupts. Therefore, if the number of IRQ is very large, the scheduling happens more frequent. Frequent scheduling will
> 1) bring more overhead for hypervisor and
> 2) increase cache miss rate.
>
> In our consolidation workloads, SPECvirt_sc2010, SR-IOV & iSCSI solution are adopted to bypass software emulation but bring heavy network traffic. Correspondingly, 15k scheduling happened per second on each physical core, which means the average running time is  very short, only 60us. We proposed SRC in XEN to mitigate this problem.
> The performance benefits brought by this patch is very huge at peak throughput with no influence when system loads are low.
>
> SRC improved SPECvirt performance by 14%.
> 1)It reduced CPU utilization, which allows more load to be added.
> 2)Response time (QoS)  became better at the same CPU %.
> 3)The better response time allowed us to push the CPU % at peak performance to an even higher level (CPU was not saturated in SPECvirt).
> SRC reduced context switch rate significantly, resulted in
> 2)Smaller Path Length
> 3)Less cache misses thus lower CPI
> 4)Better performance for both Guest and Hypervisor sides.
>
> With this patch, from our SPECvirt_sc2010 results, the performance of xen catches up the other open sourced hypervisor.

Hui,

Thanks for the patch, and the work you've done testing it.  There are
a couple of things to discuss.

* I'm not sure I like the idea of doing this at the generic level than
at the specific scheduler level -- e.g., inside of credit1.  For
better or for worse, all aspects of scheduling work together, and even
small changes tend to have a significant effect on the emergent
behavior.  I understand why you'd want this in the generic scheduling
code; but it seems like it would be better for each scheduler to
implement a rate control independently.

* The actual algorithm you use here isn't described.  It seems to be
as follows (please correct me if I've made a mistake
reverse-engineering the algorithm):

Every 10ms, check to see if there have been more than 50 schedules.
If so, disable pre-emption entirely for 10ms, allowing processes to
run without being interrupted (unless they yield).

It seems like we should be able to do better.  For one, it means in
the general case you will flip back and forth between really frequent
schedules and less frequent schedules.  For two, turning off
preemption entirely will mean that whatever vcpu happens to be running
could, if it wished, run for the full 10ms; and which one got elected
to do that would be really random.  This may work well for SPECvirt,
but it's the kind of algorithm that is likely to have some workloads
on which it works very poorly.  Finally, there's the chance that this
algorithm could be "gamed" -- i.e., if a rogue VM knew that most other
VMs yielded frequently, it might be able to arrange that there would
always be more than 50 context switches a second, while it runs
without preemption and takes up more than its fair share.

Have you tried just making it give each vcpu a minimum amount of
scheduling time, say, 500us or 1ms?

Now a couple of stylistic comments:
* src tends to make me think of "source".  I think sched_rate[_*]
would fit the existing naming convention better.
* src_controller() shouldn't call continue_running() directly.
Instead, scheduler() should call src_controller(); and only call
sched->do_schedule() if src_controller() returns false (or something
like that).
* Whatever the algorithm is should have comments describing what it
does and how it's supposed to work.
* Your patch is malformed; you need to have it apply at the top level,
not from within the xen/ subdirectory.  The easiest way to get a patch
is to use either mercurial queues, or "hg diff".  There are some good
suggestions for making and posting patches here:
http://wiki.xensource.com/xenwiki/SubmittingXenPatches

Thanks again for all your work on this -- we definitely want Xen to
beat the other open-source hypervisor. :-)

 -George

^ permalink raw reply

* Re: [lm-sensors] [PATCH 5/6] IIO:hwmon interface client driver.
From: Jonathan Cameron @ 2011-10-24 16:15 UTC (permalink / raw)
  To: guenter.roeck
  Cc: linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
	linus.ml.walleij@gmail.com, zdevai@gmail.com,
	linux@arm.linux.org.uk, arnd@arndb.de,
	broonie@opensource.wolfsonmicro.com, gregkh@suse.de,
	lm-sensors@lm-sensors.org, khali@linux-fr.org,
	thomas.petazzoni@free-electrons.com,
	maxime.ripard@free-electrons.com
In-Reply-To: <1319472607.2583.49.camel@groeck-laptop>

On 10/24/11 17:10, Guenter Roeck wrote:
> On Mon, 2011-10-24 at 11:58 -0400, Jonathan Cameron wrote:
>> On 10/24/11 16:39, Guenter Roeck wrote:
>>> On Mon, 2011-10-24 at 06:09 -0400, Jonathan Cameron wrote:
>>> [ ... ]
>>>>>>> +/*
>>>>>>> + * Assumes that IIO and hwmon operate in the same base units.
>>>>>>> + * This is supposed to be true, but needs verification for
>>>>>>> + * new channel types.
>>>>>>> + */
>>>>>>> +static ssize_t iio_hwmon_read_val(struct device *dev,
>>>>>>> +				  struct device_attribute *attr,
>>>>>>> +				  char *buf)
>>>>>>> +{
>>>>>>> +	long result;
>>>>>>> +	int val, ret, scaleint, scalepart;
>>>>>>> +	struct sensor_device_attribute *sattr = to_sensor_dev_attr(attr);
>>>>>>> +	struct iio_hwmon_state *state = dev_get_drvdata(dev);
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * No locking between this pair, so theoretically possible
>>>>>>> +	 * the scale has changed.
>>>>>>> +	 */
>>>>>>> +	ret = iio_read_channel_raw(state->channels[sattr->index],
>>>>>>> +				   &val);
>>>>>>> +	if (ret < 0)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	ret = iio_read_channel_scale(state->channels[sattr->index],
>>>>>>> +				     &scaleint, &scalepart);
>>>>>>> +	if (ret < 0)
>>>>>>> +		return ret;
>>>>>>> +	switch (ret) {
>>>>>>> +	case IIO_VAL_INT:
>>>>>>> +		result = val * scaleint;
>>>>>>> +		break;
>>>>>>> +	case IIO_VAL_INT_PLUS_MICRO:
>>>>>>> +		result = (long)val * (long)scaleint +
>>>>>>> +			(long)val * (long)scalepart / 1000000L;
>>>>>>> +		break;
>>>>>>> +	case IIO_VAL_INT_PLUS_NANO:
>>>>>>> +		result = (long)val * (long)scaleint +
>>>>>>> +			(long)val * (long)scalepart / 1000000000L;
>>>>>>> +		break;
>>>>>>
>>>>>> Still easy to imagine that val * scalepart gets larger than 2147483647L
>>>>>> (on machines where sizeof(long) = 4) ... it will already happen if the
>>>>>> result of (val * scalepart / 1000000000) is larger than 2. 
>>>>> Good point.  I really ought to have done the calcs.
>>>>> If we have maximum possible value in here things will be ugly.
>>>>>
>>>>> Worst case is scalepart is 9999999999. (could be done as 1 - 0.000000001
>>>>> which would be nicer, but we don't specify a preference - from this
>>>>> discussion I am suspecting we should!)
>>>>>
>>>>> Looks like 64 bits is going to be a requirement as you say.
>>>>>>
>>>>>> What value range do you expect to see here ?
>>>>>>
>>>>>> If (val * scaleint) is already the milli-unit, scalepart would possibly
>>>>>> only address fractions of milli-units. If so, the result of (val *
>>>>>> scalepart / 1000000000L) might always be smaller than 1, ie 0. 
>>>>> It certainly should be.
>>>>>> If so, for the calculation to have any value, you might be better off using
>>>>>> DIV_ROUND_CLOSEST(val * scalepart, 1000000000L).
>>>>> Good idea.
>>>>>>
>>>>>> I am a bit confused by this anyway. Since hwmon in general reports
>>>>>> milli-units, VAL_INT appears to reflect milli-units, VAL_INT_PLUS_MICRO
>>>>>> really means nano-units, and IIO_VAL_INT_PLUS_NANO really means
>>>>>> pico-units. Is this correct ?
>>>>> Micro units of the scale factor.
>>>>>
>>>>> Take my test part a max1363...
>>>>> Scale is actually 0.5 so each adc count (e.g. raw value) is 0.5millivolts.
>>>>>
>>>>> scale int here is 0,
>>>>> scale part is 500,000 (so 0.5) and it returns IIO_VAL_INT_PLUS_MICRO.
>>>>
>>>> How about the following?  It'll be extremely costly, but this isn't exactly
>>>> a fast path!
>>>>
>>>> 	case IIO_VAL_INT_PLUS_MICRO:
>>>> 		result = (s64)val * (s64)scaleint +
>>>> 			div_s64((s64)val * (s64)scalepart, 1000000LL);
>>>> 		break;
>>>> 	case IIO_VAL_INT_PLUS_NANO:
>>>> 		result = (s64)val * (s64)scaleint +
>>>> 			div_s64((s64)val * (s64)scalepart, 1000000000LL);
>>>> 		break;
>>>
>>> Is div_s64 really necessary, or would
>>>
>>> 		result = (long)val * (long)scaleint +
>>> 			DIV_ROUND_CLOSEST((s64)val * (s64)scalepart,
>>> 					 1000000000LL);
>>>
>>> work as well ?
>> Not if you want it to compile on arm v5 by the look of it.
>>
>> ERROR: "__aeabi_ldivmod" [drivers/staging/iio/iio_hwmon.ko] undefined!
>>
> Annoying. Ok, I don't have a better idea than using div_s64. You don't
> need s64 for the first part of the operation (val * scaleint), though,
> since the result is a long.
True enough. Pretty unlikely we are going to have 2 MV hwmon devices any
time soon.  I'll pop that back down to int * int I think!


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply

* Re: [PATCH 5/6] IIO:hwmon interface client driver.
From: Jonathan Cameron @ 2011-10-24 16:15 UTC (permalink / raw)
  To: guenter.roeck
  Cc: linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
	linus.ml.walleij@gmail.com, zdevai@gmail.com,
	linux@arm.linux.org.uk, arnd@arndb.de,
	broonie@opensource.wolfsonmicro.com, gregkh@suse.de,
	lm-sensors@lm-sensors.org, khali@linux-fr.org,
	thomas.petazzoni@free-electrons.com,
	maxime.ripard@free-electrons.com
In-Reply-To: <1319472607.2583.49.camel@groeck-laptop>

On 10/24/11 17:10, Guenter Roeck wrote:
> On Mon, 2011-10-24 at 11:58 -0400, Jonathan Cameron wrote:
>> On 10/24/11 16:39, Guenter Roeck wrote:
>>> On Mon, 2011-10-24 at 06:09 -0400, Jonathan Cameron wrote:
>>> [ ... ]
>>>>>>> +/*
>>>>>>> + * Assumes that IIO and hwmon operate in the same base units.
>>>>>>> + * This is supposed to be true, but needs verification for
>>>>>>> + * new channel types.
>>>>>>> + */
>>>>>>> +static ssize_t iio_hwmon_read_val(struct device *dev,
>>>>>>> +				  struct device_attribute *attr,
>>>>>>> +				  char *buf)
>>>>>>> +{
>>>>>>> +	long result;
>>>>>>> +	int val, ret, scaleint, scalepart;
>>>>>>> +	struct sensor_device_attribute *sattr = to_sensor_dev_attr(attr);
>>>>>>> +	struct iio_hwmon_state *state = dev_get_drvdata(dev);
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * No locking between this pair, so theoretically possible
>>>>>>> +	 * the scale has changed.
>>>>>>> +	 */
>>>>>>> +	ret = iio_read_channel_raw(state->channels[sattr->index],
>>>>>>> +				   &val);
>>>>>>> +	if (ret < 0)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	ret = iio_read_channel_scale(state->channels[sattr->index],
>>>>>>> +				     &scaleint, &scalepart);
>>>>>>> +	if (ret < 0)
>>>>>>> +		return ret;
>>>>>>> +	switch (ret) {
>>>>>>> +	case IIO_VAL_INT:
>>>>>>> +		result = val * scaleint;
>>>>>>> +		break;
>>>>>>> +	case IIO_VAL_INT_PLUS_MICRO:
>>>>>>> +		result = (long)val * (long)scaleint +
>>>>>>> +			(long)val * (long)scalepart / 1000000L;
>>>>>>> +		break;
>>>>>>> +	case IIO_VAL_INT_PLUS_NANO:
>>>>>>> +		result = (long)val * (long)scaleint +
>>>>>>> +			(long)val * (long)scalepart / 1000000000L;
>>>>>>> +		break;
>>>>>>
>>>>>> Still easy to imagine that val * scalepart gets larger than 2147483647L
>>>>>> (on machines where sizeof(long) = 4) ... it will already happen if the
>>>>>> result of (val * scalepart / 1000000000) is larger than 2. 
>>>>> Good point.  I really ought to have done the calcs.
>>>>> If we have maximum possible value in here things will be ugly.
>>>>>
>>>>> Worst case is scalepart is 9999999999. (could be done as 1 - 0.000000001
>>>>> which would be nicer, but we don't specify a preference - from this
>>>>> discussion I am suspecting we should!)
>>>>>
>>>>> Looks like 64 bits is going to be a requirement as you say.
>>>>>>
>>>>>> What value range do you expect to see here ?
>>>>>>
>>>>>> If (val * scaleint) is already the milli-unit, scalepart would possibly
>>>>>> only address fractions of milli-units. If so, the result of (val *
>>>>>> scalepart / 1000000000L) might always be smaller than 1, ie 0. 
>>>>> It certainly should be.
>>>>>> If so, for the calculation to have any value, you might be better off using
>>>>>> DIV_ROUND_CLOSEST(val * scalepart, 1000000000L).
>>>>> Good idea.
>>>>>>
>>>>>> I am a bit confused by this anyway. Since hwmon in general reports
>>>>>> milli-units, VAL_INT appears to reflect milli-units, VAL_INT_PLUS_MICRO
>>>>>> really means nano-units, and IIO_VAL_INT_PLUS_NANO really means
>>>>>> pico-units. Is this correct ?
>>>>> Micro units of the scale factor.
>>>>>
>>>>> Take my test part a max1363...
>>>>> Scale is actually 0.5 so each adc count (e.g. raw value) is 0.5millivolts.
>>>>>
>>>>> scale int here is 0,
>>>>> scale part is 500,000 (so 0.5) and it returns IIO_VAL_INT_PLUS_MICRO.
>>>>
>>>> How about the following?  It'll be extremely costly, but this isn't exactly
>>>> a fast path!
>>>>
>>>> 	case IIO_VAL_INT_PLUS_MICRO:
>>>> 		result = (s64)val * (s64)scaleint +
>>>> 			div_s64((s64)val * (s64)scalepart, 1000000LL);
>>>> 		break;
>>>> 	case IIO_VAL_INT_PLUS_NANO:
>>>> 		result = (s64)val * (s64)scaleint +
>>>> 			div_s64((s64)val * (s64)scalepart, 1000000000LL);
>>>> 		break;
>>>
>>> Is div_s64 really necessary, or would
>>>
>>> 		result = (long)val * (long)scaleint +
>>> 			DIV_ROUND_CLOSEST((s64)val * (s64)scalepart,
>>> 					 1000000000LL);
>>>
>>> work as well ?
>> Not if you want it to compile on arm v5 by the look of it.
>>
>> ERROR: "__aeabi_ldivmod" [drivers/staging/iio/iio_hwmon.ko] undefined!
>>
> Annoying. Ok, I don't have a better idea than using div_s64. You don't
> need s64 for the first part of the operation (val * scaleint), though,
> since the result is a long.
True enough. Pretty unlikely we are going to have 2 MV hwmon devices any
time soon.  I'll pop that back down to int * int I think!


^ permalink raw reply

* Re: [PATCH 6/6] mfd: TPS65910: Improve regulator init data
From: Kyle Manna @ 2011-10-24 16:13 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-kernel, Samuel Ortiz, Liam Girdwood,
	Jorge Eduardo Candelaria, Graeme Gregory
In-Reply-To: <20111019140027.GH18713@sirena.org.uk>

On 10/19/2011 09:00 AM, Mark Brown wrote:
> On Tue, Oct 18, 2011 at 01:26:28PM -0500, Kyle Manna wrote:
>> Improve the interface between platform code/board files to the TPS65910
> Again, *always* CC maintainers on patches.

This was an oversight on my part.

>
>> regulators.  The TWL4030/6030 code was used as an example interface.
> This isn't a good sign...

I've reviewed other PMICs (ie Wolfson Micro ;) ) and will post an 
updated series with an interface similar to what is used there.  The new 
approach makes more sense and keeps the code/patch small.

>
>> This improved interface will allow use of the regulators without
>> specifying all the constraints. Also gets rid of an assumption that
>> the platform pass in an array of correct size and was unchecked.
> You've not described the changes between the two interfaces.  Note that
> empty constraints should be absolutely fine with the API.
>
>> +	if (init_data->constraints.name)
>> +		pmic->desc[i].name = init_data->constraints.name;
>> +	else
>> +		pmic->desc[i].name = info[i].name;
> No, this is broken.  The name of the regulator is a fixed property of
> the device and isn't something that ought to be overridden per system.

Understood.

>
>> +	/* TPS65910 and TPS65911 Regulators */
>> +	rdev = add_regulator(pmic, info, TPS65910_REG_VRTC,
>> +			pmic_plat_data->vrtc);
>> +	if (IS_ERR(rdev))
>> +		return PTR_ERR(rdev);
>> +	rdev = add_regulator(pmic, info, TPS65910_REG_VIO,
>> +			pmic_plat_data->vio);
>> +
>> +	if (IS_ERR(rdev))
>> +		return PTR_ERR(rdev);
>> +
>> +	rdev = add_regulator(pmic, info, TPS65910_REG_VDD1,
>> +			pmic_plat_data->vdd1);
>> +	if (IS_ERR(rdev))
>> +		return PTR_ERR(rdev);
> This looks like a regression - we've gone from looping over an array
> which is nice and simple to explicit code for each individual regulator
> giving us lots of repetitive code...

Will be revised.
>> -err_unregister_regulator:
>> -	while (--i>= 0)
>> -		regulator_unregister(pmic->rdev[i]);
>> -	kfree(pmic->rdev);
> ...and loosing all our cleanup if things go wrong which isn't great
> either.


^ permalink raw reply

* Re: [PATCH 11/X] uprobes: x86: introduce xol_was_trapped()
From: Oleg Nesterov @ 2011-10-24 16:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds, Jonathan Corbet,
	Masami Hiramatsu, Hugh Dickins, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Thomas Gleixner, Andi Kleen,
	Andrew Morton, Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024145531.GB31435@linux.vnet.ibm.com>

On 10/24, Srikar Dronamraju wrote:
>
> > diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> > index 1c30cfd..f0fbdab 100644
> > --- a/arch/x86/include/asm/uprobes.h
> > +++ b/arch/x86/include/asm/uprobes.h
> > @@ -39,6 +39,7 @@ struct uprobe_arch_info {
> >
> >  struct uprobe_task_arch_info {
> >  	unsigned long saved_scratch_register;
> > +	unsigned long saved_trap_no;
> >  };
> >  #else
> >  struct uprobe_arch_info {};
>
>
> one nit
> I had to add saved_trap_no to #else part (i.e uprobe_arch_info ).

Yes, thanks, I didn't notice this is for X86_64 only.

And just in case, please feel free to rename/redo/whatever.

Oleg.


^ permalink raw reply

* Re: [PATCH 11/X] uprobes: x86: introduce xol_was_trapped()
From: Oleg Nesterov @ 2011-10-24 16:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds, Jonathan Corbet,
	Masami Hiramatsu, Hugh Dickins, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Thomas Gleixner, Andi Kleen,
	Andrew Morton, Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024145531.GB31435@linux.vnet.ibm.com>

On 10/24, Srikar Dronamraju wrote:
>
> > diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> > index 1c30cfd..f0fbdab 100644
> > --- a/arch/x86/include/asm/uprobes.h
> > +++ b/arch/x86/include/asm/uprobes.h
> > @@ -39,6 +39,7 @@ struct uprobe_arch_info {
> >
> >  struct uprobe_task_arch_info {
> >  	unsigned long saved_scratch_register;
> > +	unsigned long saved_trap_no;
> >  };
> >  #else
> >  struct uprobe_arch_info {};
>
>
> one nit
> I had to add saved_trap_no to #else part (i.e uprobe_arch_info ).

Yes, thanks, I didn't notice this is for X86_64 only.

And just in case, please feel free to rename/redo/whatever.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 1/1] pinctrl/sirf: fix sirfsoc_get_group_pins prototype introduce in 7e570f97
From: Jean-Christophe PLAGNIOL-VILLARD @ 2011-10-24 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Stephen Warren <swarren@nvidia.com>
---
 drivers/pinctrl/pinmux-sirf.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pinctrl/pinmux-sirf.c b/drivers/pinctrl/pinmux-sirf.c
index ba73523..d76cae6 100644
--- a/drivers/pinctrl/pinmux-sirf.c
+++ b/drivers/pinctrl/pinmux-sirf.c
@@ -870,7 +870,7 @@ static const char *sirfsoc_get_group_name(struct pinctrl_dev *pctldev,
 
 static int sirfsoc_get_group_pins(struct pinctrl_dev *pctldev, unsigned selector,
 			       const unsigned **pins,
-			       const unsigned *num_pins)
+			       unsigned *num_pins)
 {
 	if (selector >= ARRAY_SIZE(sirfsoc_pin_groups))
 		return -EINVAL;
-- 
1.7.7

^ permalink raw reply related

* [PATCH] cache align vm_stat
From: Dimitri Sivanich @ 2011-10-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Andi Kleen,
	Mel Gorman

Avoid false sharing of the vm_stat array.

This was found to adversely affect tmpfs I/O performance.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
 mm/vmstat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -78,7 +78,7 @@ void vm_events_fold_cpu(int cpu)
  *
  * vm_stat contains the global counters
  */
-atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
+atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
 EXPORT_SYMBOL(vm_stat);
 
 #ifdef CONFIG_SMP

^ permalink raw reply

* [PATCH] cache align vm_stat
From: Dimitri Sivanich @ 2011-10-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Andi Kleen,
	Mel Gorman

Avoid false sharing of the vm_stat array.

This was found to adversely affect tmpfs I/O performance.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
 mm/vmstat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -78,7 +78,7 @@ void vm_events_fold_cpu(int cpu)
  *
  * vm_stat contains the global counters
  */
-atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
+atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
 EXPORT_SYMBOL(vm_stat);
 
 #ifdef CONFIG_SMP

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
From: Jan Kiszka @ 2011-10-24 16:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
In-Reply-To: <20111024160526.GA30385@redhat.com>

On 2011-10-24 18:05, Michael S. Tsirkin wrote:
>> This is what I have in mind:
>>  - devices set PBA bit if MSI message cannot be sent due to mask (*)
>>  - core checks&clears PBA bit on unmask, injects message if bit was set
>>  - devices clear PBA bit if message reason is resolved before unmask (*)
> 
> OK, but practically, when exactly does the device clear PBA?

Consider a network adapter that signals messages in a RX ring: If the
corresponding vector is masked while the guest empties the ring, I
strongly assume that the device is supposed to take back the pending bit
in that case so that there is no interrupt inject on a later vector
unmask operation.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: [lm-sensors] [PATCH 5/6] IIO:hwmon interface client driver.
From: Guenter Roeck @ 2011-10-24 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
	linus.ml.walleij@gmail.com, zdevai@gmail.com,
	linux@arm.linux.org.uk, arnd@arndb.de,
	broonie@opensource.wolfsonmicro.com, gregkh@suse.de,
	lm-sensors@lm-sensors.org, khali@linux-fr.org,
	thomas.petazzoni@free-electrons.com,
	maxime.ripard@free-electrons.com
In-Reply-To: <4EA58B39.8030608@cam.ac.uk>

On Mon, 2011-10-24 at 11:58 -0400, Jonathan Cameron wrote:
> On 10/24/11 16:39, Guenter Roeck wrote:
> > On Mon, 2011-10-24 at 06:09 -0400, Jonathan Cameron wrote:
> > [ ... ]
> >>>>> +/*
> >>>>> + * Assumes that IIO and hwmon operate in the same base units.
> >>>>> + * This is supposed to be true, but needs verification for
> >>>>> + * new channel types.
> >>>>> + */
> >>>>> +static ssize_t iio_hwmon_read_val(struct device *dev,
> >>>>> +				  struct device_attribute *attr,
> >>>>> +				  char *buf)
> >>>>> +{
> >>>>> +	long result;
> >>>>> +	int val, ret, scaleint, scalepart;
> >>>>> +	struct sensor_device_attribute *sattr = to_sensor_dev_attr(attr);
> >>>>> +	struct iio_hwmon_state *state = dev_get_drvdata(dev);
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * No locking between this pair, so theoretically possible
> >>>>> +	 * the scale has changed.
> >>>>> +	 */
> >>>>> +	ret = iio_read_channel_raw(state->channels[sattr->index],
> >>>>> +				   &val);
> >>>>> +	if (ret < 0)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	ret = iio_read_channel_scale(state->channels[sattr->index],
> >>>>> +				     &scaleint, &scalepart);
> >>>>> +	if (ret < 0)
> >>>>> +		return ret;
> >>>>> +	switch (ret) {
> >>>>> +	case IIO_VAL_INT:
> >>>>> +		result = val * scaleint;
> >>>>> +		break;
> >>>>> +	case IIO_VAL_INT_PLUS_MICRO:
> >>>>> +		result = (long)val * (long)scaleint +
> >>>>> +			(long)val * (long)scalepart / 1000000L;
> >>>>> +		break;
> >>>>> +	case IIO_VAL_INT_PLUS_NANO:
> >>>>> +		result = (long)val * (long)scaleint +
> >>>>> +			(long)val * (long)scalepart / 1000000000L;
> >>>>> +		break;
> >>>>
> >>>> Still easy to imagine that val * scalepart gets larger than 2147483647L
> >>>> (on machines where sizeof(long) = 4) ... it will already happen if the
> >>>> result of (val * scalepart / 1000000000) is larger than 2. 
> >>> Good point.  I really ought to have done the calcs.
> >>> If we have maximum possible value in here things will be ugly.
> >>>
> >>> Worst case is scalepart is 9999999999. (could be done as 1 - 0.000000001
> >>> which would be nicer, but we don't specify a preference - from this
> >>> discussion I am suspecting we should!)
> >>>
> >>> Looks like 64 bits is going to be a requirement as you say.
> >>>>
> >>>> What value range do you expect to see here ?
> >>>>
> >>>> If (val * scaleint) is already the milli-unit, scalepart would possibly
> >>>> only address fractions of milli-units. If so, the result of (val *
> >>>> scalepart / 1000000000L) might always be smaller than 1, ie 0. 
> >>> It certainly should be.
> >>>> If so, for the calculation to have any value, you might be better off using
> >>>> DIV_ROUND_CLOSEST(val * scalepart, 1000000000L).
> >>> Good idea.
> >>>>
> >>>> I am a bit confused by this anyway. Since hwmon in general reports
> >>>> milli-units, VAL_INT appears to reflect milli-units, VAL_INT_PLUS_MICRO
> >>>> really means nano-units, and IIO_VAL_INT_PLUS_NANO really means
> >>>> pico-units. Is this correct ?
> >>> Micro units of the scale factor.
> >>>
> >>> Take my test part a max1363...
> >>> Scale is actually 0.5 so each adc count (e.g. raw value) is 0.5millivolts.
> >>>
> >>> scale int here is 0,
> >>> scale part is 500,000 (so 0.5) and it returns IIO_VAL_INT_PLUS_MICRO.
> >>
> >> How about the following?  It'll be extremely costly, but this isn't exactly
> >> a fast path!
> >>
> >> 	case IIO_VAL_INT_PLUS_MICRO:
> >> 		result = (s64)val * (s64)scaleint +
> >> 			div_s64((s64)val * (s64)scalepart, 1000000LL);
> >> 		break;
> >> 	case IIO_VAL_INT_PLUS_NANO:
> >> 		result = (s64)val * (s64)scaleint +
> >> 			div_s64((s64)val * (s64)scalepart, 1000000000LL);
> >> 		break;
> > 
> > Is div_s64 really necessary, or would
> > 
> > 		result = (long)val * (long)scaleint +
> > 			DIV_ROUND_CLOSEST((s64)val * (s64)scalepart,
> > 					 1000000000LL);
> > 
> > work as well ?
> Not if you want it to compile on arm v5 by the look of it.
> 
> ERROR: "__aeabi_ldivmod" [drivers/staging/iio/iio_hwmon.ko] undefined!
> 
Annoying. Ok, I don't have a better idea than using div_s64. You don't
need s64 for the first part of the operation (val * scaleint), though,
since the result is a long.

Thanks,
Guenter



_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply

* Re: [PATCH 5/6] IIO:hwmon interface client driver.
From: Guenter Roeck @ 2011-10-24 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
	linus.ml.walleij@gmail.com, zdevai@gmail.com,
	linux@arm.linux.org.uk, arnd@arndb.de,
	broonie@opensource.wolfsonmicro.com, gregkh@suse.de,
	lm-sensors@lm-sensors.org, khali@linux-fr.org,
	thomas.petazzoni@free-electrons.com,
	maxime.ripard@free-electrons.com
In-Reply-To: <4EA58B39.8030608@cam.ac.uk>

On Mon, 2011-10-24 at 11:58 -0400, Jonathan Cameron wrote:
> On 10/24/11 16:39, Guenter Roeck wrote:
> > On Mon, 2011-10-24 at 06:09 -0400, Jonathan Cameron wrote:
> > [ ... ]
> >>>>> +/*
> >>>>> + * Assumes that IIO and hwmon operate in the same base units.
> >>>>> + * This is supposed to be true, but needs verification for
> >>>>> + * new channel types.
> >>>>> + */
> >>>>> +static ssize_t iio_hwmon_read_val(struct device *dev,
> >>>>> +				  struct device_attribute *attr,
> >>>>> +				  char *buf)
> >>>>> +{
> >>>>> +	long result;
> >>>>> +	int val, ret, scaleint, scalepart;
> >>>>> +	struct sensor_device_attribute *sattr = to_sensor_dev_attr(attr);
> >>>>> +	struct iio_hwmon_state *state = dev_get_drvdata(dev);
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * No locking between this pair, so theoretically possible
> >>>>> +	 * the scale has changed.
> >>>>> +	 */
> >>>>> +	ret = iio_read_channel_raw(state->channels[sattr->index],
> >>>>> +				   &val);
> >>>>> +	if (ret < 0)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	ret = iio_read_channel_scale(state->channels[sattr->index],
> >>>>> +				     &scaleint, &scalepart);
> >>>>> +	if (ret < 0)
> >>>>> +		return ret;
> >>>>> +	switch (ret) {
> >>>>> +	case IIO_VAL_INT:
> >>>>> +		result = val * scaleint;
> >>>>> +		break;
> >>>>> +	case IIO_VAL_INT_PLUS_MICRO:
> >>>>> +		result = (long)val * (long)scaleint +
> >>>>> +			(long)val * (long)scalepart / 1000000L;
> >>>>> +		break;
> >>>>> +	case IIO_VAL_INT_PLUS_NANO:
> >>>>> +		result = (long)val * (long)scaleint +
> >>>>> +			(long)val * (long)scalepart / 1000000000L;
> >>>>> +		break;
> >>>>
> >>>> Still easy to imagine that val * scalepart gets larger than 2147483647L
> >>>> (on machines where sizeof(long) = 4) ... it will already happen if the
> >>>> result of (val * scalepart / 1000000000) is larger than 2. 
> >>> Good point.  I really ought to have done the calcs.
> >>> If we have maximum possible value in here things will be ugly.
> >>>
> >>> Worst case is scalepart is 9999999999. (could be done as 1 - 0.000000001
> >>> which would be nicer, but we don't specify a preference - from this
> >>> discussion I am suspecting we should!)
> >>>
> >>> Looks like 64 bits is going to be a requirement as you say.
> >>>>
> >>>> What value range do you expect to see here ?
> >>>>
> >>>> If (val * scaleint) is already the milli-unit, scalepart would possibly
> >>>> only address fractions of milli-units. If so, the result of (val *
> >>>> scalepart / 1000000000L) might always be smaller than 1, ie 0. 
> >>> It certainly should be.
> >>>> If so, for the calculation to have any value, you might be better off using
> >>>> DIV_ROUND_CLOSEST(val * scalepart, 1000000000L).
> >>> Good idea.
> >>>>
> >>>> I am a bit confused by this anyway. Since hwmon in general reports
> >>>> milli-units, VAL_INT appears to reflect milli-units, VAL_INT_PLUS_MICRO
> >>>> really means nano-units, and IIO_VAL_INT_PLUS_NANO really means
> >>>> pico-units. Is this correct ?
> >>> Micro units of the scale factor.
> >>>
> >>> Take my test part a max1363...
> >>> Scale is actually 0.5 so each adc count (e.g. raw value) is 0.5millivolts.
> >>>
> >>> scale int here is 0,
> >>> scale part is 500,000 (so 0.5) and it returns IIO_VAL_INT_PLUS_MICRO.
> >>
> >> How about the following?  It'll be extremely costly, but this isn't exactly
> >> a fast path!
> >>
> >> 	case IIO_VAL_INT_PLUS_MICRO:
> >> 		result = (s64)val * (s64)scaleint +
> >> 			div_s64((s64)val * (s64)scalepart, 1000000LL);
> >> 		break;
> >> 	case IIO_VAL_INT_PLUS_NANO:
> >> 		result = (s64)val * (s64)scaleint +
> >> 			div_s64((s64)val * (s64)scalepart, 1000000000LL);
> >> 		break;
> > 
> > Is div_s64 really necessary, or would
> > 
> > 		result = (long)val * (long)scaleint +
> > 			DIV_ROUND_CLOSEST((s64)val * (s64)scalepart,
> > 					 1000000000LL);
> > 
> > work as well ?
> Not if you want it to compile on arm v5 by the look of it.
> 
> ERROR: "__aeabi_ldivmod" [drivers/staging/iio/iio_hwmon.ko] undefined!
> 
Annoying. Ok, I don't have a better idea than using div_s64. You don't
need s64 for the first part of the operation (val * scaleint), though,
since the result is a long.

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH] ceph: fix memory leak in async readpages
From: Sage Weil @ 2011-10-24 16:09 UTC (permalink / raw)
  To: Jeff Wu; +Cc: David Flynn, ceph-devel@vger.kernel.org
In-Reply-To: <1319444888.4473.9.camel@cephclient0>

Thanks, Jeff!

Was there a workload that reliably triggered this case?

sage


On Mon, 24 Oct 2011, Jeff Wu wrote:

> Hi ,
> 
> start_read() do twice "kfree(pages)",
> 
> ................
>  out_pages:
>         ceph_release_page_vector(pages, nr_pages);
>         kfree(pages);
> 
> 
> ceph_release_page_vector had kfree pages, continue to do kfree(pages),
> sometimes ,async read ,printk "BUG kmalloc-16: Object already
> free" ,then OOPS.
> 
> Jeff Wu
> 
> ----------------------------------------------------------------------
> void ceph_release_page_vector(struct page **pages, int num_pages)
> {
> 	int i;
> 
> 	for (i = 0; i < num_pages; i++)
> 		__free_pages(pages[i], 0);
> 	kfree(pages);
> }
> 
> 
> $ git diff
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 5ffee90..4144caf 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -345,7 +345,6 @@ static int start_read(struct inode *inode, struct
> list_head *page_list, int max)
>  
>  out_pages:
>         ceph_release_page_vector(pages, nr_pages);
> -       kfree(pages);
>  out:
>         ceph_osdc_put_request(req);
>         return ret;
> 
> 
> On Thu, 2011-09-29 at 03:16 +0800, Sage Weil wrote:
> > On Wed, 28 Sep 2011, Sage Weil wrote:
> > > I'll send this upstream with the other patches so it'll hopefully make 
> > > 3.1...
> > 
> > Er, not really.. this'll go upstream during the next merge window, along 
> > with the readahead code.  :)
> > 
> > sage
> > 
> > 
> > > 
> > > Thanks!
> > > sage
> > > 
> > > 
> > > On Wed, 28 Sep 2011, David Flynn wrote:
> > > 
> > > > The finish_read callback introduced in 63c90314546c1cec1f220f6ab24ea
> > > > fails to release the page list allocated in start_read.
> > > > ---
> > > >  fs/ceph/addr.c |    1 +
> > > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > > index e06a322..4144caf 100644
> > > > --- a/fs/ceph/addr.c
> > > > +++ b/fs/ceph/addr.c
> > > > @@ -261,6 +261,7 @@ static void finish_read(struct ceph_osd_request *req, struct ceph_msg *msg)
> > > >  		unlock_page(page);
> > > >  		page_cache_release(page);
> > > >  	}
> > > > +	kfree(req->r_pages);
> > > >  }
> > > >  
> > > >  /*
> > > > -- 
> > > > 1.7.4.1
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply

* some question about request queue in blok device
From: loody @ 2011-10-24 16:07 UTC (permalink / raw)
  To: kernelnewbies

Dear all:
I found there is a member of request_queue, bounce_pfn and it will be
set up by calling blk_queue_bounce_limit.
it seems someone will use that when calling blk_queue_bounce, but what
and when it will be called?

-- 
Appreciate your help,

^ permalink raw reply

* CloudLinux on Xen
From: R J @ 2011-10-24 16:05 UTC (permalink / raw)
  To: xen-api-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	xen-users-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR

[-- Attachment #1.1: Type: text/plain, Size: 646 bytes --]

Hello List,

I am testing a XCP1 CentOS 5 paravirt vm with CloudLinux.
http://cloudlinux.com/
CloudLinux is a great product for shared hosting and I was evaluation same
on a paravirt guest.

I found a strange thing in XCP. CloudLinux provides xen kernel for DomUs and
after installing the CloudLinux in a DomU it actually got 32 cpus inside.
I had assigned only 2 vCPU to that DomU. To confirm this I started
generating load on that DomU and in no time it ate all my node's CPUs.

Is it an expected behavior that one of your DomU can eat all node with few
modifications in kernel ?
Or is Paravirt not a Hardware Virtualized guest ?

Regards,
R J

[-- Attachment #1.2: Type: text/html, Size: 740 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply

* Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
From: Michael S. Tsirkin @ 2011-10-24 16:05 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm
In-Reply-To: <4EA57D8B.7020905@siemens.com>

On Mon, Oct 24, 2011 at 05:00:27PM +0200, Jan Kiszka wrote:
> On 2011-10-24 16:40, Michael S. Tsirkin wrote:
> > On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
> >> On 2011-10-24 15:11, Jan Kiszka wrote:
> >>> On 2011-10-24 14:43, Michael S. Tsirkin wrote:
> >>>> On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
> >>>>> On 2011-10-24 13:09, Avi Kivity wrote:
> >>>>>> On 10/24/2011 12:19 PM, Jan Kiszka wrote:
> >>>>>>>>
> >>>>>>>> With the new feature it may be worthwhile, but I'd like to see the whole
> >>>>>>>> thing, with numbers attached.
> >>>>>>>
> >>>>>>> It's not a performance issue, it's a resource limitation issue: With the
> >>>>>>> new API we can stop worrying about user space device models consuming
> >>>>>>> limited IRQ routes of the KVM subsystem.
> >>>>>>>
> >>>>>>
> >>>>>> Only if those devices are in the same process (or have access to the
> >>>>>> vmfd).  Interrupt routing together with irqfd allows you to disaggregate
> >>>>>> the device model.  Instead of providing a competing implementation with
> >>>>>> new limitations, we need to remove the limitations of the old
> >>>>>> implementation.
> >>>>>
> >>>>> That depends on where we do the cut. Currently we let the IRQ source
> >>>>> signal an abstract edge on a pre-allocated pseudo IRQ line. But we
> >>>>> cannot build correct MSI-X on top of the current irqfd model as we lack
> >>>>> the level information (for PBA emulation). *)
> >>>>
> >>>>
> >>>> I don't agree here. IMO PBA emulation would need to
> >>>> clear pending bits on interrupt status register read.
> >>>> So clearing pending bits could be done by ioctl from qemu
> >>>> while setting them would be done from irqfd.
> >>>
> >>> How should QEMU know if the reason for "pending" has been cleared at
> >>> device level if the device is outside the scope of QEMU? This model only
> >>> works for PV devices when you agree that spurious IRQs are OK.
> >>>
> >>>>
> >>>>> So we either need to
> >>>>> extend the existing model anyway -- or push per-vector masking back to
> >>>>> the IRQ source. In the latter case, it would be a very good chance to
> >>>>> give up on limited pseudo GSIs with static routes and do MSI messaging
> >>>>> from external IRQ sources to KVM directly.
> >>>>> But all those considerations affect different APIs than what I'm
> >>>>> proposing here. We will always need a way to inject MSIs in the context
> >>>>> of the VM as there will always be scenarios where devices are better run
> >>>>> in that very same context, for performance or simplicity or whatever
> >>>>> reasons. E.g., I could imagine that one would like to execute an
> >>>>> emulated IRQ remapper rather in the hypervisor context than
> >>>>> "over-microkernelized" in a separate process.
> >>>>>
> >>>>> Jan
> >>>>>
> >>>>> *) Realized this while trying to generalize the proposed MSI-X MMIO
> >>>>> acceleration for assigned devices to arbitrary device models, vhost-net,
> >>>>
> >>>> I'm actually working on a qemu patch to get pba emulation working correctly.
> >>>> I think it's doable with existing irqfd.
> >>>
> >>> irqfd has no notion of level. You can only communicate a rising edge and
> >>> then need a side channel for the state of the edge reason.
> >>>
> >>>>
> >>>>> and specifically vfio.
> >>>>
> >>>> Interesting. How would you clear the pseudo interrupt level?
> >>>
> >>> Ideally: not at all (for MSI). If we manage the mask at device level, we
> >>> only need to send the message if there is actually something to deliver
> >>> to the interrupt controller and masked input events would be lost on
> >>> real HW as well.
> >>
> >> This wouldn't work out nicely as well. We rather need a combined model:
> >>
> >> Devices need to maintain the PBA actively, i.e. set & clear them
> >> themselves and do not rely on the core here (with the core being either
> >> QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
> >> checks the PBA if it is about to deliver some message and refrains from
> >> doing so if the bit became 0 in the meantime (specifically during the
> >> masked period).
> >>
> >> For QEMU device models, that means no additional IOCTLs,
> >> just memory sharing of the PBA which is required anyway.
> > 
> > Sorry, I don't understand the above two paragraphs. Maybe I am
> > confused by terminology here. We really only need to check PBA when it's
> > read.  Whether the message is delivered only depends on the mask bit.
> 
> This is what I have in mind:
>  - devices set PBA bit if MSI message cannot be sent due to mask (*)
>  - core checks&clears PBA bit on unmask, injects message if bit was set
>  - devices clear PBA bit if message reason is resolved before unmask (*)

OK, but practically, when exactly does the device clear PBA?

> The marked (*) lines differ from the current user space model where only
> the core does PBA manipulation (including clearance via a special
> function). Basically, the PBA becomes a communication channel also
> between device and MSI core. And this model also works if core and
> device run in different processes provided they set up the PBA as shared
> memory.
> 
> Jan
> 


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: Converting from Raid 5 to 6
From: Michael Busby @ 2011-10-24 16:03 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <CADNH=7H90HX58zh8mSH54FdFgh46umEscj89t372PZjj9DnHtg@mail.gmail.com>

should the speed be very slow when doing this progress, its a lot
slower than a normal grow

reshape =  1.2% (25006080/1953513984) finish=12481.8min speed=2574K/sec

On 24 October 2011 15:11, Mathias Burén <mathias.buren@gmail.com> wrote:
> On 24 October 2011 14:11, Michael Busby <michael.a.busby@gmail.com> wrote:
>> At the moment i have a raid5 setup with 5 disks, i am looking to add a
>> 6th disk and change from raid 5 to raid 6
>>
>> having looked at Neil's site i have found the following command, and
>> just want to double check this is still the recommend way of
>> converting
>>
>> mdadm --grow /dev/md0 --level=6 --raid-disks=6 --backup-file=/home/md.backup
>>
>> also would i need to add the extra disk before or after the command?
>>
>> cheers
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> Hi,
>
> I grew my 6 disk RAID5 to a 7 disk RAID6. First, add the drive. Then
> partition it as required. Then add the drive to the array (I think
> it'll become a spare?). Then you can grow it.
>
> Make sure you're using the latest mdadm tools available.
>
> Regards,
> Mathias
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.