kernelnewbies.kernelnewbies.org archive mirror
* Custom Linux Kernel Scheduler issue
From: Kenneth Adam Miller @ 2016-11-24  7:01 UTC (permalink / raw)
  To: kernelnewbies

Hello,


I have a scheduler issue in two different respects:

1) I have a process that is supposed to run in a tight loop, but it is
being given very little CPU time. I don't want that; I want every
process that can use the processor to be given the resources to run as
fast as it can.


2) perf shows that the largest single entry accounts for no more than
about 15 percent of the overhead. In total, probably something like 18%
of CPU time is used, and my binary's run time has shot up from about
2 seconds or less to several minutes. I think that the linux scheduler
isn't scheduling it, because this process is just some unit tests that
double as benchmarks: they shm_open a file and write into it with
memcpy.
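
A minimal sketch of the kind of benchmark described above, assuming a
single POSIX shared-memory object that is mapped once and then
overwritten with memcpy in a loop. The function name time_shm_memcpy and
the object name /sched_bench_example are made up for illustration and
are not taken from the real test suite:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Copy `len` bytes from `src` into a POSIX shared-memory object `rounds`
 * times. Returns the elapsed wall-clock seconds, or -1.0 on error. */
double time_shm_memcpy(const char *src, size_t len, int rounds)
{
    assert(src != NULL && len > 0 && rounds >= 0);

    const char *name = "/sched_bench_example";   /* hypothetical name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1.0;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        shm_unlink(name);
        return -1.0;
    }

    char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (dst == MAP_FAILED) {
        shm_unlink(name);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(dst, len);
    shm_unlink(name);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running the same call on the host and on the target gives a number that
can be compared directly. On older glibc versions this needs -lrt at
link time.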

Can anybody help tell me what kind of linux kernel configuration could
cause this? The kernel is configured as SMP with preemption possible
as a desktop...


* Custom Linux Kernel Scheduler issue
From: Greg KH @ 2016-11-24  7:18 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
> Hello,
> 
> 
> I have a scheduler issue in two different respects:
> 
> 1) I have a process that is supposed to tight loop, and it is being
> given very very little time on the system. I don't want that - I want
> those who would use the processor to be given the resources to run as
> fast as they each can.

What is causing it to give up its timeslice?  Is it waiting for I/O?
Doing something else to sleep?

> 2) I am seeing with perf that the maximum overhead at each section
> does not sum up to be more than 15 percent. Total, probably something
> like 18% of cpu time is used, and my binary has rocketed in slowness
> from about 2 seconds or less total to several minutes.

What changed to make things slower?  Did you change kernel versions or
did you change something in your userspace program?

> I think that
> the linux scheduler isn't scheduling it, because this process is just
> some unit tests that double as benchmarks in that they shm_open a file
> and write into it with memcpy's.

Are you sure that I/O isn't happening here like through swap or
something else?

What does perf say is taking all of your time?

thanks,

greg k-h


* Fwd: Custom Linux Kernel Scheduler issue
From: Kenneth Adam Miller @ 2016-11-24 15:31 UTC (permalink / raw)
  To: kernelnewbies

On Nov 24, 2016 2:18 AM, "Greg KH" <greg@kroah.com> wrote:
>
> On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
> > Hello,
> >
> >
> > I have a scheduler issue in two different respects:
> >
> > 1) I have a process that is supposed to tight loop, and it is being
> > given very very little time on the system. I don't want that - I want
> > those who would use the processor to be given the resources to run as
> > fast as they each can.
>
> What is causing it to give up its timeslice?  Is it waiting for I/O?
> Doing something else to sleep?

It's multithreaded, so it reads in a loop in one thread and writes in
another thread. What I saw when I ran strace on it is that each process
would run for too long. The program is designed to try and stay out of
the kernel on each side, so it checks some shared variables before it
ever goes into the kernel.

>
> > 2) I am seeing with perf that the maximum overhead at each section
> > does not sum up to be more than 15 percent. Total, probably something
> > like 18% of cpu time is used, and my binary has rocketed in slowness
> > from about 2 seconds or less total to several minutes.
>
> What changed to make things slower?  Did you change kernel versions or
> did you change something in your userspace program?
>

The kernel versions specifically couldn't have anything to do with it,
but they are different kernels. The test runs in less than 2 seconds on
my host. When I copy it to our custom linux, it takes minutes for it
to run. I think it's some extra setting that we're missing while
building the kernel, and I don't know what that is. I got a huge
improvement when I changed the multicore scheduling to allow
preemption "(desktop)" but there's still a problem as I've described
with one of the processes not using the core as it should.

> > I think that
> > the linux scheduler isn't scheduling it, because this process is just
> > some unit tests that double as benchmarks in that they shm_open a file
> > and write into it with memcpy's.
>
> Are you sure that I/O isn't happening here like through swap or
> something else?
>

Well, we're using tmpfs and don't have a disk in the machine, but I
will say this process is using a lot of the address space. One
problem here is that the kernel has more ram than it thinks it does,
but what I want to emphasize is that I haven't changed the program to
allocate any more than it was previously. I'm not sure if that's a
kernel change or some setting, but it went from 85% to 98%. The reason
why is that there is a large latency even without that big program in
there; I can't run my standalone tests in qemu without it also taking
minutes. I understand qemu has to emulate, and that it's not just a
VM, but I'm going from host CPU to guest, and the settings are the
same.

> What does perf say is taking all of your time?

When I ran perf, what it appeared to indicate is that the largest
consumer of time was my library, which should be right in either
scenario because it should stay out of the kernel as I've designed
it. In addition, the work takes place there anyway, so that's right.
What's not right is the fact that the largest percent of time used is
around 15%, and all the others combined don't add up to anything near
100.
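
One way to tell whether the missing percentage is time spent blocked
rather than running is to compare CPU time against wall-clock time. The
sketch below is only an illustration; cpu_utilisation and spin_briefly
are hypothetical names, and it assumes the work under test can be
wrapped in a single function call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Read a clock as seconds; returns -1.0 if the clock cannot be read. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run `work` once and return process CPU time divided by wall-clock time.
 * CPU time is summed over all threads, so the ratio can exceed 1.0. */
double cpu_utilisation(void (*work)(void))
{
    assert(work != NULL);
    double wall0 = clock_seconds(CLOCK_MONOTONIC);
    double cpu0  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    work();
    double wall1 = clock_seconds(CLOCK_MONOTONIC);
    double cpu1  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    double wall  = wall1 - wall0;
    return wall > 0.0 ? (cpu1 - cpu0) / wall : 0.0;
}

/* Example workload: busy-loop for roughly 50 ms of wall-clock time. */
void spin_briefly(void)
{
    double end = clock_seconds(CLOCK_MONOTONIC) + 0.05;
    volatile unsigned long n = 0;
    while (clock_seconds(CLOCK_MONOTONIC) < end)
        n++;
}
```

A ratio near 1.0 would suggest the process is CPU-bound. A ratio near
0.18 would be consistent with a process that spends most of its time
waiting rather than being starved by other runnable work.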

>
> thanks,
>
> greg k-h


* Fwd: Custom Linux Kernel Scheduler issue
From: Greg KH @ 2016-11-24 16:13 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 24, 2016 at 10:31:18AM -0500, Kenneth Adam Miller wrote:
> On Nov 24, 2016 2:18 AM, "Greg KH" <greg@kroah.com> wrote:
> >
> > On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
> > > Hello,
> > >
> > >
> > > I have a scheduler issue in two different respects:
> > >
> > > 1) I have a process that is supposed to tight loop, and it is being
> > > given very very little time on the system. I don't want that - I want
> > > those who would use the processor to be given the resources to run as
> > > fast as they each can.
> >
> > What is causing it to give up its timeslice?  Is it waiting for I/O?
> > Doing something else to sleep?
> 
> It's multithreaded, so it reads in a loop in one thread and writes in
> another thread. What I saw when I ran strace on it is each process
> would run for too long- the program is designed to try and stay out of
> the kernel on each side, so it checks some shared variables before it
> ever goes.

So locking/cpu contention for those "shared variables" perhaps?

> > > 2) I am seeing with perf that the maximum overhead at each section
> > > does not sum up to be more than 15 percent. Total, probably something
> > > like 18% of cpu time is used, and my binary has rocketed in slowness
> > > from about 2 seconds or less total to several minutes.
> >
> > What changed to make things slower?  Did you change kernel versions or
> > did you change something in your userspace program?
> >
> 
> The kernel versions specifically couldnt have anything to do with it
> but it was different kernels. The test runs in less that 2 seconds on
> my host. When I copy it to our custom linux, it takes minutes for it
> to run. I think it's some extra setting that we're missing while
> building the kernel, and I don't know what that is. I got a huge
> improvement when I changed the multicore scheduling to allow
> preemption "(desktop)" but there's still a problem as I've described
> with one of the processes not using the core as it should.

What do you mean by "custom linux"?  Is this the exact same hardware as
your machine?  Or different?  If so, what is different?  What is
different between the different kernel versions you are using?  Does the
perf output look different from running on the two different machines?
If so, where?

Have you changed the priority levels of your application at all?  Have
you thought about just forcing your app to a specific CPU and getting
the kernel off of that CPU, so that the kernel isn't even an
option here at all?  (Linux allows you to do this, details are somewhere
in the documentation, sorry, can't remember off the top of my head...)
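
The current scheduling policy and nice value can be read back from
inside the process, which is a quick way to confirm whether the priority
differs between the two environments. A small sketch, with struct
sched_info and query_sched_info as hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/resource.h>

struct sched_info {
    int policy;       /* SCHED_OTHER, SCHED_FIFO, SCHED_RR, ... */
    int rt_priority;  /* only meaningful for real-time policies */
    int nice;         /* -20 (highest priority) .. 19 (lowest) */
};

/* Fill `out` with the calling process's scheduling settings.
 * Returns 0 on success, -1 on error. */
int query_sched_info(struct sched_info *out)
{
    assert(out != NULL);

    struct sched_param sp;
    int policy = sched_getscheduler(0);
    if (policy < 0 || sched_getparam(0, &sp) != 0)
        return -1;

    /* getpriority can legitimately return -1, so errno must be checked. */
    errno = 0;
    int nice_val = getpriority(PRIO_PROCESS, 0);
    if (nice_val == -1 && errno != 0)
        return -1;

    out->policy = policy;
    out->rt_priority = sp.sched_priority;
    out->nice = nice_val;
    return 0;
}
```

Printing these three fields on the host and on the target shows whether
the process starts out with the same policy and nice level in both.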

But really, you should track down what the differences are between your
two machines/environments, as something is different that is causing the
slow down.

You haven't even said what kernel version you are using, and if you have
any of your own kernel patches in those kernels.

> > > I think that
> > > the linux scheduler isn't scheduling it, because this process is just
> > > some unit tests that double as benchmarks in that they shm_open a file
> > > and write into it with memcpy's.
> >
> > Are you sure that I/O isn't happening here like through swap or
> > something else?
> >
> 
> Well, we're using tmpfs and don't have a disk in the machine, but I
> will say this process is using all lot of the address space. One
> problem here is that the kernel has more ram than it thinks it does,

What do you mean, is this a hardware issue?

> but what I want to emphasize is that I haven't changed the program to
> allocate any more than it was previously. I'm not sure if that's a
> kernel change or some setting, but it went from 85% to 98%.

What exactly went up by 17%?

> The reason
> why is that there is a large latency even without that big program in
> there; I can't run my standalone tests in qemu without it also taking
> minutes. I understand qemu has to emulate, and that's its not just a
> VM, but I'm going from host CPU to guest, and the settings are the
> same.

That doesn't really make much sense, why is qemu even in the picture
here?  And no, qemu doesn't always emulate things, that depends on the
hardware you are running it on, and what type of image you are running
on it.

> > What does perf say is taking all of your time?
> 
> When I ran perf what it appeared to indicate is that the largest
> consumer of time was my library, which should be right in either
> scenario because it should use stay out of the kernel as I've designed
> it. In addition, the work takes place there anyway, so that's right.
> What's not right is the fact that the largest percent of time used is
> around 15%, and all the others combined don't add up to anything near
> 100.

So perhaps you have other processes running on the machine that you are
not noticing that are taking up the time slices?  Are you _sure_ nothing
else is running?

Basically, you have a bunch of variables, and haven't been very specific
with what really is changing, or even being used here, so there's not
much specific that I can think of at the moment.

thanks,

greg k-h


* Fwd: Custom Linux Kernel Scheduler issue
From: Kenneth Adam Miller @ 2016-11-24 16:33 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 24, 2016 at 11:13 AM, Greg KH <greg@kroah.com> wrote:
> On Thu, Nov 24, 2016 at 10:31:18AM -0500, Kenneth Adam Miller wrote:
>> On Nov 24, 2016 2:18 AM, "Greg KH" <greg@kroah.com> wrote:
>> >
>> > On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
>> > > Hello,
>> > >
>> > >
>> > > I have a scheduler issue in two different respects:
>> > >
>> > > 1) I have a process that is supposed to tight loop, and it is being
>> > > given very very little time on the system. I don't want that - I want
>> > > those who would use the processor to be given the resources to run as
>> > > fast as they each can.
>> >
>> > What is causing it to give up its timeslice?  Is it waiting for I/O?
>> > Doing something else to sleep?
>>
>> It's multithreaded, so it reads in a loop in one thread and writes in
>> another thread. What I saw when I ran strace on it is each process
>> would run for too long- the program is designed to try and stay out of
>> the kernel on each side, so it checks some shared variables before it
>> ever goes.
>
> So locking/cpu contention for those "shared variables" perhaps?

I don't think that could possibly be it, because the shared variables
are controlled by atomics. It's just some memory operation to check to
see if it needs to go to the kernel, as in is there more data in the
shm region for me to read? If not, I'll go wait on this OS semaphore.
It's lightning fast on my host machine.
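
The fast path described here could look roughly like the following
sketch. It is an assumption about the general pattern (an atomic counter
in front of a POSIX semaphore), not the actual library code, and the
light_sem names are invented for the example:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

struct light_sem {
    atomic_int count;  /* > 0: items available; < 0: number of waiters */
    sem_t os_sem;      /* only touched on the slow path */
};

/* Returns 0 on success, -1 on error (as sem_init does). */
int light_sem_init(struct light_sem *s)
{
    assert(s != NULL);
    atomic_init(&s->count, 0);
    return sem_init(&s->os_sem, 0, 0);
}

/* Producer side: publish one item. Enters the kernel only if the old
 * count was negative, meaning a consumer is blocked or about to block. */
void light_sem_post(struct light_sem *s)
{
    assert(s != NULL);
    if (atomic_fetch_add(&s->count, 1) < 0)
        sem_post(&s->os_sem);
}

/* Consumer side: take one item. Enters the kernel only if nothing was
 * available at the time of the decrement. */
void light_sem_wait(struct light_sem *s)
{
    assert(s != NULL);
    if (atomic_fetch_sub(&s->count, 1) <= 0) {
        /* Retry if a signal interrupts the wait. */
        while (sem_wait(&s->os_sem) != 0 && errno == EINTR) {
        }
    }
}
```

For use between processes the structure would have to live in the shm
region and sem_init would need a nonzero pshared argument.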

>
>> > > 2) I am seeing with perf that the maximum overhead at each section
>> > > does not sum up to be more than 15 percent. Total, probably something
>> > > like 18% of cpu time is used, and my binary has rocketed in slowness
>> > > from about 2 seconds or less total to several minutes.
>> >
>> > What changed to make things slower?  Did you change kernel versions or
>> > did you change something in your userspace program?
>> >
>>
>> The kernel versions specifically couldnt have anything to do with it
>> but it was different kernels. The test runs in less that 2 seconds on
>> my host. When I copy it to our custom linux, it takes minutes for it
>> to run. I think it's some extra setting that we're missing while
>> building the kernel, and I don't know what that is. I got a huge
>> improvement when I changed the multicore scheduling to allow
>> preemption "(desktop)" but there's still a problem as I've described
>> with one of the processes not using the core as it should.
>
> What do you mean by "custom linux"?  Is this the exact same hardware as
> your machine?  Or different?  If so, what is different?  What is
> different between the different kernel versions you are using?  Does the
> perf output look different from running on the two different machines?
> If so, where?

I am building with buildroot a linux that is meant to be really
stripped down and only have the things we want. In my case, what
the bzImage sees is either what QEMU gives it or what it sees on our
dedicated hardware, which is just an off-the-shelf i7 and other stuff
you get at a market - nothing custom in the sense you are thinking.
Custom as in, roll your own linux.

The kernel versions between my host and the target are 3.13.x and
3.14.5x; they don't change so much, and certainly don't affect
performance on their own. I'm missing some setting or something with
how I'm configuring or building linux.

I haven't had a chance to run perf on my host. I can't find what
ubuntu package it is just yet, but I will search for it in a minute. I
have to go somewhere and will be right back immediately.

>
> Have you changed the priority levels of your application at all?  Have
> you thought about just forcing your app to a specific CPU and getting
> the kernel off of that CPU in order so that the kernel isn't even an
> option here at all (Linux allows you to do this, details are somewhere
> in the documentation, sorry, can't remember off the top of my head...)
>

No, that may be it or help though. I thought that binding an
application to a particular cpu had something to do with affinity and
that there was some C api for it or something. That would work for our
particular scenario, and we've even talked about it, I just don't know
how to do it yet.
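
The C API in question is most likely sched_setaffinity(2), or
pthread_setaffinity_np(3) for individual threads. A minimal,
Linux-specific sketch, with pin_to_cpu, pin_to_first_allowed_cpu and
allowed_cpu_count as illustrative helper names:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Pin the calling thread to the lowest-numbered CPU it is currently
 * allowed to run on. Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed))
            return pin_to_cpu(cpu) == 0 ? cpu : -1;
    }
    return -1;
}

/* Number of CPUs the calling thread may run on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Keeping other tasks off that CPU is a separate step; the isolcpus=
kernel boot parameter is one documented way to do it. The taskset
utility applies the same affinity call from the shell.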

> But really, you should track down what the differences are between your
> two machines/environments, as something is different that is causing the
> slow down.

True - the kernel configuration is most suspect based on everything I
know. Both my host and the target we're building for are modern
hardware that is well supported by linux. I'm thinking it absolutely
must have something to do with the way I've built linux.

>
> You haven't even said what kernel version you are using, and if you have
> any of your own kernel patches in those kernels.
>

The target hardware runs 3.14.5x, and there aren't any kernel
patches at this time; I've disabled grsec while in the process of
narrowing down what the problem is.

>> > > I think that
>> > > the linux scheduler isn't scheduling it, because this process is just
>> > > some unit tests that double as benchmarks in that they shm_open a file
>> > > and write into it with memcpy's.
>> >
>> > Are you sure that I/O isn't happening here like through swap or
>> > something else?
>> >
>>
>> Well, we're using tmpfs and don't have a disk in the machine, but I
>> will say this process is using all lot of the address space. One
>> problem here is that the kernel has more ram than it thinks it does,
>
> What do you mean, is this a hardware issue?

I don't think it's hardware; we're using this proprietary software
beneath the linux kernel, but it's still ram of course. I can't say
too much, but what I can say is that while how much ram linux thinks
it has could be affecting how it behaves, on our end we have the
resources and can just change the configuration to make sure that
linux sees and has enough ram. That is something we can test on our
end, and indeed we will.

>
>> but what I want to emphasize is that I haven't changed the program to
>> allocate any more than it was previously. I'm not sure if that's a
>> kernel change or some setting, but it went from 85% to 98%.
>
> What exactly went up by 17%?

Consider the process that I was talking about that is meant to tight
loop and burn on a core to be the "end product process". This is
different from the test benchmarks that I was explaining run so
poorly.

>
>> The reason
>> why is that there is a large latency even without that big program in
>> there; I can't run my standalone tests in qemu without it also taking
>> minutes. I understand qemu has to emulate, and that's its not just a
>> VM, but I'm going from host CPU to guest, and the settings are the
>> same.
>
> That doesn't really make much sense, why is qemu even in the picture
> here?  And no, qemu doesn't always emulate things, that depends on the
> hardware you are running it on, and what type of image you are running
> on it.

Well, when I'm not at work, I have to be able to run the bzImage on
something, and I don't have a dedicated machine. So I run it in QEMU.

>
>> > What does perf say is taking all of your time?
>>
>> When I ran perf what it appeared to indicate is that the largest
>> consumer of time was my library, which should be right in either
>> scenario because it should use stay out of the kernel as I've designed
>> it. In addition, the work takes place there anyway, so that's right.
>> What's not right is the fact that the largest percent of time used is
>> around 15%, and all the others combined don't add up to anything near
>> 100.
>
> So perhaps you have other processes running on the machine that you are
> not noticing that is taking up the time slices?  Are you _sure_ nothing
> else is running?

I'm certain that there are other processes alive, but they are not
using the CPU. This process is the only one running. I even gave qemu
"-smp 4" because I want it to behave as close as possible to what it
would if it were just on the host.

>
> Basically, you have a bunch of variables, and haven't been very specific
> with what really is changing, or even being used here, so there's not
> much specific that I can think of at the moment.
>
> thanks,
>
> greg k-h


* Fwd: Custom Linux Kernel Scheduler issue
From: Kenneth Adam Miller @ 2016-11-24 18:05 UTC (permalink / raw)
  To: kernelnewbies

So, I ran perf on my host and the results looked far more plausible. The
top consumers of time were all atomics and some function called sse3,
which I believe is a super fast memcpy implementation provided by the
arch. In addition, all the highest time consumers are within my image;
it stayed out of the kernel as designed and it used additional
extensions and features.

I just thought of something: what if there is some kind of page size
difference between my host and my Linux kernel causing the performance
problems?
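
On x86-64 the base page size is 4 KiB regardless of kernel
configuration, so one related setting that can differ between kernel
builds is transparent hugepage support. A short sketch for checking both
on each system; base_page_size and thp_setting are hypothetical helper
names:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Base page size in bytes as reported by the running kernel. */
long base_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Copy the transparent hugepage setting (for example a line such as
 * "always [madvise] never") into `buf`. Returns 0 on success, or -1 if
 * the file cannot be read, which is what a kernel built without
 * CONFIG_TRANSPARENT_HUGEPAGE would produce. */
int thp_setting(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}
```

If both values match on the host and the target, page size is probably
not the explanation.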



* Fwd: Custom Linux Kernel Scheduler issue
From: Greg KH @ 2016-11-24 18:46 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 24, 2016 at 11:33:04AM -0500, Kenneth Adam Miller wrote:
> On Thu, Nov 24, 2016 at 11:13 AM, Greg KH <greg@kroah.com> wrote:
> > On Thu, Nov 24, 2016 at 10:31:18AM -0500, Kenneth Adam Miller wrote:
> >> On Nov 24, 2016 2:18 AM, "Greg KH" <greg@kroah.com> wrote:
> >> >
> >> > On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
> >> > > Hello,
> >> > >
> >> > >
> >> > > I have a scheduler issue in two different respects:
> >> > >
> >> > > 1) I have a process that is supposed to tight loop, and it is being
> >> > > given very very little time on the system. I don't want that - I want
> >> > > those who would use the processor to be given the resources to run as
> >> > > fast as they each can.
> >> >
> >> > What is causing it to give up its timeslice?  Is it waiting for I/O?
> >> > Doing something else to sleep?
> >>
> >> It's multithreaded, so it reads in a loop in one thread and writes in
> >> another thread. What I saw when I ran strace on it is each process
> >> would run for too long- the program is designed to try and stay out of
> >> the kernel on each side, so it checks some shared variables before it
> >> ever goes.
> >
> > So locking/cpu contention for those "shared variables" perhaps?
> 
> I don't think that could possibly be it, because the shared variables
> are controlled by atomics. It's just some memory operation to check to
> see if it needs to go to the kernel, as in is there more data in the
> shm region for me to read? If not, I'll go wait on this OS semaphore.
> It's lightning fast on my host machine.

Ah, but your "host" and your "test" machines are two totally different
things, as you say below.  So how do you know that memory accesses and
atomic writes/reads are the same?
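
For reference, the pattern under discussion (check a shared atomic first, and only fall back to an OS semaphore when there is nothing to read) can be sketched in C roughly as follows. This is a minimal sketch, not the poster's code: the names chan_t, chan_init, chan_publish and chan_consume are hypothetical, it assumes C11 atomics plus an unnamed POSIX semaphore, and it leaves out the shm_open() setup and the data copy entirely.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical channel state; in a setup like the one described it would
 * live inside the shared memory region. */
typedef struct {
    atomic_long avail; /* > 0: items ready, < 0: consumers asleep or about to sleep */
    sem_t       wake;  /* only touched on the slow path */
} chan_t;

/* pshared is passed straight to sem_init(): nonzero for cross-process use. */
int chan_init(chan_t *c, int pshared)
{
    atomic_init(&c->avail, 0);
    return sem_init(&c->wake, pshared, 0);
}

/* Producer side: announce one item.
 * Returns 1 if it had to enter the kernel to wake a consumer, else 0. */
int chan_publish(chan_t *c)
{
    if (atomic_fetch_add(&c->avail, 1) < 0) {
        sem_post(&c->wake);
        return 1;
    }
    return 0;
}

/* Consumer side: claim one item.
 * Returns 0 if an item was already available (no system call made),
 * or 1 if it had to sleep on the semaphore. */
int chan_consume(chan_t *c)
{
    if (atomic_fetch_sub(&c->avail, 1) > 0)
        return 0;
    while (sem_wait(&c->wake) == -1 && errno == EINTR)
        ; /* retry if interrupted by a signal */
    return 1;
}
```

Even when both sides stay on the fast path, every atomic_fetch_add() and atomic_fetch_sub() is still a read-modify-write on a shared cache line, so its cost depends on how the machine underneath (real or virtualized) implements memory accesses. Counting how often each function returns 1 on the host and on the target is one way to see whether the slow path is being taken more often than expected.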

> >> > > 2) I am seeing with perf that the maximum overhead at each section
> >> > > does not sum up to be more than 15 percent. Total, probably something
> >> > > like 18% of cpu time is used, and my binary has rocketed in slowness
> >> > > from about 2 seconds or less total to several minutes.
> >> >
> >> > What changed to make things slower?  Did you change kernel versions or
> >> > did you change something in your userspace program?
> >> >
> >>
> >> The kernel versions specifically couldn't have anything to do with it
> >> but it was different kernels. The test runs in less than 2 seconds on
> >> my host. When I copy it to our custom linux, it takes minutes for it
> >> to run. I think it's some extra setting that we're missing while
> >> building the kernel, and I don't know what that is. I got a huge
> >> improvement when I changed the multicore scheduling to allow
> >> preemption "(desktop)" but there's still a problem as I've described
> >> with one of the processes not using the core as it should.
> >
> > What do you mean by "custom linux"?  Is this the exact same hardware as
> > your machine?  Or different?  If so, what is different?  What is
> > different between the different kernel versions you are using?  Does the
> > perf output look different from running on the two different machines?
> > If so, where?
> 
> I am building with buildroot a linux that is meant to be really
> stripped down and only have the things we want. In my case, what
> the bzImage sees is either what QEMU gives it or what it sees on our
> dedicated hardware, which is just an off-the-shelf i7 and other stuff you
> get at a market - nothing custom in the sense you are thinking. Custom as
> in, roll your own linux.
> 
> The kernel versions between my host and the target are 3.13.x and
> 3.14.5x; they don't change so much, and certainly don't affect
> performance on their own. I'm missing some setting or something with
> how I'm configuring or building linux.

Those are really old and obsolete kernel versions, not much we can do
with them here :)

> I haven't had a chance to run perf on my host. I can't find what
> ubuntu package it is just yet, but I will search for it in a minute. I
> have to go somewhere and will be right back immediately.
> 
> >
> > Have you changed the priority levels of your application at all?  Have
> > you thought about just forcing your app to a specific CPU and getting
> > the kernel off of that CPU in order so that the kernel isn't even an
> > option here at all (Linux allows you to do this, details are somewhere
> > in the documentation, sorry, can't remember off the top of my head...)
> >
> 
> No, that may be it or help though. I thought that binding an
> application to a particular cpu had something to do with affinity and
> that there was some C api for it or something. That would work for our
> particular scenario, and we've even talked about it, I just don't know
> how to do it yet.
> 
> > But really, you should track down what the differences are between your
> > two machines/environments, as something is different that is causing the
> > slow down.
> 
> True - the kernel configuration is most suspect based on everything I
> know. The hardware in my host and in the target we're
> building for is each modern, and well supported by linux. I'm thinking
> it absolutely must have something to do with the way I've built linux.
> 
> >
> > You haven't even said what kernel version you are using, and if you have
> > any of your own kernel patches in those kernels.
> >
> 
> For the target hardware it is 3.14.5x, and there aren't any kernel
> patches at this time; I've disabled grsec while in the process of
> narrowing down what the problem is.

Woah, grsec does a _lot_ of different things, you have to just not use
it if you wish to try to compare anything.

> >> > > I think that
> >> > > the linux scheduler isn't scheduling it, because this process is just
> >> > > some unit tests that double as benchmarks in that they shm_open a file
> >> > > and write into it with memcpy's.
> >> >
> >> > Are you sure that I/O isn't happening here like through swap or
> >> > something else?
> >> >
> >>
> >> Well, we're using tmpfs and don't have a disk in the machine, but I
> >> will say this process is using a lot of the address space. One
> >> problem here is that the kernel has more ram than it thinks it does,
> >
> > What do you mean, is this a hardware issue?
> 
> I don't think it's hardware; we're using this proprietary software
> beneath the linux kernel, but it's still ram of course. I can't say
> too much, but what I can say is that while how much linux thinks
> it has could be affecting how it behaves, on our end we have the
> resources and can just change the configuration to make sure that
> linux sees and has enough ram. So that we can test on our end, and
> indeed we will.

Ah, this crazy thing.

You are running two totally different hardware platforms here, with
memory accesses working totally differently between them.  Of course
performance is going to be different, why would you expect it not to be?

So try to compare apples to apples, not apples to "the smell of apples".

Oh, and rip out grsec when doing benchmarks of anything, if you want to
have a chance of comparing kernels.

best of luck,

greg k-h
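
On the question of binding the application to a particular CPU: there is a C API for it on Linux, sched_setaffinity(2). The following is a minimal sketch under stated assumptions; the helper name pin_self_to_first_allowed_cpu is hypothetical, and it only restricts where the calling thread may run. It does not keep other tasks or kernel work off that CPU.

```c
#define _GNU_SOURCE /* needed for sched_setaffinity() and the CPU_* macros */
#include <sched.h>

/*
 * Pin the calling thread to the lowest-numbered CPU it is currently allowed
 * to run on. Returns that CPU number, or -1 on failure (errno is set by the
 * failing call when a system call failed).
 */
int pin_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (sched_setaffinity(0, sizeof(one), &one) != 0)
            return -1;
        return cpu;
    }
    return -1; /* empty mask, should not happen in practice */
}
```

Calling this once at the start of each thread (adapted to choose a different CPU for the reader and the writer) would be one way to try the suggestion. Keeping everything else off a core is a separate step, usually done with the isolcpus= kernel boot parameter or cpusets rather than from inside the program.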

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Fwd: Custom Linux Kernel Scheduler issue
  2016-11-24 18:05           ` Kenneth Adam Miller
@ 2016-11-24 18:47             ` Greg KH
  2016-11-25  2:23               ` Kenneth Adam Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2016-11-24 18:47 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 24, 2016 at 01:05:47PM -0500, Kenneth Adam Miller wrote:
> So, I ran perf on my host and it came back far more true. The top consumers of
> time were all atomics and some function called sse3, which I believe is a super
> fast memcpy implementation provided by the arch. In addition, all the highest
> time consumers are within my image- it stayed out of the kernel as designed and
> it used additional extensions and features.
> 
> I just thought of something-what if there is some kind of page size difference
> between my host and my Linux kernel causing the performance problems?

You tell me, are the page sizes different?  You have said that memory
accesses are different, so of course performance is going to be
different.  To expect otherwise is just crazy :)

good luck!

greg k-h
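
The page size question can be answered directly on both systems. A minimal sketch in C, assuming only POSIX sysconf(); the function name base_page_size is hypothetical:

```c
#include <unistd.h>

/* Base page size in bytes as reported by the running system,
 * or -1 if sysconf() cannot determine it. */
long base_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

Printing the result on the host and on the target shows whether the base page size differs. On x86-64 both would normally report 4096, so a difference here is unlikely; whether huge pages are in use is a separate question that this call does not answer.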

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Fwd: Custom Linux Kernel Scheduler issue
  2016-11-24 18:47             ` Greg KH
@ 2016-11-25  2:23               ` Kenneth Adam Miller
  0 siblings, 0 replies; 9+ messages in thread
From: Kenneth Adam Miller @ 2016-11-25  2:23 UTC (permalink / raw)
  To: kernelnewbies

Thank you Greg. I got at least my unit tests to execute about as fast as my
host when I turned on KVM support with qemu while testing. I can't test
with the dedicated hardware or software yet, but now I have another test
case to run by changing the provisioned memory.

On Nov 24, 2016 1:47 PM, "Greg KH" <greg@kroah.com> wrote:

> On Thu, Nov 24, 2016 at 01:05:47PM -0500, Kenneth Adam Miller wrote:
> > So, I ran perf on my host and it came back far more true. The top consumers of
> > time were all atomics and some function called sse3, which I believe is a super
> > fast memcpy implementation provided by the arch. In addition, all the highest
> > time consumers are within my image- it stayed out of the kernel as designed and
> > it used additional extensions and features.
> >
> > I just thought of something-what if there is some kind of page size difference
> > between my host and my Linux kernel causing the performance problems?
>
> You tell me, are the page sizes different?  You have said that memory
> accesses are different, so of course performance is going to be
> different.  To expect otherwise is just crazy :)
>
> good luck!
>
> greg k-h
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-11-25  2:23 UTC | newest]

Thread overview: 9+ messages
2016-11-24  7:01 Custom Linux Kernel Scheduler issue Kenneth Adam Miller
2016-11-24  7:18 ` Greg KH
     [not found]   ` <CAK7rcp9ih36KP5sL3_4QQhKvsofCJ8_um5tSW78jgj4pR42cYA@mail.gmail.com>
2016-11-24 15:31     ` Fwd: " Kenneth Adam Miller
2016-11-24 16:13       ` Greg KH
2016-11-24 16:33         ` Kenneth Adam Miller
2016-11-24 18:05           ` Kenneth Adam Miller
2016-11-24 18:47             ` Greg KH
2016-11-25  2:23               ` Kenneth Adam Miller
2016-11-24 18:46           ` Greg KH
