* problems with slow auditing/journaling after updating to RHEL 6.8
@ 2016-06-10 16:43 Jacobson, Robert C. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC]
2016-06-10 17:20 ` Steve Grubb
0 siblings, 1 reply; 2+ messages in thread
From: Jacobson, Robert C. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC] @ 2016-06-10 16:43 UTC (permalink / raw)
To: linux-audit@redhat.com
Cc: Hymowitz, Eric B. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC]
I guess the first question I should ask is : is this the proper list for questions about problems? If not, then please accept my apology -- and I would greatly appreciate if you could direct me to the proper list.
I recently upgraded a system to RHEL 6.8.
kernel: 2.6.32-642.1.1.el6.x86_64
audit: audit-2.4.5-3.el6.x86_64
My users and I immediately noticed a huge drop in application performance -- and when I investigated further, I found a large amount of CPU wait.
Here's a graph from nagios data via splunk, over the past 48 hours. The system was patched ~1300 (right about the middle of each graph -- the first spike), then rebooted at 1715 (the second spike):
http://i.imgur.com/BuyazLH.jpg
Here's some nmon snapshot during a "problem" period:
http://i.imgur.com/YbQjeK2.jpg
(note: all logical volumes on sda5, dm-9 is the logical volume for audit logs)
If I'm interpreting things correctly, it is due to drastic slow-down of journaling on my audit volume. I don't know if this is a problem with the auditing subsystem, or the disk I/O subsystem, or ... what?
Here's an excerpt from iotop, while I run a process that generates perhaps 1000 audit log messages:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 160.53 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 586.11 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1458 be/3 root 0.00 B/s 0.00 B/s 0.00 % 97.23 % [jbd2/dm-9-8]
24354 be/3 root 0.00 B/s 145.59 K/s 0.00 % 0.54 % auditd
The total amount of data generated in the logs is actually very small -- approximately 1 MB -- so I'm puzzled as to why it would take so long to write it out.
The process that generates most of these audit messages takes 1.1 second to run with auditing off (or on a RHEL 6.7 system with auditing on or off).
With auditing running, it takes 32 seconds to run.
## auditing is off:
$ time nice -n 10 /product/bin/fclient.rb -d 3
Logging in /tmp/ferret.log
real 0m1.113s
user 0m0.435s
sys 0m0.643s
## auditing is on:
$ time nice -n 10 /product/bin/fclient.rb -d 3
Logging in /tmp/ferret.log
real 0m32.088s
user 0m0.394s
sys 0m0.686s
I'll note that the ruby version hasn't changed between 6.7 and 6.8 (still ruby-1.8.7.374-4.el6_6.x86_64)
The system is using CFQ for disk I/O. The filesystem is ext4. Disks are all local SATA disks.
I'd appreciate any guidance you have!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Robert Jacobson Robert.C.Jacobson@nasa.gov
Lead System Admin Solar Dynamics Observatory (SDO)
Bldg 14, E222 (301) 286-1591
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: problems with slow auditing/journaling after updating to RHEL 6.8
2016-06-10 16:43 problems with slow auditing/journaling after updating to RHEL 6.8 Jacobson, Robert C. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC]
@ 2016-06-10 17:20 ` Steve Grubb
0 siblings, 0 replies; 2+ messages in thread
From: Steve Grubb @ 2016-06-10 17:20 UTC (permalink / raw)
To: linux-audit
Cc: Hymowitz, Eric B. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC]
On Friday, June 10, 2016 04:43:19 PM Jacobson, Robert C. [HONEYWELL TECHNOLOGY
SOLUTIONS INC] wrote:
> I guess the first question I should ask is : is this the proper list for
> questions about problems? If not, then please accept my apology -- and I
> would greatly appreciate if you could direct me to the proper list.
Yes, this is the correct list.
> I recently upgraded a system to RHEL 6.8.
> kernel: 2.6.32-642.1.1.el6.x86_64
> audit: audit-2.4.5-3.el6.x86_64
>
> My users and I immediately noticed a huge drop in application performance --
> and when I investigated further, I found a large amount of CPU wait.
Look under audit
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.8_Release_Notes/new_features_security.html
There was a bug fixed where the guarantee about writing to disk was not being
met. There were two choices, fix the behavior so that you get guaranteed writes
to disk or leave it alone and hope no one ever notices.
This led me to do a performance study during the Christmas break and I posted
the results here:
https://www.redhat.com/archives/linux-audit/2015-December/msg00061.html
What I would recommend is changing flush to INCREMENTAL and freq to a number on
that table that you think is the target performance you need. 100 is a good
all around number.
The result of this performance study then led to a redesign of the logging
code released in 2.5. The 2.5 logging is over 90 times faster except when
using SYNC or DATA settings. Nothing can really help that.
-Steve
> Here's a graph from nagios data via splunk, over the past 48 hours. The
> system was patched ~1300 (right about the middle of each graph -- the first
> spike), then rebooted at 1715 (the second spike):
> http://i.imgur.com/BuyazLH.jpg
>
> Here's some nmon snapshot during a "problem" period:
> http://i.imgur.com/YbQjeK2.jpg
>
> (note: all logical volumes on sda5, dm-9 is the logical volume for audit
> logs)
>
> If I'm interpreting things correctly, it is due to drastic slow-down of
> journaling on my audit volume. I don't know if this is a problem with the
> auditing subsystem, or the disk I/O subsystem, or ... what?
>
> Here's an excerpt from iotop, while I run a process that generates perhaps
> 1000 audit log messages:
>
> Total DISK READ : 0.00 B/s | Total DISK WRITE : 160.53 K/s
> Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 586.11 K/s
> PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
> 1458 be/3 root 0.00 B/s 0.00 B/s 0.00 % 97.23 % [jbd2/dm-9-8]
> 24354 be/3 root 0.00 B/s 145.59 K/s 0.00 % 0.54 % auditd
>
> The total amount of data generated in the logs is actually very small --
> approximately 1 MB -- so I'm puzzled as to why it would take so long to
> write it out.
>
> The process that generates most of these audit messages takes 1.1 second to
> run with auditing off (or on a RHEL 6.7 system with auditing on or off).
> With auditing running, it takes 32 seconds to run.
>
> ## auditing is off:
> $ time nice -n 10 /product/bin/fclient.rb -d 3
> Logging in /tmp/ferret.log
>
> real 0m1.113s
> user 0m0.435s
> sys 0m0.643s
>
> ## auditing is on:
> $ time nice -n 10 /product/bin/fclient.rb -d 3
> Logging in /tmp/ferret.log
>
> real 0m32.088s
> user 0m0.394s
> sys 0m0.686s
>
> I'll note that the ruby version hasn't changed between 6.7 and 6.8 (still
> ruby-1.8.7.374-4.el6_6.x86_64)
>
> The system is using CFQ for disk I/O. The filesystem is ext4. Disks are
> all local SATA disks.
>
> I'd appreciate any guidance you have!
>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Robert Jacobson Robert.C.Jacobson@nasa.gov
> Lead System Admin Solar Dynamics Observatory (SDO)
> Bldg 14, E222 (301) 286-1591
>
>
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2016-06-10 17:20 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-10 16:43 problems with slow auditing/journaling after updating to RHEL 6.8 Jacobson, Robert C. (GSFC-444.0)[HONEYWELL TECHNOLOGY SOLUTIONS INC]
2016-06-10 17:20 ` Steve Grubb
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).