CPU Performance Analysis in Linux

The CPU is critical in servers that mainly run applications and databases, and it is often a source of performance bottlenecks. However, high CPU utilization does not always mean that the CPU is doing useful work; it may be waiting on another subsystem. When you analyze performance, look at the system as a whole and inspect every subsystem, because a problem in one can cascade into the others.

Useful commands for performance analysis

uptime

The uptime command gives a one-line display of the following information:

  • Current time
  • How long the system has been running
  • How many users are currently logged on
  • System load averages for the past 1, 5, and 15 minutes


The system load average is the average number of processes that are in either a runnable or an uninterruptible state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in an uninterruptible state is waiting for some I/O access, e.g. waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means that a single-CPU system was loaded all the time, while on a four-CPU system it means the system was idle 75% of the time.

(Source: “man uptime”.)
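The per-CPU arithmetic described above can be sketched in shell. This is a minimal sketch and not something taken from the man page: load_per_cpu is a hypothetical helper name, and the script assumes a Linux system where /proc/loadavg is readable and nproc is installed (it falls back to one CPU otherwise).

```shell
# load_per_cpu (hypothetical helper): divide a load average by the CPU count.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
    awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

# /proc/loadavg begins with the 1-, 5- and 15-minute load averages.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    # Assumption: nproc is available; otherwise pretend there is one CPU.
    cpus=$(nproc 2>/dev/null || echo 1)
    echo "1-minute load average: $one on $cpus CPU(s)"
    echo "Load per CPU: $(load_per_cpu "$one" "$cpus")"
fi
```

A per-CPU value close to or above 1.00 suggests that, on average, every CPU had at least one process running or waiting during that interval.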

top

The top command displays CPU utilization and shows which processes may be causing a problem, based on actual process activity. By default, it lists the most CPU-intensive tasks running on the server in descending order and updates the list every three (3) seconds. You can also sort the processes by the other fields shown in the table.

An interesting section of the top command output is the header portion, which shows the CPU statistics. Pay attention to the line that shows CPU state percentages based on the interval since the last refresh. That line contains the following labeled information:

Note: Where two labels are shown below, those for more recent kernel versions are shown first.

      • us, user    : time running un-niced user processes
      • sy, system  : time running kernel processes
      • ni, nice    : time running niced user processes
      • id, idle    : time spent in the kernel idle handler
      • wa, IO-wait : time waiting for I/O completion
      • hi : time spent servicing hardware interrupts
      • si : time spent servicing software interrupts
      • st : time stolen from this vm by the hypervisor

(Source “man top”.)

If the CPU appears busy 100% of the time but spends more than, say, 70% of that time in the “wa” state, the likely cause is an I/O problem rather than a shortage of CPU. High “hi” and “si” values can also indicate intensive I/O activity.
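The “wa” check can be automated with a small script. This is a sketch under stated assumptions: iowait_pct and io_wait_high are hypothetical helper names, the 70% threshold is the rule of thumb from the text rather than a hard limit, and the sample line uses made-up values purely to exercise the parser.

```shell
# iowait_pct (hypothetical helper): read a top CPU summary line on stdin
# and print the value that precedes the "wa" label.
iowait_pct() {
    awk -F',' '/^%Cpu/ {
        for (i = 1; i <= NF; i++) {
            n = split($i, f, " ")
            if (f[n] == "wa") { print f[n - 1]; exit }
        }
    }'
}

# io_wait_high (hypothetical helper): succeed when $1 is greater than $2.
io_wait_high() {
    awk -v wa="$1" -v limit="$2" 'BEGIN { exit !(wa + 0 > limit + 0) }'
}

# Illustrative input only: these percentages are invented for the example.
sample='%Cpu(s):  5.0 us,  3.0 sy,  0.0 ni, 10.0 id, 82.0 wa,  0.0 hi,  0.0 si,  0.0 st'
wa=$(printf '%s\n' "$sample" | iowait_pct)

if io_wait_high "$wa" 70; then
    echo "I/O wait is ${wa}%: check the I/O subsystem before blaming the CPU"
else
    echo "I/O wait is ${wa}%: I/O is probably not the bottleneck"
fi
```

On a live system the same function could be fed from top in batch mode, for example `top -b -n 1 | iowait_pct`, assuming the output uses a period as the decimal separator.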

By default, on symmetric multiprocessing (SMP) systems, top displays a single summary line with CPU usage averaged across all CPUs for the refresh interval. To see the figures per CPU (core), press “1”. This turns a line like this:

%Cpu(s):  8.8 us,  4.7 sy,  0.0 ni, 86.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

into this:

%Cpu0  :  4.3 us,  6.6 sy,  0.0 ni, 89.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 20.6 us,  3.5 sy,  0.0 ni, 75.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  7.9 us,  2.2 sy,  0.0 ni, 89.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  4.2 us,  3.5 sy,  0.0 ni, 92.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
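With many cores, it can help to pick out the busiest one automatically. The following sketch scans lines in the per-CPU format shown above and reports the CPU with the lowest idle percentage. busiest_cpu is a hypothetical helper name, and the parser assumes the comma-separated layout of the sample output.

```shell
# busiest_cpu (hypothetical helper): read per-CPU lines from top on stdin and
# print the label and idle percentage of the CPU with the least idle time.
busiest_cpu() {
    awk -F',' '
    /^%Cpu[0-9]/ {
        split($1, head, " ")              # head[1] is the label, e.g. "%Cpu0"
        for (i = 1; i <= NF; i++) {
            n = split($i, f, " ")
            if (f[n] == "id" && (seen == 0 || f[n - 1] + 0 < min)) {
                min = f[n - 1] + 0
                name = head[1]
                seen = 1
            }
        }
    }
    END {
        if (seen) {
            sub(/^%/, "", name)           # drop the leading percent sign
            sub(/:$/, "", name)           # drop a trailing colon, if any
            printf "%s %.1f\n", name, min
        }
    }'
}

# The four sample lines from the text above.
busiest_cpu <<'EOF'
%Cpu0  :  4.3 us,  6.6 sy,  0.0 ni, 89.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 20.6 us,  3.5 sy,  0.0 ni, 75.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  7.9 us,  2.2 sy,  0.0 ni, 89.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  4.2 us,  3.5 sy,  0.0 ni, 92.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
EOF
# prints: Cpu1 75.9
```

A single core that is far busier than the rest, while the averaged summary line looks healthy, often points to a single-threaded workload.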

ps

The ps command reports a snapshot of the current processes. In the example below, the output is the ten (10) processes currently consuming the most CPU; head takes eleven lines because the first line is the column header:

ps -eo pcpu,pid,user,args --sort=-%cpu | head -n 11

Cache flushes in SMP systems

In SMP environments, CPU affinity lets you bind processes to certain CPUs. CPU affinity improves CPU cache use because it keeps a process on one CPU instead of letting it move between processors. If a process hops between CPUs, the cache of the new CPU must be flushed. Having several processes do this therefore causes many flushes to occur, which, in turn, makes each individual process take longer to finish. This unbound-process scenario is difficult to detect, as the CPU load will appear balanced and not necessarily peaking. Use the taskset command to bind processes to CPUs.
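Before binding a process, it is useful to see which CPUs it is currently allowed to run on. The sketch below reads that from the Cpus_allowed_list field of /proc/<pid>/status and wraps taskset in a small function. allowed_cpus and pin_to_cpus are hypothetical helper names, and the script assumes a Linux /proc filesystem and that taskset (from util-linux) is installed when pin_to_cpus is called.

```shell
# allowed_cpus (hypothetical helper): read the text of /proc/<pid>/status on
# stdin and print the list of CPUs the process may run on, e.g. "0-3".
allowed_cpus() {
    awk '$1 == "Cpus_allowed_list:" { print $2 }'
}

# pin_to_cpus (hypothetical helper): bind a running process to a CPU list.
# $1 = CPU list such as "0" or "0,2-3", $2 = PID
pin_to_cpus() {
    taskset -cp "$1" "$2"
}

# Show the affinity of the current shell; nothing is changed here.
if [ -r "/proc/$$/status" ]; then
    echo "Shell $$ may run on CPUs: $(allowed_cpus < "/proc/$$/status")"
fi

# Example call, left commented out because it changes scheduling behavior:
# pin_to_cpus 0 1234    # 1234 is a placeholder PID
```

To start a new program already bound to a CPU, taskset can also launch it directly, for example `taskset -c 0 some_command`, where some_command is a placeholder.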


Tuning

Always confirm that the CPU, and not one of the other subsystems, is causing the performance issue. If you have isolated the processor as the origin of the bottleneck, take the following actions to improve performance:

  • Use the command ps -ef to ensure that no unnecessary applications are running in the background.

  • If you find such programs running in the background, terminate them and, if they are still needed, schedule them with cron (using the crontab command) to run during off-peak hours.

  • Identify critical and CPU-intensive processes by using the top command and change their priority using the renice command.

  • Consider whether the application is designed to take advantage of multiple processors before adding hardware. It might be better to scale up (faster CPUs) than to scale out (more CPUs); e.g. a single-threaded application would scale better with a faster CPU and not with more CPUs.

  • Make sure you are using the latest drivers and firmware for your hardware. Outdated versions can affect the load that devices place on the CPU.
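The rescheduling and reprioritizing steps in the list above can be sketched as two small shell helpers. Both function names are hypothetical, the script path is a placeholder, and whether renice treats the value given to -n as absolute or relative depends on the renice implementation, so check its man page before relying on this.

```shell
# lower_priority (hypothetical helper): make a running process "nicer" so it
# gives way to more important work.
# $1 = PID, $2 = nice value (a higher value means a lower priority)
lower_priority() {
    renice -n "$2" -p "$1"
}

# offpeak_cron_line (hypothetical helper): print a crontab entry that runs a
# command every day at minute 0 of the given hour.
# $1 = hour (0-23), $2 = command to run
offpeak_cron_line() {
    printf '0 %s * * * %s\n' "$1" "$2"
}

# Print an entry that runs a placeholder script at 02:00 every day.
offpeak_cron_line 2 /usr/local/bin/nightly-report.sh

# Example call, left commented out because it changes a live process:
# lower_priority 1234 10    # 1234 is a placeholder PID
```

The printed line can be pasted into the file opened by `crontab -e`. Raising the priority of a process (a negative nice value) normally requires root privileges.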

More on this at http://www.redbooks.ibm.com/redpapers/pdfs/redp4285.pdf

