Whether you're using a Linux control panel or not, the resource reporting tools available on the Linux command line are often much more informative and responsive than control panel usage reports. This article reviews some common command-line tools you can use to check your server's resources after connecting to it over SSH.
Introducing 'top' - Checking Memory and CPU
Of all the tools in your Linux command-line arsenal for monitoring resource usage on your server, 'top' is probably the best-known and most often used. The 'top' command comes pre-installed on almost every Linux distribution, and can be run from an SSH session with the following simple command:
# top
or if you're not already running as a privileged user:
$ sudo top
The result is a full-screen display that refreshes every few seconds.
The 'top' program gives you a live view of both CPU and memory usage on your server, as well as how much of each resource every program is using.
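Sometimes you want a one-off copy of this screen rather than the live view, for example to paste into a support ticket or save to a log. In that case, top can run non-interactively. The sketch below assumes the procps version of top found on most Linux distributions, where -b selects batch mode and -n sets how many refreshes to print. The top_snapshot function name is our own and is not a standard command.

```shell
#!/usr/bin/env bash
# Print the first N lines (default 15) of a single top refresh, then exit.
# -b = batch (non-interactive) mode, -n 1 = stop after one refresh.
# top_snapshot is a hypothetical helper name, not a standard command.
top_snapshot() {
    top -b -n 1 | head -n "${1:-15}"
}

top_snapshot 12    # the summary lines plus the first few processes
```

Redirecting the output, as in top_snapshot 40 > snapshot.txt, keeps a copy for later comparison.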
Let's go over this screen and review what exactly each line means. The first line is as follows:
- top - the name of the program running
- 18:25:54 - current system time
- up 14 min - the amount of time since the last system boot
- 2 users - the number of users currently logged in to the system
- load average - the system load averages over the last 1, 5, and 15 minutes (on Linux, the average number of processes that are running, waiting for a CPU, or waiting on disk I/O)
- 0.00 - load average over the last 1 minute
- 0.07 - load average over the last 5 minutes
- 0.11 - load average over the last 15 minutes
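Personally, we've always found the load averages particularly useful here. As a rough rule of thumb, a load average that stays above the number of CPU cores in your server means work is queuing up faster than the server can handle it.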
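A handy way to put the load averages in context is to compare them with the number of CPU cores. The sketch below reads the same three numbers top shows from /proc/loadavg and divides the 1-minute value by the core count. It assumes a Linux system with /proc mounted and the coreutils nproc command available. The name load_per_core is a hypothetical helper, not a standard tool.

```shell
#!/usr/bin/env bash
# Compare the load averages with the number of CPU cores.
# /proc/loadavg starts with the same three values top shows, e.g. "0.00 0.07 0.11 ...".

# load_per_core LOAD CORES -> prints LOAD / CORES with two decimals
# (hypothetical helper name)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

read -r one five fifteen _ < /proc/loadavg
cores=$(nproc)

echo "load averages: $one (1 min), $five (5 min), $fifteen (15 min)"
echo "CPU cores: $cores"
echo "1-minute load per core: $(load_per_core "$one" "$cores")"
```

A per-core figure that stays above roughly 1.00 suggests more work is arriving than your cores can keep up with.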
The next line covers the kernel's process table:
- Tasks: 117 total - the total number of processes on the system
- 1 running - the number of processes currently running or ready to run on a CPU
- 116 sleeping - the number of active processes not currently utilizing CPU time
- 0 stopped - the number of stopped ('paused') processes
- 0 zombie - the number of zombie ('defunct') processes
Stopped Process: A 'stopped' process is a process that has essentially been 'paused'. The most common cause of a stopped process is a user pressing CTRL + Z on a running foreground process. A user or program can also stop a process by sending it the STOP signal, for example with kill -STOP $pid. Stopped processes can be 'un-paused' by sending them the continue signal, such as kill -CONT $pid.
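You can watch this happen on a throwaway process. The sketch below starts a background sleep, stops and resumes it with kill -STOP and kill -CONT, and then reads the process's one-letter state from /proc/PID/status. That letter is the same one top shows in its S column. The sketch assumes Linux's /proc layout, and proc_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Pause and resume a throwaway process, checking its state letter each time.

# proc_state PID -> prints the one-letter state from /proc/PID/status
# (e.g. R, S, T, Z); hypothetical helper name
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

sleep 60 &                # a harmless background process to experiment on
pid=$!

kill -STOP "$pid"         # a similar effect to CTRL + Z on a foreground process
sleep 0.2                 # give the signal a moment to take effect
echo "after STOP: $(proc_state "$pid")"

kill -CONT "$pid"         # resume it
sleep 0.2
echo "after CONT: $(proc_state "$pid")"

kill "$pid"               # clean up
wait "$pid" 2>/dev/null || true
```

After the STOP signal the state should read T (stopped). After CONT it should return to S (sleeping), since sleep is simply waiting.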
Zombie Process: A 'zombie' process is a process that has finished running (either normally or prematurely due to an error) but still has an entry in the kernel's process table, because its parent process has not yet collected its exit status.
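A zombie lingers until its parent collects its exit status, so the process worth finding is usually the parent. The sketch below scans the status files under /proc for processes in state Z and prints each one's PID and parent PID (PPid). It assumes Linux's /proc layout. Both list_zombies and its optional directory argument are hypothetical; the argument exists only so the function can be tested against sample files.

```shell
#!/usr/bin/env bash
# List zombie processes and their parents by reading /proc directly.

# list_zombies [PROC_DIR] -> prints "zombie pid=N parent=M" for each process in state Z.
# Hypothetical helper; PROC_DIR defaults to /proc and exists only to allow testing.
list_zombies() {
    local procdir=${1:-/proc} status
    for status in "$procdir"/[0-9]*/status; do
        [ -r "$status" ] || continue      # the process may have exited meanwhile
        awk '/^State:/ { state = $2 } /^Pid:/ { pid = $2 } /^PPid:/ { ppid = $2 }
             END { if (state == "Z") print "zombie pid=" pid " parent=" ppid }' \
            "$status" 2>/dev/null
    done
}

list_zombies    # usually prints nothing on a healthy system
```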
The next line covers CPU core utilization:
- 0.1 us - percentage of CPU time spent running 'user' (application) code
- 0.1 sy - percentage of CPU time spent running 'system' (kernel) code
- 0.0 ni - percentage of CPU time used by processes with a positive 'nice' value
- 99.8 id - percentage of 'idle' CPU time (where the CPU cores aren't doing anything at all)
- 0.0 wa - percentage of time the CPU is 'waiting' on I/O (usually your HDD or RAID array)
- 0.0 hi - percentage of CPU time used by Hardware Interrupts
- 0.0 si - percentage of CPU time used by Software Interrupts
- 0.0 st - percentage of 'Steal Time' (time this virtual CPU was kept waiting by the hypervisor)
User Time vs System Time: Strictly speaking, 'us' and 'sy' describe where CPU time is spent rather than two different kinds of processes. 'User' time is spent running programs' own code, while 'system' time is spent inside the Linux kernel doing work on their behalf. A request like "go get this data from the hard drives" is handled by the kernel and counts as system time, whereas a program crunching numbers in its own code counts as user time.
Nice Processes: A process's 'nice' value in Unix is a way of adjusting its priority. Nice values range from -20 (highest priority) to 19 (lowest). By increasing a process's 'niceness' value, you encourage that process to play well with others, giving it a lower priority. Regarding the 'top' output above, if large amounts of CPU are being taken by 'nice' processes, it just means those processes are running at low priority and will get a smaller share of the CPU when other processes need it.
I/O Wait: If your 'wa' value is consistently high, it usually means your CPU is waiting on your storage before it can continue processing. In that case, consider ways to increase your storage throughput, such as SSDs, RAID arrays, or clustering.
Steal Time: Steal time is the amount of time that a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. If your steal time is consistently high, the physical hardware hosting your VPS is likely too busy, and you should ask your host to move you to a less-busy machine.
Multi-Core Systems: By default, 'top' displays an average of all cores on your system on this line. If you'd like to toggle between the summary and a view of each core, simply hit the number '1' key on your keyboard.
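The percentages on this line are derived from cumulative CPU counters that the kernel exposes in /proc/stat. The sketch below takes two samples one second apart and converts the differences into the same us, sy, ni, id, wa, hi, si, and st figures. It approximates what top does rather than reproducing top's exact code, and cpu_percentages is a hypothetical helper name. Passing cpu0, cpu1, and so on as the script's first argument reports a single core, similar to pressing '1' in top.

```shell
#!/usr/bin/env bash
# Derive top-style CPU percentages from two samples of /proc/stat.
# After the "cpu" label, the first eight fields are cumulative ticks for:
# user nice system idle iowait irq softirq steal.

# cpu_percentages "BEFORE_LINE" "AFTER_LINE" -> prints us sy ni id wa hi si st
# (hypothetical helper name)
cpu_percentages() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, before); split(b, after)
        total = 0
        for (i = 2; i <= 9; i++) { d[i] = after[i] - before[i]; total += d[i] }
        if (total == 0) total = 1   # identical samples: avoid dividing by zero
        printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
            100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
            100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
            100 * d[8] / total, 100 * d[9] / total
    }'
}

core=${1:-cpu}    # "cpu" = all cores combined; "cpu0", "cpu1", ... = a single core
first=$(grep "^$core " /proc/stat)
sleep 1
second=$(grep "^$core " /proc/stat)
cpu_percentages "$first" "$second"
```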
The next line deals with memory utilization:
- 1884136 total - the total amount of memory available on the system
- 922732 free - the total amount of free memory
- 493072 used - the total amount of memory used by the system
- 468332 buff/cache - the total amount of memory used by buffers and cache
Buffers & Cache: Buffers and Cache are mechanisms that help speed up the process of reading from your storage devices (like HDD or RAID arrays). In the world of computers, reading from storage is slow. To help speed things along, the Linux kernel uses 'buffers' to store file system meta-data (like where files are located, what the permissions are, etc) and 'cache' is used to store the files themselves. That way, the next time a process needs to read a file, Linux might have that information in a buffer or cache already and can skip the long, drawn-out process of getting a file from storage.
The next line deals with swap:
- 839676 total - total amount of swap available on the system
- 839676 free - total amount of free swap
- 0 used - total amount of used swap
- 1230288 avail Mem - total amount of memory available for starting new apps without swapping
Swap Memory: Swap is space allocated on your storage device (usually your HDD or RAID array) that is used in addition to RAM when your RAM gets too full. Swap is important because it keeps your server running, but it is extremely slow compared to actual RAM. If your server gets really slow, especially when you have a lot of visitors to your website, it has probably run out of RAM and is using swap space to keep from crashing. In that case, you'll want to add more RAM to your server to keep it fast.
avail Mem: This line item is new as of the version of 'top' shipped with CentOS 7, so if you're using an older version of 'top' this value might not be present. According to top's manual page, the 'avail Mem' value is:
an estimation of physical memory available for starting new applications, without swapping. Unlike the free field, it attempts to account for readily reclaimable page cache and memory slabs. It is available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free.
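The numbers on the memory and swap lines come from the kernel's /proc/meminfo file, which you can also read directly in scripts. The sketch below assumes a Linux system, and meminfo_kb is a hypothetical helper. top's buff/cache figure may combine more than the Buffers and Cached fields shown here, so these numbers won't necessarily match top exactly.

```shell
#!/usr/bin/env bash
# Read the figures behind top's memory and swap lines from /proc/meminfo.
# /proc/meminfo labels its values "kB", but they are KiB, top's default unit.

# meminfo_kb FIELD [FILE] -> prints FIELD's value (e.g. MemAvailable); fails if absent.
# Hypothetical helper; FILE defaults to /proc/meminfo.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/meminfo}"
}

echo "total:      $(meminfo_kb MemTotal) KiB"
echo "free:       $(meminfo_kb MemFree) KiB"
echo "buffers:    $(meminfo_kb Buffers) KiB"
echo "cached:     $(meminfo_kb Cached) KiB"
echo "avail Mem:  $(meminfo_kb MemAvailable) KiB"   # MemAvailable needs kernel 3.14+
swap_total=$(meminfo_kb SwapTotal)
swap_free=$(meminfo_kb SwapFree)
echo "swap used:  $((swap_total - swap_free)) KiB of $swap_total KiB"
```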
After the general information, the next section in 'top' displays resource and CPU usage on a per-process level. The column headings for each process, and what they mean, are:
- PID - the Process ID number
- USER - the user account the process is running under
- PR - the Priority value of the process
- NI - the Niceness value of the process
- VIRT - the amount of Virtual Memory used by the process
- RES - the amount of Physical Memory used by the process
- SHR - the amount of Shared Memory used by the process
- S - the current status of the process ('S' for sleeping, 'R' for running, etc)
- %CPU - the percentage of CPU the process is taking
- %MEM - the percentage of Memory the process is taking
- TIME+ - the total CPU time that has been given to the process while it's been active
- COMMAND - the name of the process itself (press 'c' to toggle the command's full path)
PR vs NI: Since both Priority (PR) and Niceness (NI) have to do with the priority of a process, why do we need two columns? The full explanation is a little complex, but put simply, the Priority (PR) value is the priority the Linux kernel assigns to a process, and the Niceness (NI) value is something a user can control. The resulting 'actual' priority of a process is something both the user and the kernel have a say in. For ordinary (non-real-time) processes, 'top' shows PR as 20 plus the NI value, so a process with a niceness of 10 appears with a PR of 30.
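To see both columns in action, you can start a process with a higher niceness using the nice command and then read back its priority and niceness. The sketch below reads them from fields 18 and 19 of /proc/PID/stat, and for ordinary processes these should match the PR and NI values top displays. It assumes Linux and a shell that starts at a niceness of 0. The name pr_ni is a hypothetical helper, and the PID in the renice comment is a placeholder.

```shell
#!/usr/bin/env bash
# Start a process with a raised niceness, then read back its PR and NI values.

# pr_ni PID -> prints "PR NI" from fields 18 (priority) and 19 (nice) of /proc/PID/stat.
# Hypothetical helper name.
pr_ni() {
    # Drop everything up to the ") " closing the command name, so a name with
    # spaces can't shift the fields; after that, original field N is field N-2.
    sed -E 's/^.*\) //' "/proc/$1/stat" | awk '{ print $16, $17 }'
}

nice -n 10 sleep 30 &     # niceness 10 (added to the shell's own niceness, usually 0)
pid=$!
sleep 0.2                 # let nice finish starting sleep
echo "PR NI: $(pr_ni "$pid")"

kill "$pid"
wait "$pid" 2>/dev/null || true

# An already-running process can be made nicer with renice, e.g.
# (1234 stands in for a real PID):
#   renice -n 5 -p 1234
```

With a starting niceness of 0, this typically prints PR 30 and NI 10, matching the 20 + NI relationship.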
Virtual Memory (VIRT): Virtual Memory includes not only the physical amount of memory being used by the process, but also all code, data, shared libraries, swap memory, and even mapped memory that hasn't actually been allocated from anywhere yet. Put simply, this value is akin to 'the amount of memory that this process wants'.
Physical Memory (RES): Short for 'Resident Memory'. This is the true amount of physical memory that a process is using. Most system administrators consider this the most important value when determining the memory requirements of a specific process.
Shared Memory (SHR): This is the amount of shared memory that has been made available to a specific process, not all of which is 'resident' memory. Specifically, this value represents the amount of memory that could be potentially shared with other processes on the system.
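If you only care about one process, you can read these three figures without top. The sketch below reads /proc/PID/statm, whose first three fields are the total program size, the resident size, and the shared size, all in pages. It converts them to KiB, and on most systems the results should closely match top's VIRT, RES, and SHR columns. The name proc_mem_kb is a hypothetical helper, and the example simply inspects the shell running the script.

```shell
#!/usr/bin/env bash
# Print the VIRT, RES, and SHR figures for one process, in KiB.
# /proc/PID/statm lists: size resident shared text lib data dt (all in pages).

# proc_mem_kb PID -> prints "VIRT RES SHR" in KiB (hypothetical helper name)
proc_mem_kb() {
    local page_kb=$(( $(getconf PAGESIZE) / 1024 ))
    awk -v p="$page_kb" '{ printf "%.0f %.0f %.0f\n", $1 * p, $2 * p, $3 * p }' \
        "/proc/$1/statm"
}

echo "VIRT RES SHR (KiB) for this shell: $(proc_mem_kb "$$")"
```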
Useful Commands for 'top'
Now that you can read the output from the standard top command, here are some additional commands that are helpful when using top:
- q - exit out of top
- Shift + M - sort the process list by memory usage (the %MEM field)
- k - kill a process with a specific PID (defaults to the process with the highest utilization)
- c - toggle the full system path to the process executable file
- 1 - toggle the CPU line between a display of every core on the system and the averaged view
- d - change the delay of top's refresh rate (the default is 3 seconds)
- b - toggle highlighting of running processes on or off
- e - cycle the units used for memory values in the process list; KB is the default, but this can be changed to MB, GB, TB, PB, or EB before cycling back to KB
- Shift + E - cycle the units used for memory values in the summary area at the top; like the above, the default is KB, then MB, GB, TB, PB, EB, then back to KB
How to Monitor Your Storage Space on Linux
Now that we've mastered monitoring our CPU and Memory using 'top', let's take a look at how we can monitor our HDD/RAID array using the tools available to us on Linux.
The first command is simple:
# df -h
This prints one row for each mounted filesystem, showing its size and usage along with where it is mounted. Adding the -h option told df that I wanted the sizes to be human-readable - that is, shown in MB or GB rather than KB. Note that the sizes in the output have either 'M' or 'G' appended to them.
I didn't do anything special with the drive mappings for this Linux server, so it only has the root directory (the / under the 'Mounted on' column) along with some standard system mounts. Note that I have a 6.7 GB drive, where Linux itself takes up a tiny 1.7 GB, leaving 5 GB of available space.
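For scripts, such as a cron job that warns you before a disk fills up, it helps to pull out just the usage percentage. The sketch below uses df's -P (POSIX) output format, which keeps each filesystem on one line. The name disk_use_pct is a hypothetical helper, and the 90% threshold is an arbitrary example.

```shell
#!/usr/bin/env bash
# Print how full the filesystem holding a given path is, as a bare number.
# df -P (POSIX format) keeps each filesystem on a single line with the usage in column 5.

# disk_use_pct PATH -> the usage percentage without the % sign (hypothetical helper)
disk_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

usage=$(disk_use_pct /)
echo "/ is ${usage}% full"
if [ "$usage" -ge 90 ]; then    # example threshold
    echo "warning: / is nearly full" >&2
fi
```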
That's neat, but what if I want to free up some space? How can I see where my hard drive space is going? Well, that's what the du command was made for!
[root@localhost ~]# du -h --max-depth=1 /
Like the df command, the du command takes a -h parameter to make the numeric output more readable. Next, I've limited the depth of the du command to a single directory level with the --max-depth=1 parameter. Last, I told du to run on the root folder /. Here's the result:
[root@localhost ~]# du -h --max-depth=1 /
162M	/boot
0	/dev
0	/proc
8.4M	/run
0	/sys
23M	/etc
172M	/root
36K	/tmp
115M	/var
1.1G	/usr
0	/home
0	/media
0	/mnt
250M	/opt
0	/srv
1.8G	/
[root@localhost ~]#
So it looks like most of my space is being used in the /usr directory. That makes sense, because this is a base install of CentOS 7, and most of the system's commands and libraries live in the /usr directory.
From here, I could dig down into each directory, and see exactly where all the space on my storage device is going.
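One convenient way to dig down is to sort du's output so the biggest directories come first, then repeat on whichever directory tops the list. The sketch below assumes GNU coreutils, whose sort -h understands the human-readable sizes (K, M, G) that du -h prints. The name biggest_dirs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Show the largest directories one level below a starting point, biggest first.
# Assumes GNU coreutils: sort -h understands du -h sizes such as 36K or 1.1G.

# biggest_dirs DIR [COUNT] -> the COUNT largest entries (default 10); hypothetical helper
biggest_dirs() {
    du -h --max-depth=1 "$1" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

biggest_dirs /usr 5    # drill into /usr, the biggest directory in the du output
```

The first line is the total for the directory itself. The lines after it show which subdirectories account for that space.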
Happy serving!