Skip to content

A System Administrator’s Guide to Evaluating Memory and Storage Health

As a system administrator, ensuring the health of your system’s memory (RAM), swap, and storage (SSD/HDD) is critical for maintaining optimal performance. Mismanagement of these resources can lead to slow performance, system crashes, and data loss.

In this guide, we will walk through how to read, understand, and evaluate system memory and storage, ensuring your system runs smoothly.


Understanding System Memory (RAM) Usage

Checking Total, Used, and Available RAM

The first step in evaluating memory is understanding how much RAM is available, used, and cached.

Run the following command to check RAM usage:

sh
free -h

Key Metrics Explained:

  • Total: The total installed RAM.
  • Used: The memory actively used by applications.
  • Available: Free memory + cached memory that can be used when needed.
  • Swap: Virtual memory used when RAM is full.

How to Interpret the Output?

Healthy: "Available" memory is sufficient, and swap usage is low.
⚠️ Warning: Low "Available" memory and moderate swap usage.
Critical: High RAM usage with swap being actively used (system slowdown).

Analyzing Memory Consumption by Processes

To see which processes are consuming the most memory:

sh
ps aux --sort=-%mem | head -10

If a single process is consuming an unusually high percentage of memory, consider optimizing or restarting it.

Detecting Memory Leaks

Memory leaks occur when an application does not release memory properly, causing increasing RAM usage over time.

Check for potential memory leaks using:

sh
smem -rt  # Show RAM usage sorted by total

If a process keeps growing in memory usage without decreasing, it may have a memory leak and should be investigated.


Evaluating Swap Usage

Checking Swap Usage

sh
swapon --summary

If swap usage is consistently high, it means the system is running out of RAM.

Another way to check swap activity is:

sh
vmstat 1 5
  • si (swap-in) and so (swap-out) should be close to zero under normal conditions.
  • High swap activity suggests the system is struggling with insufficient RAM.

When to Take Action?

Healthy: Swap is barely used.
⚠️ Warning: Swap usage is around 30-50%, indicating a need for RAM optimization.
Critical: Swap is heavily used, making the system slow. Consider adding more RAM.


Assessing Storage Health and Performance

Checking Disk Space Usage

sh
df -hT

This command shows:

  • Used %: If above 90%, the system may experience slowdowns.
  • Filesystem Type: ext4, xfs, etc. (Check compatibility with your OS).

Find large directories consuming disk space with:

sh
du -sh /* | sort -hr | head -10

Analyzing Disk I/O Performance

High disk activity can slow down the system. Monitor disk I/O with:

sh
iostat -xz 1 5
  • %util: If close to 100%, the disk is overloaded.
  • await: If high, it indicates slow disk response times.

Another useful command:

sh
iotop -o

It shows processes causing high disk usage, helping identify problematic applications.

Checking for Disk Errors and Bad Sectors

SMART Diagnostics (SSD/HDD Health Monitoring)

sh
smartctl -a /dev/sda

Important SMART Attributes:

  • Reallocated_Sector_Ct: If >10, the disk may fail soon.
  • Current_Pending_Sector: Non-zero value means bad sectors are present.
  • Power_On_Hours: If >50,000 hours, consider replacing the disk.

Scanning for Bad Sectors

sh
badblocks -sv /dev/sda

If errors are found, the disk may be failing and should be replaced.

Checking File System Integrity

To repair filesystem errors:

sh
fsck -f /dev/sda1

Run this command only when the disk is unmounted.


Measuring Overall System Performance

Checking System Load

sh
uptime

The output shows system load averages for 1 minute, 5 minutes, and 15 minutes.

Healthy: Load is below the number of CPU cores.
⚠️ Warning: Load is equal to CPU core count.
Critical: Load is 2x or more than CPU cores, causing slowdowns.

To find the most CPU-intensive processes:

sh
top -o %CPU

When to Take Action?

MetricHealthyWarningCritical
RAM UsageFree RAM availableSwap usage increasingSwap 100%, system slow
Swap UsageLow or zero>30% usage>90% usage
Disk Usage<70% used80-90% used>90% full
Disk Read/Write Latency<5ms5-20ms>20ms
SMART ErrorsNo errorsReallocated sectors <10Reallocated sectors >10
CPU Load<1 per core~1 per core>2 per core

Conclusion: Keeping Your System Healthy

  • Monitor RAM & Swap: Keep an eye on available memory and avoid excessive swap usage.
  • Check Disk Health: Run SMART diagnostics and look for high I/O latency.
  • Analyze Performance Regularly: Use iostat, vmstat, and iotop to prevent system slowdowns.
  • Optimize Resources: Remove unnecessary files, upgrade RAM if needed, and monitor high-memory-consuming processes.

By following these steps, system administrators can ensure their systems remain fast, stable, and efficient.

Would you like additional insights or automation scripts for monitoring? Let me know!