Can I use THP in 'madvise' mode instead of disabling it entirely for databases?

Yes. In madvise mode the kernel uses THP only for memory regions that a process has explicitly marked with madvise(MADV_HUGEPAGE). Most databases do not issue this call, so for them madvise behaves much like never. Still, verify after switching: the thp_fault_alloc counter in /proc/vmstat should stop increasing.

Does disabling THP affect application performance outside the database?

Potentially, yes. JVM workloads and HPC jobs can benefit from THP. On dedicated database servers this is not a concern, but on mixed-workload hosts consider madvise mode, or opt individual processes out with prctl(PR_SET_THP_DISABLE), a per-process flag that child processes inherit.

Why must 1 GB hugepages be reserved at boot when 2 MB hugepages can be reserved at runtime?

1 GB hugepages require 1 GB of physically contiguous memory, which can only be guaranteed before the allocator has distributed memory across many small allocations. 2 MB pages are small enough that the compaction subsystem can often assemble them from a running system.

Does the tuned daemon on RHEL/Rocky override my THP settings?

Yes. Profiles like throughput-performance explicitly re-enable THP. Either switch to latency-performance (tuned-adm profile latency-performance), create a custom profile, or disable tuned entirely on dedicated database hosts.

How do I confirm PostgreSQL is actually using hugepages and not silently falling back?

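After restarting PostgreSQL, remember that SHOW huge_pages (or pg_settings) only reports the configured mode (on, try, or off), not whether hugepages were actually obtained. With huge_pages = try the fallback to regular pages is quiet at default log levels; with huge_pages = on the server refuses to start instead. On PostgreSQL 17 and later, SHOW huge_pages_status reports the actual state. On any version, check /proc/meminfo: HugePages_Rsvd plus the pages in use (HugePages_Total minus HugePages_Free) should roughly match the size of PostgreSQL's shared memory segment.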

Hugepages and Transparent Huge Pages

Linux hugepages let the kernel map memory in 2 MB (or 1 GB) chunks instead of the default 4 KB pages. Each TLB entry then covers far more memory, which means fewer TLB misses and cheaper page-table walks. That matters enormously for workloads that touch large, contiguous memory regions, databases being the classic case. There are two mechanisms: static hugepages (reserved in advance, used via mmap(MAP_HUGETLB) or shared memory) and Transparent Huge Pages (THP), the kernel's attempt to give you the benefit automatically. Understanding when each helps, and when THP actively hurts, is the difference between a fast database and a mysteriously slow one.

How THP Works and Why It Can Hurt

THP works in two ways. On a page fault in an eligible region, the kernel can allocate a 2 MB hugepage directly. In the background, khugepaged, a kernel thread, scans virtual memory for 2 MB-aligned regions of 4 KB pages and collapses them into a single hugepage. The kernel also does the reverse (splits hugepages) when memory pressure demands it. This sounds ideal, but the reality is more complicated.

Allocation latency spikes: Collapsing or splitting hugepages is not free. Under memory pressure, khugepaged activity shows up as latency jitter—exactly what a latency-sensitive database cannot tolerate.
Fragmentation: THP requires 2 MB of physically contiguous memory. On a long-running server, this becomes increasingly hard to satisfy, triggering compaction work that stalls application threads.
Copy-on-write penalty: When a process forks (common in PostgreSQL), a CoW fault on a 2 MB hugepage dirties 2 MB instead of 4 KB.
Database-specific pain: Oracle, MySQL InnoDB, PostgreSQL, Redis, and MongoDB all document THP as a source of latency anomalies. Oracle and Redis explicitly recommend disabling it.

Where THP genuinely helps: batch analytics workloads, HPC jobs, and JVM applications doing large, sequential heap operations—situations where allocations are predictable and latency variance is acceptable.

Checking Current State

Before changing anything, read the current configuration.

cat /sys/kernel/mm/transparent_hugepage/enabled

Output will look like: [always] madvise never. The bracketed value is active. always means THP is on for all anonymous memory. madvise means only for regions that explicitly request it. never disables THP entirely.
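
If you want to read the active value from a script, for example in a health check, the bracketed token can be extracted with sed. This is a minimal sketch; the function name thp_active_mode is made up for illustration and is not part of any tool.

```shell
# thp_active_mode LINE: print the bracketed (active) value from a THP sysfs line.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only query the live setting if this kernel exposes the THP sysfs file.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    thp_active_mode "$(cat "$thp_file")"
fi
```

The same function works on the defrag file, since it uses the same bracket convention.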

cat /sys/kernel/mm/transparent_hugepage/defrag

This controls how aggressively the kernel reclaims and compacts memory to satisfy a THP allocation. always is the most dangerous setting for latency: the faulting thread stalls until a hugepage can be assembled or the attempt fails.

grep -i hugepage /proc/meminfo

Shows static hugepage pool size, usage, and free count. Also shows AnonHugePages (THP in use).
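
To turn those counters into a memory figure, multiply HugePages_Total by Hugepagesize. The helper below is a sketch under that assumption; hugepage_pool_mib is a hypothetical name, and it only accounts for the default hugepage size reported in /proc/meminfo.

```shell
# hugepage_pool_mib: read /proc/meminfo-formatted text on stdin and print
# the static pool size in MiB (HugePages_Total * Hugepagesize in kB / 1024).
hugepage_pool_mib() {
    awk '$1 == "HugePages_Total:" { total = $2 }
         $1 == "Hugepagesize:"    { size_kb = $2 }
         END { printf "%d\n", total * size_kb / 1024 }'
}

if [ -r /proc/meminfo ]; then
    echo "static hugepage pool: $(hugepage_pool_mib < /proc/meminfo) MiB"
fi
```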

Disabling THP System-Wide for Database Hosts

Runtime changes take effect immediately but don't survive reboots. Make them persistent with a systemd unit.

Runtime change (all distributions)

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Persistent via systemd (recommended)

Create a one-shot service that runs before your database starts.

sudo tee /etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=basic.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
RemainAfterExit=yes

[Install]
WantedBy=basic.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service

Kernel command-line (optional belt-and-suspenders)

Add transparent_hugepage=never to the kernel command line in your bootloader configuration. On systems using GRUB:

# Debian/Ubuntu
sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="transparent_hugepage=never /' /etc/default/grub
sudo update-grub

# Fedora/RHEL/Rocky
sudo grubby --update-kernel=ALL --args="transparent_hugepage=never"

Configuring Static Hugepages for Databases

Static hugepages are reserved at boot (or shortly after) and never swapped. PostgreSQL uses them via shared memory (huge_pages = on); Oracle SGA and MySQL InnoDB buffer pool can also use them directly.

Calculate how many pages you need

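For PostgreSQL, the pool must cover the whole shared memory segment, which is dominated by shared_buffers but slightly larger. If shared_buffers = 32GB, the buffers alone need 16,384 pages of 2 MB each, so reserve that plus some headroom. On PostgreSQL 15 and later, the shared_memory_size_in_huge_pages parameter reports the exact count.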

# Show current hugepage size (usually 2048 kB = 2 MB)
grep Hugepagesize /proc/meminfo

# Calculate: ceil(shared_buffers_bytes / hugepage_size_bytes)
# Example: 32 GB shared_buffers
python3 -c "import math; print(math.ceil(32*1024**3 / (2*1024**2)))"

Reserve hugepages at runtime

sudo sysctl -w vm.nr_hugepages=16400

The kernel allocates hugepages from contiguous free memory. Do this early after boot before memory becomes fragmented. If the system can't satisfy the full count, it allocates as many as it can—check /proc/meminfo to confirm.
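
A short check along these lines can confirm that the kernel delivered the full count. It is a sketch: hugepages_total and check_hugepages are illustrative names, and 16400 is the example value used above.

```shell
# hugepages_total: print HugePages_Total from meminfo-formatted text on stdin.
hugepages_total() {
    awk '$1 == "HugePages_Total:" { print $2 }'
}

# check_hugepages WANTED ACTUAL: succeed only if the pool reached the requested size.
check_hugepages() {
    if [ "$2" -ge "$1" ]; then
        echo "ok: $2 of $1 pages reserved"
    else
        echo "short: only $2 of $1 pages reserved"
        return 1
    fi
}

if [ -r /proc/meminfo ]; then
    actual=$(hugepages_total < /proc/meminfo)
    # Treat a missing field (kernel without hugetlb support) as zero pages.
    check_hugepages 16400 "${actual:-0}" || true
fi
```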

Make it persistent via sysctl

echo 'vm.nr_hugepages = 16400' | sudo tee /etc/sysctl.d/90-hugepages.conf
sudo sysctl --system

Set hugepage limits for the postgres user (PostgreSQL)

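By default PostgreSQL maps its main shared memory segment with mmap(MAP_HUGETLB). Depending on the setup, the postgres user may also need a sufficient locked-memory limit. Files under /etc/security/limits.d apply to PAM login sessions; if PostgreSQL runs as a systemd service, set LimitMEMLOCK=infinity in the unit instead.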

sudo tee /etc/security/limits.d/postgres-hugepages.conf <<'EOF'
postgres soft memlock unlimited
postgres hard memlock unlimited
EOF

Then set huge_pages in postgresql.conf. With huge_pages = try (the default), PostgreSQL falls back to regular pages if hugepages are unavailable. With huge_pages = on, it refuses to start instead, which makes an undersized or missing pool obvious.

Fedora/RHEL: hugetlbfs mount

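Applications that create files on a hugetlbfs mount need that filesystem mounted; anonymous mmap(MAP_HUGETLB) and SHM_HUGETLB mappings do not. Most systemd-based distributions already mount it at /dev/hugepages, so check with mount | grep hugetlbfs before adding it by hand.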

sudo mkdir -p /dev/hugepages
sudo mount -t hugetlbfs nodev /dev/hugepages

# Persist it
echo 'nodev /dev/hugepages hugetlbfs defaults 0 0' | sudo tee -a /etc/fstab

1 GB hugepages (NUMA-aware servers)

1 GB pages should be reserved at boot time via the kernel command line. Recent kernels accept runtime requests, but finding 1 GB of contiguous free memory on a running system rarely succeeds.

# Fedora/RHEL/Rocky
sudo grubby --update-kernel=ALL --args="hugepagesz=1G hugepages=32"

# Debian/Ubuntu (edit /etc/default/grub then run update-grub)
# Add to GRUB_CMDLINE_LINUX: hugepagesz=1G hugepages=32

Verification

# Confirm THP is disabled
cat /sys/kernel/mm/transparent_hugepage/enabled
# Expected: always madvise [never]

# Confirm static hugepage pool
grep -E 'HugePages_(Total|Free|Rsvd)' /proc/meminfo

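# For PostgreSQL: show the configured mode after a restart (on, try, or off).
# This is the requested setting, not proof that hugepages are in use.
sudo -u postgres psql -c "SHOW huge_pages;"
# List all related settings; on PostgreSQL 17+ this includes huge_pages_status,
# which reports whether hugepages were actually obtained.
sudo -u postgres psql -c "SELECT name, setting FROM pg_settings WHERE name LIKE '%huge%';"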

# List non-zero THP counters (cumulative since boot); run twice and compare
grep thp /proc/vmstat | grep -v ' 0$'

If THP is properly disabled, thp_collapse_alloc and thp_fault_alloc counters in /proc/vmstat should stop incrementing after your database restarts.
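
Because the counters are cumulative since boot, a non-zero value alone proves nothing; what matters is whether it grows. One way to sample the growth is sketched below, using hypothetical helper names (thp_counter, thp_growth).

```shell
# thp_counter NAME: print one counter from vmstat-formatted text on stdin.
thp_counter() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# thp_growth NAME SECONDS: sample /proc/vmstat twice and print the increase.
thp_growth() {
    before=$(thp_counter "$1" < /proc/vmstat)
    sleep "$2"
    after=$(thp_counter "$1" < /proc/vmstat)
    echo $(( ${after:-0} - ${before:-0} ))
}
```

For example, thp_growth thp_fault_alloc 60 is expected to print 0 on a host where THP is disabled and the database has been restarted.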

Troubleshooting

Hugepages not fully allocated: Memory fragmentation prevents the kernel from satisfying vm.nr_hugepages. Reboot (fragmentation resets) or run echo 1 | sudo tee /proc/sys/vm/compact_memory to trigger compaction, then set vm.nr_hugepages again and re-check HugePages_Total.
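PostgreSQL still not using hugepages: With huge_pages = try, PostgreSQL falls back to regular pages without failing, so set huge_pages = on to surface the problem as a startup error. Check that the reserved pool covers the whole shared memory segment (shared_buffers plus overhead), and verify memlock limits are in effect (ulimit -l as the postgres user).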
disable-thp.service fails on boot: Ensure After=sysinit.target is correct for your init ordering. On some minimal images, replace with After=local-fs.target. Check journalctl -u disable-thp.service.
THP reappears after package update: Some tuning packages (like tuned on RHEL with a throughput-performance profile) re-enable THP. Check tuned-adm active and switch to latency-performance or create a custom profile.
NUMA systems: On multi-socket servers, verify hugepages are reserved on each NUMA node with cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages. Use vm.nr_hugepages_mempolicy for NUMA-aware allocation.
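
For the tuned case, a custom profile can inherit an existing one and override only the THP setting. The sketch below assumes tuned's [vm] plugin with its transparent_hugepages option. The profile name db-nothp, the helper write_nothp_profile, and the /etc/tuned/db-nothp location are illustrative, and newer tuned releases may expect profiles in a different directory.

```shell
# write_nothp_profile DIR: create DIR/tuned.conf for a custom tuned profile
# that inherits throughput-performance but keeps THP disabled.
write_nothp_profile() {
    mkdir -p "$1" || return 1
    cat > "$1/tuned.conf" <<'EOF'
[main]
summary=throughput-performance with Transparent Huge Pages disabled
include=throughput-performance

[vm]
transparent_hugepages=never
EOF
}
```

Run as root, write_nothp_profile /etc/tuned/db-nothp followed by tuned-adm profile db-nothp would activate it. Confirm with tuned-adm active and by reading /sys/kernel/mm/transparent_hugepage/enabled again.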