moomou

(ノ≧∇≦)ノ ミ ┸┸

Messing with Nvidia GPU on Headless Linux

Posted at — Dec 29, 2021

I have a PC with Ubuntu Server installed and Nvidia GPUs attached. For a while I Googled on and off, trying to learn how to overclock the GPUs, without success.

Finally, after gleaning bits and pieces from different sources, I got power limits and overclocking working.

Here are the steps required:

Virtual Monitor

nvidia-settings needs a monitor attached in order to work, so a headless Linux machine needs a virtual monitor. After messing with the Xorg configuration manually for a while, I found and forked andyljones/coolgpus to automatically attach a virtual monitor for each GPU.

coolgpus was originally developed to override fan curves, but I run it in debug mode only and rely on the default fan management.

Power Limit

Next, use nvidia-smi to set power limits. Unlike nvidia-settings, nvidia-smi does not go through an X display, so this step should work with or without the virtual monitor:

sudo nvidia-smi -i 0 -pl 150

This command sets the power limit to 150 watts for the GPU at index 0.
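With several GPUs, the same command can be repeated per index. Here is a minimal sketch, assuming every listed GPU should get the same limit; the function name set_power_limit and the DRY_RUN variable are my own inventions for illustration, not part of nvidia-smi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply one power limit (in watts) to several GPU indices.
# With DRY_RUN set to a non-empty value it only prints the commands,
# so you can inspect them before touching the hardware.
set_power_limit() {
  local watts="$1"
  shift
  local idx
  for idx in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "sudo nvidia-smi -i ${idx} -pl ${watts}"
    else
      sudo nvidia-smi -i "${idx}" -pl "${watts}"
    fi
  done
}
```

For example, DRY_RUN=1 set_power_limit 150 0 1 prints the two commands for GPUs 0 and 1 without running them. As far as I know the limit does not survive a reboot, so something like this would need to run again at startup.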

You can verify the power limit is active via:

nvidia-smi

The output should show the new limit in the Pwr:Usage/Cap column.
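If you would rather check the limit from a script than read the table by eye, nvidia-smi can also print selected fields as CSV. The sketch below assumes the output of --query-gpu=index,power.limit with --format=csv,noheader,nounits, which looks like "0, 150.00"; the check_power_limit function is a hypothetical helper of mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, power.limit" CSV rows from stdin and
# compare each limit against the expected wattage given as the first argument.
# Exits non-zero if any GPU reports a different limit.
check_power_limit() {
  awk -F', *' -v want="$1" '
    {
      ok = (($2 + 0) == (want + 0)) ? "ok" : "MISMATCH"
      printf "gpu %s: limit %s W (%s)\n", $1, $2, ok
      if (ok != "ok") bad = 1
    }
    END { exit bad }'
}

# Usage on a machine with the driver installed:
#   nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits | check_power_limit 150
```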

Another good command to check is:

nvidia-smi -q -d PERFORMANCE

This tells you not only whether the GPU is under a software power limit, but also whether its performance is currently limited by other factors, such as a hardware slowdown.
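The performance report is fairly long, so it can help to filter it down to the reasons that are currently active. This is a rough sketch that assumes the report uses "Name : Active" / "Name : Not Active" lines, as it does on my understanding of recent drivers; the exact section and field names vary between driver versions, and active_throttle_reasons is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the text of `nvidia-smi -q -d PERFORMANCE` from stdin
# and print only the names of entries whose value is exactly "Active".
active_throttle_reasons() {
  awk -F':' '
    {
      name = $1
      value = $2
      # Trim the indentation and padding around both fields.
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (value == "Active") print name
    }'
}

# Usage on a machine with the driver installed:
#   nvidia-smi -q -d PERFORMANCE | active_throttle_reasons
```

An empty result means none of the listed reasons is active at that moment.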

Overclocking Memory Transfer Rate

Also with the virtual monitor attached, you can overclock via the nvidia-settings tool:

DISPLAY=:0 nvidia-settings -a '[gpu:0]/GPUMemoryTransferRateOffsetAllPerformanceLevels=350'

I tried to set the overclock for a specific performance level without any luck; only GPUMemoryTransferRateOffsetAllPerformanceLevels worked for me. Since each GPU has its own virtual display, the DISPLAY environment variable specifies the GPU you want to overclock, i.e. :0 for the first GPU, :1 for the second, and so on.

You can verify the setting took effect by querying it:
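To avoid retyping the long attribute name, the command can be wrapped in a small function. This is only a sketch: set_mem_offset and DRY_RUN are names I made up, and because I have only shown [gpu:0] on display :0 here, the display number and the GPU target are kept as separate arguments rather than assuming how they line up for the other GPUs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the memory transfer rate offset for one GPU.
# Arguments: X display number, GPU target index, offset value.
# With DRY_RUN set to a non-empty value it only prints the command.
set_mem_offset() {
  local display="$1" gpu="$2" offset="$3"
  local attr="[gpu:${gpu}]/GPUMemoryTransferRateOffsetAllPerformanceLevels"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "DISPLAY=:${display} nvidia-settings -a '${attr}=${offset}'"
  else
    DISPLAY=":${display}" nvidia-settings -a "${attr}=${offset}"
  fi
}
```

Running DRY_RUN=1 set_mem_offset 0 0 350 prints the same command as above, which is a cheap way to double check the quoting before applying an offset for real.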

DISPLAY=:0 nvidia-settings -q '[gpu:0]/GPUMemoryTransferRateOffset[2]'