How to Identify the Cause of UNIX Server Reboots

Learn how to determine which process is causing a server to reboot, using system logs, auditd, and process accounting.

Server reboots can be disruptive and puzzling, especially when they occur without a clear reason. System administrators often need to identify the root cause to prevent future occurrences. A reader's question addresses this issue: how can you determine which process is rebooting a server?

Table of Contents

Problem Description
Understanding Server Reboots
Common Causes of Server Reboots
Solution
Monitoring System Logs
Tools for Tracking Processes
Steps to Identify the Reboot Cause
FAQs Related to Server Reboots
Question: How can I prevent unauthorized reboots?
Question: Can hardware issues cause reboots?
Question: What should I do if reboots persist after troubleshooting?
Summary

Problem Description

Our reader inquires about a method to identify which process is responsible for initiating server reboots. This information is crucial for maintaining server stability and ensuring uptime.

Understanding Server Reboots

Unplanned Reboots: These can be caused by hardware failures, power outages, or system crashes.
Planned Reboots: Often a result of system updates, maintenance, or configuration changes.

Common Causes of Server Reboots

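Kernel Panics: Critical errors from which the kernel cannot recover. On Linux, a panic leads to an automatic reboot only when the kernel.panic setting is non-zero; otherwise the system halts until it is reset.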
Watchdog Timers: Hardware or software watchdogs that reset the system if it becomes unresponsive.
User Commands: Commands like reboot or shutdown issued by users with sufficient privileges.
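
As a rough illustration of how these causes might be told apart, the Python sketch below tags a log line with a likely cause by keyword. The keyword table and the function name classify_reboot_line are assumptions made for this example; real messages vary by kernel version, distribution, and hardware, so a match is a hint rather than proof.

```python
# Hypothetical keyword table: the categories follow the list above, but the
# keywords are illustrative assumptions, not an exhaustive catalogue.
CAUSE_KEYWORDS = {
    "kernel panic": ("kernel panic",),
    "watchdog": ("watchdog",),
    "user command": ("shutdown", "reboot"),
}


def classify_reboot_line(line):
    """Return the first cause whose keyword appears in the line, else None."""
    lowered = line.lower()
    # Dictionary order is preserved, so more specific causes are checked first.
    for cause, keywords in CAUSE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return cause
    return None
```

The categories are checked in order, so a line that mentions both a watchdog and a reboot is reported as a watchdog event.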

Solution

To find the cause, review the system logs around the time of the reboot and use tools that record process activity.

Monitoring System Logs

/var/log/messages: The general system activity log, used mainly on Red Hat-based systems.
/var/log/syslog: The equivalent general log on Debian-based systems.
/var/log/auth.log: Records authentication and authorization events, including sudo commands. This is the Debian-based name; Red Hat-based systems use /var/log/secure.
/var/log/kern.log: Kernel messages as captured by syslog, the standard logging facility. These are the same messages that the dmesg command prints from the kernel ring buffer, but because they are written to disk they survive a reboot. That makes this file a good place to look for kernel messages logged shortly before the system went down.
/var/log/dmesg: A copy of the kernel ring buffer saved at boot on some distributions. It holds the messages printed during the boot process, including the hardware devices the kernel detected. The ring buffer itself has a fixed size, so older messages are overwritten as new ones arrive. This file is particularly useful for troubleshooting hardware and boot-related issues.
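
Since not every file above exists on every distribution, a first step might be to check which ones are present and read their most recent lines. The following Python sketch shows one way to do that; the helper names existing_logs and tail_lines are assumptions for illustration, and reading these files usually requires root privileges.

```python
from collections import deque
from pathlib import Path

# Paths taken from the list above; which ones exist depends on the distribution.
CANDIDATE_LOGS = (
    "/var/log/messages",
    "/var/log/syslog",
    "/var/log/auth.log",
    "/var/log/kern.log",
    "/var/log/dmesg",
)


def existing_logs(candidates=CANDIDATE_LOGS):
    """Return the candidate paths that are regular files on this host."""
    return [Path(p) for p in candidates if Path(p).is_file()]


def tail_lines(path, count=50):
    """Return the last `count` lines of a log, tolerating undecodable bytes."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the newest lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    for log in existing_logs():
        print(f"== {log} ==")
        try:
            for line in tail_lines(log, count=20):
                print(line)
        except OSError as error:
            print(f"(could not read: {error})")
```

Note that after a reboot the newest lines describe the current boot, so the entries of interest may sit further back in the file or in a rotated copy.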

Tools for Tracking Processes

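auditd: The Linux Audit Daemon can be configured to log use of the reboot system call and executions of commands such as reboot or shutdown, together with the user and process responsible.
psacct or acct: Process accounting packages (the name depends on the distribution) that record the commands users run and the system resources they consume.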
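
To make the auditd option more concrete, here is a minimal Python sketch that builds candidate audit rules as text. The key name reboot_watch, the function name, and the binary paths are assumptions for illustration; the arch=b64 filter assumes a 64-bit system, and the exact rule syntax should be checked against the auditctl manual page for your version before use.

```python
def build_reboot_audit_rules(key="reboot_watch",
                             binaries=("/sbin/shutdown", "/sbin/reboot")):
    """Return audit rule lines that tag reboot-related activity with `key`.

    The key and the binary paths are illustrative assumptions; adjust them
    to match where these commands live on the target system.
    """
    # Log every use of the reboot system call (64-bit ABI assumed).
    rules = [f"-a always,exit -F arch=b64 -S reboot -k {key}"]
    # Log every execution of the listed binaries.
    rules += [f"-w {path} -p x -k {key}" for path in binaries]
    return rules


if __name__ == "__main__":
    for rule in build_reboot_audit_rules():
        print(rule)
```

The printed lines could be placed in a rules file under /etc/audit/rules.d/ and, once loaded, matching events searched for with ausearch -k reboot_watch.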

Steps to Identify the Reboot Cause

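Check System Logs: Find the time of the reboot (for example with last reboot or who -b) and look for entries just before that timestamp.
Configure auditd: Set up rules to log events related to system reboots, then search the audit log after the next reboot.
Analyze Process Accounting: Use lastcomm to see commands executed before the reboot. Process accounting must already have been enabled when the reboot happened.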
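
The first step, looking at entries just before the reboot, can be sketched in Python as a time-window filter. This example assumes the traditional syslog timestamp at the start of each line (month, day, time, with no year), English month abbreviations, and that the log lines belong to the same calendar year as the reboot; the function name lines_before_reboot and the ten-minute default are assumptions for illustration.

```python
from datetime import datetime, timedelta


def lines_before_reboot(lines, reboot_time, window_minutes=10):
    """Yield lines stamped within `window_minutes` before `reboot_time`."""
    start = reboot_time - timedelta(minutes=window_minutes)
    for line in lines:
        try:
            # Traditional syslog lines carry no year, so borrow the reboot's year.
            stamp = datetime.strptime(
                f"{reboot_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a traditional syslog line; skip it
        if start <= stamp <= reboot_time:
            yield line
```

Lines that do not start with a parsable timestamp are skipped silently, so logs written in another timestamp format would need a different parser.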

FAQs Related to Server Reboots

Question: How can I prevent unauthorized reboots?

Answer: Implement strict user permissions and monitor user activities with tools like auditd.

Question: Can hardware issues cause reboots?

Answer: Yes, faulty hardware or overheating can trigger reboots. Regular maintenance and monitoring are recommended.

Question: What should I do if reboots persist after troubleshooting?

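Answer: If the logs and audit records show no cause, escalate to your hardware vendor or operating system support provider, since the fault may lie in hardware or firmware rather than in a process.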

Summary

Identifying the process causing server reboots involves careful monitoring of system logs and the use of process tracking tools. By following the steps outlined above, system administrators can pinpoint the cause and implement measures to prevent future disruptions.