WordPress Website 500 Internal Server Error

Posted by NIYONSHUTI Emmanuel on June 23, 2024
#postmortem-report

Issue Summary

Duration of Outage: 4 hours 55 minutes

Start: June 18, 2024, 6:00 AM CAT

End: June 18, 2024, 10:55 AM CAT

Impact:

The WordPress website, running on a LAMP stack, was completely inaccessible and returned a 500 Internal Server Error. All users (100%) were unable to access content or perform any action on the site for the duration of the outage.

Root Cause:

A typo in the /var/www/html/wp-settings.php file: a required PHP file was referenced with an extra ‘p’ in its file extension (.phpp instead of .php).

Timeline

6:00 AM CAT: Issue detected through monitoring alert indicating a 500 Internal Server Error.

6:05 AM CAT: On-call engineer confirmed the issue and began initial diagnostics using curl.

6:15 AM CAT: Assumed the issue was related to recent WordPress plugin updates; began deactivating plugins.

6:30 AM CAT: Deactivating plugins did not resolve the issue. Further investigation required.

6:45 AM CAT: Issue escalated to the DevOps team for deeper analysis.

7:00 AM CAT: Used strace to attach to the Apache process and identify the cause of the 500 error.

7:30 AM CAT: strace output revealed a typo in the /var/www/html/wp-settings.php file.

8:00 AM CAT: Followed a misleading path and investigated potential database connection issues.

8:30 AM CAT: Correctly identified and fixed the typo in the PHP file.

9:00 AM CAT: Restarted Apache server to apply the changes.

10:45 AM CAT: Confirmed that services were restored.

10:55 AM CAT: Services fully restored, confirmed by monitoring tools and manual checks.
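The initial curl diagnostic and the final manual checks in the timeline can be sketched roughly as follows. This is a minimal sketch, not the exact commands used during the incident: the function names and the example URL are hypothetical.

```shell
#!/bin/sh
# Print only the HTTP status code returned for a URL.
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' writes the numeric status code to stdout.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed (exit status 0) when the given status code is in the 5xx range.
is_server_error() {
    case "$1" in
        5[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example usage (the URL is a placeholder, not the affected site):
#   code=$(http_status "http://localhost/")
#   if is_server_error "$code"; then echo "still failing: $code"; fi
```

Running the same check before and after a fix gives a quick, repeatable confirmation that the site has stopped returning 5xx responses.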

Root Cause and Resolution

Root Cause:

The issue was caused by a typo in the /var/www/html/wp-settings.php file. The line require_once( ABSPATH . WPINC . '/class-wp-locale.phpp' ); incorrectly referenced the file class-wp-locale.phpp instead of class-wp-locale.php.

Resolution:

Identify the Typo: Used strace to trace system calls and identify the typo in the wp-settings.php file.

Correct the Typo: Edited the wp-settings.php file to correct the typo:

require_once( ABSPATH . WPINC . '/class-wp-locale.php' );

Restart Apache: Restarted the Apache server to apply the changes and restore service.
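The three resolution steps above could look something like the sketch below. The function names, the process ID argument, and the service name are assumptions for illustration. The actual commands used during the incident are not recorded in this report.

```shell
#!/bin/sh
# 1. Attach to a running Apache process and print only the file-related
#    system calls that failed because the file does not exist (ENOENT).
#    -f follows child processes, -p attaches to the given PID,
#    -e trace=file limits output to syscalls that take a file name.
#    Attaching to another process usually requires root.
trace_missing_files() {
    strace -f -p "$1" -e trace=file 2>&1 | grep ENOENT
}

# 2. Correct the misspelled extension in place, keeping a .bak copy
#    of the original file next to it.
fix_phpp_typo() {
    sed -i.bak 's/class-wp-locale\.phpp/class-wp-locale.php/' "$1"
}

# 3. Restart Apache. The service name "apache2" is an assumption
#    (Debian/Ubuntu naming); other distributions use "httpd".
restart_apache() {
    sudo systemctl restart apache2
}

# Example usage (PID and path are placeholders):
#   trace_missing_files 1234
#   fix_phpp_typo /var/www/html/wp-settings.php
#   restart_apache
```

Keeping a backup copy of the edited file makes it easy to diff the change afterwards and to roll back if the edit turns out to be wrong.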

Corrective and Preventative Measures

Improvements:

Code Review: Implement a more thorough code review process to catch typos and other errors before deployment.

Error Logging: Enhance error logging to provide clearer error messages that can help quickly identify issues like typos.

Automation: Automate checks for common configuration errors and typos.

Tasks:

Enhance Code Review Process: Implement peer reviews and automated linting tools to catch errors in code.

Improve Error Logging:

Configure Apache and PHP to provide detailed error logs.

Implement centralized logging to make it easier to track and analyze errors.
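Once PHP errors are written to the Apache error log, a small helper can surface the most recent fatal errors during an incident. The sketch below assumes the Debian/Ubuntu default log location and a hypothetical function name. Adjust both to the actual environment.

```shell
#!/bin/sh
# Print the most recent PHP fatal errors from an Apache error log.
#   $1: log file (default is an assumed Debian/Ubuntu path)
#   $2: maximum number of matching lines to show (default 20)
recent_php_fatals() {
    log=${1:-/var/log/apache2/error.log}
    count=${2:-20}
    grep 'PHP Fatal error' "$log" | tail -n "$count"
}

# Example usage:
#   recent_php_fatals /var/log/apache2/error.log 5
```

A failed require_once is normally reported by PHP as a fatal error naming the file it could not open, so with logging enabled this kind of lookup may point to the faulty line faster than tracing system calls.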

Automate Checks:

Develop scripts to check for common configuration errors.

Integrate these scripts into the CI/CD pipeline.
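One possible shape for such a check is sketched below. A syntax linter such as php -l would not flag this incident's typo, because the misspelled file name sits inside a string that is still valid PHP. The hypothetical check_wpinc_requires function instead verifies that every file wp-settings.php loads via WPINC actually exists. It assumes the standard WordPress layout, where WPINC resolves to the wp-includes directory, and paths without spaces.

```shell
#!/bin/sh
# Verify that every file referenced in wp-settings.php as
#   ABSPATH . WPINC . '/some-file.php'
# exists under wp-includes/. Prints each missing target.
# Returns 0 if all exist, 1 if any is missing, 2 if wp-settings.php
# itself cannot be found.
check_wpinc_requires() {
    wp_root=$1                          # e.g. /var/www/html
    settings="$wp_root/wp-settings.php"
    missing=0

    if [ ! -f "$settings" ]; then
        echo "not found: $settings"
        return 2
    fi

    # Extract the single-quoted relative path that follows "WPINC ."
    for rel in $(grep -oE "WPINC[[:space:]]*\.[[:space:]]*'[^']+'" "$settings" \
                 | sed -E "s/^[^']*'([^']+)'.*/\1/"); do
        if [ ! -f "$wp_root/wp-includes$rel" ]; then
            echo "missing: wp-includes$rel"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage in a CI/CD step (path is a placeholder):
#   check_wpinc_requires /var/www/html || exit 1
```

Because the function returns a non-zero status when a target is missing, a pipeline step that calls it would fail before the broken file reaches production.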

Documentation and Training: Update documentation on debugging practices and provide training sessions for the team on using tools like strace.


This post was originally published on Medium.