Next.js Server-Side Rendering Timeout Triggers Production Incident
Preface
A few days ago, our company's platform homepage went down, returning HTTP 504 (Gateway Timeout) errors. This document records how we investigated and resolved the incident.
Investigation and Resolution
Troubleshooting Process
- Step 1: Test and development environments showed no issues.
- Step 2: Production pipeline checks revealed no recent deployments, ruling out frontend code changes.
- Step 3: The /about-us page loaded normally, but some API calls timed out in production. Hypothesis: the API timeouts left the server-side requests in getServerSideProps hanging, which in turn made the homepage time out.
- Step 4: Local testing confirmed that disabling all getServerSideProps fetch requests restored homepage functionality.
Root Cause
The homepage used SSR via getServerSideProps, which relied on basic fetch requests without timeout handling. Unresponsive APIs caused prolonged server-side execution, exceeding the server’s response threshold and triggering 504 errors.
Optimization Strategies
Proposed Solutions
- Disable SSR: Render entirely via client-side requests.
- Enhance Fetch: Implement manual timeout handling using AbortController and setTimeout.
- Migrate to Axios: Leverage Axios’ built-in timeout configurations for streamlined error handling.
Solution Comparison
- Disabling SSR: Rejected due to risks of client-side rendering inconsistencies and loss of SEO/performance benefits.
- Manual Fetch Modifications: High implementation complexity and compatibility risks (Node.js ≥14.17 required for AbortController).
- Axios Integration: Chosen. Axios was already a project dependency, and its built-in timeout option covers the requirement with little extra code.
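For comparison, the manual fetch option that was rejected might look roughly like the sketch below. It is not the code used in this project: fetchWithTimeout and its doFetch parameter are hypothetical names, and the fetch function is passed in so the helper does not depend on a particular runtime.

```typescript
// Hypothetical helper (not from the project): enforce a timeout on a
// fetch-style call by aborting it through AbortController.
async function fetchWithTimeout<T>(
  doFetch: (url: string, init: { signal: AbortSignal }) => Promise<T>,
  url: string,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await doFetch(url, { signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}
```

On a runtime with a global fetch, a call could look like fetchWithTimeout(fetch, url, 3000). The timer bookkeeping has to be repeated or wrapped for every request, which is part of the complexity the comparison above refers to.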
Final Implementation
// lib/ssrAxios.ts (file path assumed for illustration)
import axios from 'axios';

// Shared axios instance for server-side requests: give up after 3 seconds.
const ssrAxios = axios.create({
  timeout: 3000,
  timeoutErrorMessage: 'Request timed out',
});

export default ssrAxios;

// pages/index.tsx (file path and import path assumed for illustration)
import ssrAxios from '../lib/ssrAxios';

// Helper: unwrap a settled request, falling back to an empty array on failure.
const handleFetchResult = (result: PromiseSettledResult<any>, key: string) => {
  if (result.status === 'rejected') {
    // Covers timeouts as well as network and HTTP errors.
    console.error(`Failed to fetch ${key}:`, result.reason);
    return [];
  }
  const { data, success } = result.value.data;
  return success ? data?.result || [] : [];
};

export const getServerSideProps = async () => {
  const addr = process.env.API_ADDR;

  // allSettled waits for every request, so one failure cannot reject the whole batch.
  const [list1Res, list2Res] = await Promise.allSettled([
    ssrAxios.get(`${addr}/api/list1`),
    ssrAxios.get(`${addr}/api/list2`),
  ]);

  const list1 = handleFetchResult(list1Res, 'list1');
  const list2 = handleFetchResult(list2Res, 'list2');

  return {
    props: {
      list1: list1 ?? [],
      list2: list2 ?? [],
    },
  };
};
Key Improvements:
- Server-side requests fall back to empty arrays on timeout (or any other request failure), so the page still renders.
- Client-side components retry failed requests for redundancy.
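The client-side retry is not shown in this write-up. As a rough illustration only, a generic retry helper could look like the sketch below. The name retry, the attempt count, and the fixed delay are all assumptions rather than the project's actual implementation.

```typescript
// Hypothetical helper: run an async task up to `attempts` times,
// waiting `delayMs` between failed attempts.
async function retry<T>(
  task: () => Promise<T>,
  attempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

A client component could wrap its data request in such a helper when the server-rendered props arrive empty.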
Root Cause Analysis
Hypotheses
- API Performance Bottlenecks: Unstable production API response times exacerbated SSR concurrency issues.
- Missing SSR Timeout Mechanism: No timeout thresholds for server-side requests.
- Incomplete Error Handling: Failure to distinguish timeout errors from general exceptions.
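To make the third hypothesis concrete, the sketch below separates timeout-like failures from everything else by inspecting the rejection reason. It assumes the default axios behaviour of reporting timeouts with the code ECONNABORTED (ETIMEDOUT when the clarifyTimeoutError transitional option is enabled) and the AbortError name raised when an AbortController aborts a request. Both markers can also appear for deliberately cancelled requests, so treat the result as a hint. classifyFailure is a hypothetical name.

```typescript
type FailureKind = 'timeout' | 'other';

// Hypothetical classifier: decide whether a rejection reason looks like a timeout.
function classifyFailure(reason: unknown): FailureKind {
  if (typeof reason !== 'object' || reason === null) return 'other';
  const { code, name } = reason as { code?: unknown; name?: unknown };
  // Error codes axios attaches to timed-out (and aborted) requests.
  if (code === 'ECONNABORTED' || code === 'ETIMEDOUT') return 'timeout';
  // Error names produced by AbortController / AbortSignal.timeout.
  if (name === 'AbortError' || name === 'TimeoutError') return 'timeout';
  return 'other';
}
```

Logging the classification next to the failed key would make it possible to tell a slow API apart from a broken one when reading the server logs.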
Validation
- Simulating API timeouts locally reproduced the 504 error.
- Post-optimization, timeouts triggered graceful fallbacks instead of page crashes.
Conclusion
Next.js waits for getServerSideProps to finish before it renders and sends the page. With no request timeouts in place, unresponsive API calls kept getServerSideProps running past the server's response limit, so homepage requests ended in 504 errors.
Improvement Plan
- Hybrid Rendering: Prioritize SSR for initial loads, fall back to CSR on timeouts.
- Observability: Add monitoring for SSR request durations and error rates.
- Infrastructure Decoupling: Reduce dependency on Vercel’s timeout constraints through custom SSR setups.
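As one possible starting point for the observability item, the sketch below times a server-side request and reports its duration and outcome. The names timed and SsrMetric and the report callback are assumptions. A real setup would forward the metric to whatever monitoring system is in use.

```typescript
interface SsrMetric {
  name: string;
  durationMs: number;
  ok: boolean;
}

// Hypothetical wrapper: measure how long a task takes and whether it succeeded.
async function timed<T>(
  name: string,
  task: () => Promise<T>,
  report: (metric: SsrMetric) => void,
): Promise<T> {
  const start = Date.now();
  try {
    const value = await task();
    report({ name, durationMs: Date.now() - start, ok: true });
    return value;
  } catch (err) {
    report({ name, durationMs: Date.now() - start, ok: false });
    // Rethrow so existing error handling still sees the failure.
    throw err;
  }
}
```

Because the wrapper rethrows, it can sit around each request passed to Promise.allSettled without changing how the results are handled.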
Key Metrics:
- Target: Maintain SSR’s 1.2s average page load time while achieving 99.9% uptime.
- Mechanism: Auto-switch to CSR after a 3s SSR timeout.
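The auto-switch mechanism could be sketched as a deadline around the server-side work: if the data is not ready in time, the page receives a flag telling the client to fetch it instead. This is only an illustration of the idea. withDeadline, SsrResult, and needsClientFetch are hypothetical names, and the sketch does not cancel the underlying request.

```typescript
type SsrResult<T> =
  | { data: T; needsClientFetch: false }
  | { data: null; needsClientFetch: true };

// Hypothetical helper: resolve with the task's data, or with a
// "fetch on the client" marker if the task fails or misses the deadline.
function withDeadline<T>(task: Promise<T>, deadlineMs: number): Promise<SsrResult<T>> {
  return new Promise((resolve) => {
    const fallback: SsrResult<T> = { data: null, needsClientFetch: true };
    // Whichever settles first wins; later resolve calls are ignored.
    const timer = setTimeout(() => resolve(fallback), deadlineMs);
    task.then(
      (data) => {
        clearTimeout(timer);
        resolve({ data, needsClientFetch: false });
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}
```

In getServerSideProps, the result could be returned as props, with the page component starting a client-side request whenever needsClientFetch is true.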