nginx 502 bad gateway error

Posted by deng liuyan on August 1, 2018

请原谅我接下来可能要用不太平和的语气讲述下这段不太顺心的解bug之旅。

前奏:

​ 我开发的网站之前一切都顺畅,测试环境和release环境都用过很久没有出现如此不稳定,有一段时间没用过后,最近项目上突然急用,当我默认它很稳定,跟领导保证更新一个功能1个小时就完成后,事情就发生了。

​ 接着我开启解bug之旅,然后我用了一天解完,期间领导半个小时过来问一次,问到后面我已经心虚的不敢回答了……

bug描述:

​ 无故会出现无法连接此网站,然后又无故好了,如此循环,可以看到iptables中docker映射端口会无缘无故missing.

​ 无故会出现502 bad gateway,后台看docker logs就是下面这个问题,然后又无故好了,如此循环;

1
2
3
2019/06/26 16:43:11 [error] 457#457: *20 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.21.242.68, server: proxy, request: "GET /admin/ HTTP/1.1", upstream: "uwsgi://***", host: "***", referrer: "http://***/"
2019/06/26 16:43:11 [warn] 457#457: *20 upstream server temporarily disabled while reading response header from upstream, client: 10.21.242.68, server: proxy, request: "GET /admin/ HTTP/1.1", upstream: "uwsgi://****", host: "***", referrer: "http://***/"
10.21.242.68 - - [26/Jun/2019:16:43:11 +0800] "GET /admin/ HTTP/1.1" 502 575 "http://***/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" "-"

尝试解决方法

增加nginx 各种buffer size,增加各种timeout
1
2
3
4
5
6
7
proxy_connect_timeout 180;
proxy_send_timeout 180;
proxy_read_timeout 180;
proxy_buffer_size 8k;
proxy_buffers 4 64k;
proxy_busy_buffers_size 128k;
proxy_temp_file_write_size 128k
增加nginx server 配置里max_fails,fail_timeout
1
2
3
4
upstream backend_server {
    server *** max_fails=2 fail_timeout=10s weight=1;
    server *** max_fails=2 fail_timeout=10s weight=1;
}
增加与后端的长连接
1
2
3
4
5
upstream  backend_server {
    server *** max_fails=2 fail_timeout=10s weight=1;
    server *** max_fails=2 fail_timeout=10s weight=1;
    keepalive 300;    
    }
1
2
3
4
5
6
server {
        location /  {
            proxy_http_version 1.1; // 这两个最好也设置
            proxy_set_header Connection "";
        }
    }
重启docker,nginx;
1
2
3
docker restart *****;

nginx -s reload;
uwsgi .ini buffer-size设为最大;
1
buffer-size     = 65535

重点是以上皆失败。

最终解决方式

引入了传说中重来一次大法,直接在之前的image上重新run了一个docker,一切皆解决了…….哭哭……..