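This year the HITCON Final infra ran into so many half-closed TCP connections that it exceeded the organizers' TCP limit. At the time I had no idea what half-closed meant, or how to detect and close connections that had not finished closing.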
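Today, while reading about linkerd, I came across a related feature. A quick search turned up RFC 9293,
plus a script for detecting half-closed connections that reads the connection state from /proc/net/tcp.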
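To find half-closed sockets, look for st == 08 (TCP_CLOSE_WAIT) on the client and st == 05 (TCP_FIN_WAIT2) on the server. This assumes the server closed its side first: the side that sent the first FIN sits in FIN_WAIT2, and its peer sits in CLOSE_WAIT.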
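The check above can be sketched in Python. This is a minimal sketch, not the script from the gist: `find_half_closed` and `decode_ipv4` are hypothetical helper names, it only reads IPv4 sockets from /proc/net/tcp (not /proc/net/tcp6), and the address decoding assumes a little-endian host.

```python
from pathlib import Path

# Hex values of the "st" column that indicate a half-closed connection.
HALF_CLOSED_STATES = {"05": "FIN_WAIT2", "08": "CLOSE_WAIT"}


def decode_ipv4(addr: str) -> str:
    """Turn '0100007F:1F90' into '127.0.0.1:8080' (assumes a little-endian host)."""
    ip_hex, port_hex = addr.split(":")
    # The four address bytes are printed in reverse order on little-endian machines.
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in (6, 4, 2, 0)]
    return ".".join(octets) + ":" + str(int(port_hex, 16))


def find_half_closed(proc_net_tcp: str):
    """Return (local, remote, state) tuples for sockets in FIN_WAIT2 or CLOSE_WAIT."""
    results = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3]  # columns: sl, local_address, rem_address, st, ...
        if state in HALF_CLOSED_STATES:
            results.append(
                (decode_ipv4(fields[1]), decode_ipv4(fields[2]), HALF_CLOSED_STATES[state])
            )
    return results


if __name__ == "__main__":
    path = Path("/proc/net/tcp")  # Linux only
    if path.exists():
        for local, remote, state in find_half_closed(path.read_text()):
            print(f"{state:<10} {local} -> {remote}")
```

Run it on both ends of a connection: the same connection should show up as CLOSE_WAIT on one host and FIN_WAIT2 on the other.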
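The kernel times out half-closed connections on its own, after roughly two minutes, so tuning net.ipv4.tcp_fin_timeout is enough. Note that this sysctl (default 60 seconds) only covers orphaned FIN_WAIT2 sockets. A CLOSE_WAIT socket stays around until the application itself closes it.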
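As a rough sketch of that tuning step, the value can be read and changed through /proc/sys, which is the file behind `sysctl net.ipv4.tcp_fin_timeout`. The helper names `read_fin_timeout` and `write_fin_timeout` are hypothetical, writing requires root, and the change does not persist across reboots.

```python
from pathlib import Path

# File that backs the net.ipv4.tcp_fin_timeout sysctl on Linux.
FIN_TIMEOUT_PATH = Path("/proc/sys/net/ipv4/tcp_fin_timeout")


def read_fin_timeout(path: Path = FIN_TIMEOUT_PATH) -> int:
    """Return the current timeout in seconds."""
    return int(path.read_text().strip())


def write_fin_timeout(seconds: int, path: Path = FIN_TIMEOUT_PATH) -> None:
    """Set a new timeout in seconds (needs root when pointed at /proc/sys)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    path.write_text(f"{seconds}\n")


if __name__ == "__main__":
    if FIN_TIMEOUT_PATH.exists():
        print("tcp_fin_timeout =", read_fin_timeout(), "seconds")
```

The `path` parameter is there only so the helpers can be tried against a scratch file before touching the real setting.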
refs
https://www.excentis.com/blog/tcp-half-close-a-cool-feature-that-is-now-broken/
https://linkerd.io/2.14/tasks/debugging-502s/#half-closed-connection-timeouts
https://gist.github.com/adleong/0203b0864af2c29ddb821dd48f339f49