A short note on TCP half-closed connections

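The infra at this year's HITCON Final ran into so many half-closed TCP connections that it exceeded the organizers' TCP limit. At the time I had no idea what half-closed meant, nor how to detect and close connections that had not finished closing.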

Today, while reading about linkerd, I came across a related feature, looked it up, and found RFC 9293. Its two close sequences:

TCP Peer A                                           TCP Peer B

1.  ESTABLISHED                                          ESTABLISHED

2.  (Close)
    FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK>  --> CLOSE-WAIT

3.  FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>      <-- CLOSE-WAIT

4.                                                       (Close)
    TIME-WAIT   <-- <SEQ=300><ACK=101><CTL=FIN,ACK>  <-- LAST-ACK

5.  TIME-WAIT   --> <SEQ=101><ACK=301><CTL=ACK>      --> CLOSED

6.  (2 MSL)
    CLOSED
Figure 12: Normal Close Sequence
TCP Peer A                                           TCP Peer B

1.  ESTABLISHED                                          ESTABLISHED

2.  (Close)                                              (Close)
    FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK>  ... FIN-WAIT-1
                <-- <SEQ=300><ACK=100><CTL=FIN,ACK>  <--
                ... <SEQ=100><ACK=300><CTL=FIN,ACK>  -->

3.  CLOSING     --> <SEQ=101><ACK=301><CTL=ACK>      ... CLOSING
                <-- <SEQ=301><ACK=101><CTL=ACK>      <--
                ... <SEQ=101><ACK=301><CTL=ACK>      -->

4.  TIME-WAIT                                            TIME-WAIT
    (2 MSL)                                              (2 MSL)
    CLOSED                                               CLOSED
Figure 13: Simultaneous Close Sequence
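
To see the half-closed window of Figure 12 in practice, here is a minimal sketch over loopback. It only assumes the standard Python socket module; the function name half_close_demo is made up for illustration. Peer A sends its FIN with shutdown(SHUT_WR) and heads for FIN-WAIT-2, while peer B, now in CLOSE-WAIT, can still send data back.

```python
import socket


def half_close_demo():
    """Return (bytes peer B read, bytes peer A read after sending its own FIN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        peer_a = socket.create_connection(listener.getsockname())
        peer_b, _ = listener.accept()

    with peer_a, peer_b:
        peer_a.sendall(b"ping")
        # Half-close: A sends FIN but keeps its receive side open.
        peer_a.shutdown(socket.SHUT_WR)

        request = b""
        while chunk := peer_b.recv(1024):  # recv() returns b"" once A's FIN arrives
            request += chunk

        # B has seen the FIN (CLOSE-WAIT) and can still write to A.
        peer_b.sendall(b"pong")
        peer_b.shutdown(socket.SHUT_WR)

        reply = b""
        while chunk := peer_a.recv(1024):
            reply += chunk

    return request, reply
```

If peer B never closes its side, peer A stays in FIN-WAIT-2 and peer B in CLOSE-WAIT, which is the situation this note is about.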

There is also a script for detecting half-closed connections; it reads the connection state from /proc/net/tcp:

46: 010310AC:9C4C 030310AC:1770 01
|      |      |      |      |   |--> connection state
|      |      |      |      |------> remote TCP port number
|      |      |      |-------------> remote IPv4 address
|      |      |--------------------> local TCP port number
|      |---------------------------> local IPv4 address
|----------------------------------> number of entry

00000150:00000000 01:00000019 00000000
   |        |     |     |       |--> number of unrecovered RTO timeouts
   |        |     |     |----------> number of jiffies until timer expires
   |        |     |----------------> timer_active (see below)
   |        |----------------------> receive-queue
   |-------------------------------> transmit-queue

1000        0 54165785 4 cd1e6040 25 4 27 3 -1
 |          |    |     |    |     |  | |  | |--> slow start size threshold,
 |          |    |     |    |     |  | |  |      or -1 if the threshold
 |          |    |     |    |     |  | |  |      is >= 0xFFFF
 |          |    |     |    |     |  | |  |----> sending congestion window
 |          |    |     |    |     |  | |-------> (ack.quick<<1)|ack.pingpong
 |          |    |     |    |     |  |---------> Predicted tick of soft clock
 |          |    |     |    |     |              (delayed ACK control data)
 |          |    |     |    |     |------------> retransmit timeout
 |          |    |     |    |------------------> location of socket in memory
 |          |    |     |-----------------------> socket reference count
 |          |    |-----------------------------> inode
 |          |----------------------------------> unanswered 0-window probes
 |---------------------------------------------> uid
https://github.com/torvalds/linux/blob/v6.6/Documentation/networking/proc_net_tcp.rst
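
As a rough illustration of the layout above, the first four columns can be decoded like this. The helper names decode_endpoint and parse_entry are hypothetical, and the sketch assumes a little-endian host, where the kernel prints the IPv4 address as a byte-swapped 32-bit hex value.

```python
import socket
import struct


def decode_endpoint(field):
    """Turn 'HEXADDR:HEXPORT' into ('a.b.c.d', port)."""
    addr_hex, port_hex = field.split(":")
    # Assumption: little-endian host, so re-pack the number as little-endian
    # to recover the address bytes in network order.
    ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return ip, int(port_hex, 16)


def parse_entry(line):
    """Decode entry number, local/remote endpoint and state of one line."""
    fields = line.split()
    return {
        "sl": int(fields[0].rstrip(":")),
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "st": int(fields[3], 16),  # state is printed in hex
    }


entry = parse_entry("46: 010310AC:9C4C 030310AC:1770 01")
print(entry)
```

For the sample line from the kernel documentation this gives a connection from 172.16.3.1:40012 to 172.16.3.3:6000 in state 1.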
enum {
	TCP_ESTABLISHED = 1,
	TCP_SYN_SENT,  // 2
	TCP_SYN_RECV,  // 3
	TCP_FIN_WAIT1, // 4
	TCP_FIN_WAIT2, // 5
	TCP_TIME_WAIT, // 6
	TCP_CLOSE,     // 7
	TCP_CLOSE_WAIT,// 8
	TCP_LAST_ACK,  // 9
	TCP_LISTEN,    // 10
	TCP_CLOSING,   // 11 /* Now a valid state */
	TCP_NEW_SYN_RECV, // 12

	TCP_MAX_STATES // 13 /* Leave at the end! */
};
https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/include/net/tcp_states.h#L12C1-L27C3
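
The st column in /proc/net/tcp is this enum value printed as two hex digits, so TCP_LISTEN shows up as 0A rather than 10. A small sketch of the lookup, mirroring the enum above; the TcpState class and state_name helper are illustrative names, not something the kernel exports.

```python
from enum import IntEnum


class TcpState(IntEnum):
    # Values copied from include/net/tcp_states.h as quoted above.
    TCP_ESTABLISHED = 1
    TCP_SYN_SENT = 2
    TCP_SYN_RECV = 3
    TCP_FIN_WAIT1 = 4
    TCP_FIN_WAIT2 = 5
    TCP_TIME_WAIT = 6
    TCP_CLOSE = 7
    TCP_CLOSE_WAIT = 8
    TCP_LAST_ACK = 9
    TCP_LISTEN = 10
    TCP_CLOSING = 11
    TCP_NEW_SYN_RECV = 12


def state_name(st_hex):
    """Map the hex 'st' column (e.g. '0A') to the enum name."""
    return TcpState(int(st_hex, 16)).name
```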

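To find half-closed sockets, look for st == 08 (TCP_CLOSE_WAIT) on the client and st == 05 (TCP_FIN_WAIT2) on the server. This matches the case where the server closes first: the side that sent the first FIN sits in FIN-WAIT-2 and its peer in CLOSE-WAIT, so the roles swap if the client closes first.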
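
A minimal sketch of such a filter, not the script referenced in this note: it keeps only entries in FIN_WAIT2 or CLOSE_WAIT. The names find_half_closed and scan are hypothetical. /proc/net/tcp lists IPv4 sockets only; IPv6 sockets are in /proc/net/tcp6.

```python
# Hex 'st' values of interest, taken from the kernel enum.
HALF_CLOSED = {0x05: "FIN_WAIT2", 0x08: "CLOSE_WAIT"}


def find_half_closed(proc_net_tcp_text):
    """Return (local, remote, state) for every half-closed entry in the text."""
    hits = []
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is "sl").
        if len(fields) < 4 or not fields[0].rstrip(":").isdigit():
            continue
        state = int(fields[3], 16)
        if state in HALF_CLOSED:
            hits.append((fields[1], fields[2], HALF_CLOSED[state]))
    return hits


def scan(path="/proc/net/tcp"):
    """Read the live table (Linux only) and filter it."""
    with open(path) as f:
        return find_half_closed(f.read())
```

Calling scan() on a Linux host returns the raw hex endpoints of each half-closed connection; counting the results over time shows whether they are piling up.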

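The kernel handles the half-closed timeout itself: an orphaned socket in FIN-WAIT-2 (one the application has already closed) is dropped after net.ipv4.tcp_fin_timeout seconds. The default is 60 seconds; two minutes is the upper bound for the per-socket TCP_LINGER2 option. So tuning net.ipv4.tcp_fin_timeout is enough.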

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
				  * state, about 60 seconds	*/
#define TCP_FIN_TIMEOUT	TCP_TIMEWAIT_LEN
                                 /* BSD style FIN_WAIT2 deadlock breaker.
				  * It used to be 3min, new value is 60sec,
				  * to combine FIN-WAIT-2 timeout with
				  * TIME-WAIT timer.
				  */
#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/include/net/tcp.h#L124-L132
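
The TCP_FIN_TIMEOUT_MAX comment mentions TCP_LINGER2, the per-socket counterpart of net.ipv4.tcp_fin_timeout. A minimal sketch of setting it, assuming Linux (Python exposes socket.TCP_LINGER2 only there); the helper name set_fin_wait2_timeout is made up for illustration.

```python
import socket


def set_fin_wait2_timeout(sock, seconds):
    """Set the FIN-WAIT-2 lifetime for one socket and return the value read back."""
    # Assumption: Linux. The kernel caps this at TCP_FIN_TIMEOUT_MAX (120 s).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2, seconds)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2)
```

This only affects the socket it is called on, which helps when changing the system-wide sysctl is not an option.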

refs
https://www.excentis.com/blog/tcp-half-close-a-cool-feature-that-is-now-broken/
https://linkerd.io/2.14/tasks/debugging-502s/#half-closed-connection-timeouts
https://gist.github.com/adleong/0203b0864af2c29ddb821dd48f339f49