Linux Kernel Tuning – Network


Author: Mr.zhou

Common Linux network kernel tuning parameters

TCP establishes a connection with a three-way handshake and closes it with a four-way handshake:

[Figure: TCP three-way handshake and four-way connection teardown]

TCP states:

LISTEN

(server) represents waiting for a connection request from any remote TCP and port.

SYN-SENT

(client) represents waiting for a matching connection request after having sent a connection request.

SYN-RECEIVED

(server) represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

ESTABLISHED

(both server and client) represents an open connection; data received can be delivered to the user. This is the normal state for the data transfer phase of the connection.

FIN-WAIT-1

(server or client, whichever closes first) represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.

FIN-WAIT-2

(server or client, whichever closes first) represents waiting for a connection termination request from the remote TCP.

CLOSE-WAIT

(server or client, whichever receives the peer's termination request first) represents waiting for a connection termination request from the local user.

CLOSING

(server or client, during a simultaneous close) represents waiting for a connection termination request acknowledgment from the remote TCP.

LAST-ACK

(server or client, on the side that closes second) represents waiting for an acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its connection termination request).

TIME-WAIT

(server or client, whichever closed first) represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request. [According to RFC 793, a connection stays in TIME-WAIT for two MSL (maximum segment lifetime), that is, four minutes; Linux uses 60 seconds.]

CLOSED

(both server and client) represents no connection state at all.
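To see which of these states the sockets on a host are currently in, one option is to tally the first column of ss -tan output. This is a minimal sketch assuming bash and iproute2's ss, which abbreviates some state names (for example ESTAB for ESTABLISHED and SYN-RECV for SYN-RECEIVED); the function name count_tcp_states is hypothetical.

```shell
#!/usr/bin/env bash
# Tally TCP sockets by state from `ss -tan` output read on stdin.
# Assumes iproute2's ss layout: one header line, then the state in column 1.
count_tcp_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' |
        LC_ALL=C sort
}

# Usage on a live system (requires iproute2):
#   ss -tan | count_tcp_states
```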

Linux Kernel Parameters:

tcp_timestamps – BOOLEAN

Enable timestamps as defined in RFC1323.

Example:

net.ipv4.tcp_timestamps = 0

tcp_window_scaling – BOOLEAN

Enable window scaling as defined in RFC1323.

Example:

net.ipv4.tcp_window_scaling = 1
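Window scaling matters for the large buffers configured below, because the TCP header's window field holds at most 65535 bytes. The sketch below, assuming bash, computes the smallest shift that lets a given buffer size fit in that field, capped at 14 as RFC 1323 requires. It is modeled loosely on the kernel's tcp_select_initial_window() logic, but omits the other inputs the kernel considers (such as net.core.rmem_max and the window clamp); the function name window_scale_for is hypothetical.

```shell
#!/usr/bin/env bash
# Halve the buffer size until it fits in the 16-bit window field,
# counting the shifts; RFC 1323 limits the shift to 14.
window_scale_for() {
    local space=$1 wscale=0
    while [ "$space" -gt 65535 ] && [ "$wscale" -lt 14 ]; do
        space=$(( space >> 1 ))
        wscale=$(( wscale + 1 ))
    done
    echo "$wscale"
}

window_scale_for 16777216   # 16 MiB, the tcp_rmem max used in this article
```

For the 16 MiB maximum used in this article the shift works out to 9, since 65535 × 2^9 is the first value that covers 16777216 bytes.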

tcp_mem – vector of 3 INTEGERs: min, pressure, max

min: below this number of pages, TCP does not regulate its memory usage.

pressure: when the amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption and enters memory pressure mode, which it exits when memory consumption falls below “min”.

max: number of pages allowed for queueing by all TCP sockets.

Defaults are calculated at boot time from the amount of available memory.
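Unlike tcp_rmem and tcp_wmem, which are in bytes, tcp_mem is counted in pages, so reading it meaningfully requires the page size. This is a minimal sketch assuming bash and a Linux host where getconf PAGESIZE is available; the helper name pages_to_mib is hypothetical, and the /proc read is skipped when the file is absent.

```shell
#!/usr/bin/env bash
# Convert a page count to MiB (integer division) for a given page size in bytes.
pages_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# On a live Linux system, print the current min/pressure/max in MiB.
if [ -r /proc/sys/net/ipv4/tcp_mem ]; then
    page_size=$(getconf PAGESIZE)
    read -r min pressure max < /proc/sys/net/ipv4/tcp_mem
    echo "min=$(pages_to_mib "$min" "$page_size")MiB" \
         "pressure=$(pages_to_mib "$pressure" "$page_size")MiB" \
         "max=$(pages_to_mib "$max" "$page_size")MiB"
fi
```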

tcp_rmem – vector of 3 INTEGERs: min, default, max

min: minimal size of the receive buffer used by TCP sockets.

It is guaranteed to each TCP socket, even under moderate memory pressure.

Default: 1 page

default: initial size of the receive buffer used by TCP sockets.

This value overrides net.core.rmem_default used by other protocols.

Default: 87380 bytes.

This value results in a window of 65535 bytes with the default setting of tcp_adv_win_scale and tcp_app_win = 0, and a bit less with the default tcp_app_win.

See the kernel's ip-sysctl documentation for these variables.

max: maximal size of the receive buffer allowed for automatically tuned receive buffers of TCP sockets.

This value does not override net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket’s receive buffer size, in which case this value is ignored.

Default: between 87380B and 6MB, depending on RAM size.

Example:

net.ipv4.tcp_rmem = 4096 87380 16777216

net.core.rmem_max = 16777216
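The 65535-byte figure for the 87380-byte default comes from the tcp_adv_win_scale rule in the kernel's ip-sysctl documentation: for a positive scale, space/2^scale is counted as overhead, otherwise only space/2^(-scale) is usable as window. The sketch below, assuming bash, reproduces that arithmetic with tcp_adv_win_scale = 2 and ignores tcp_app_win; it illustrates the documented rule rather than the kernel's exact code path, and the function name win_from_space is hypothetical.

```shell
#!/usr/bin/env bash
# Advertised window for a given buffer size under tcp_adv_win_scale:
#   scale > 0:  window = space - space / 2^scale
#   scale <= 0: window = space / 2^(-scale)
win_from_space() {
    local space=$1 scale=$2
    if [ "$scale" -le 0 ]; then
        echo $(( space >> -scale ))
    else
        echo $(( space - (space >> scale) ))
    fi
}

win_from_space 87380 2   # 87380 - 87380/4 = 65535
```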

tcp_wmem – vector of 3 INTEGERs: min, default, max

min: Amount of memory reserved for send buffers for TCP sockets.

Every TCP socket is entitled to it from the moment it is created.

Default: 1 page

default: initial size of the send buffer used by TCP sockets.

This value overrides net.core.wmem_default used by other protocols.

It is usually lower than net.core.wmem_default.

Default: 16K

max: Maximal amount of memory allowed for automatically tuned send buffers for TCP sockets.

This value does not override net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables automatic tuning of that socket’s send buffer size, in which case this value is ignored.

Default: between 64K and 4MB, depending on RAM size.

Example:

net.ipv4.tcp_wmem = 4096 16384 16777216

net.core.wmem_max = 16777216

tcp_tw_recycle – BOOLEAN

Enable fast recycling of TIME-WAIT sockets.

Default value is 0.

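It should not be changed without advice/request of technical experts. Enabling it is known to break connections from clients behind NAT, and the option was removed entirely in Linux 4.12.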

Example:

net.ipv4.tcp_tw_recycle = 1

tcp_tw_reuse – BOOLEAN

Allow reusing TIME-WAIT sockets for new outgoing connections when it is safe from the protocol's point of view. This relies on TCP timestamps, so it has no effect when tcp_timestamps is 0.

Default value is 0.

It should not be changed without advice/request of technical experts.

Example:

net.ipv4.tcp_tw_reuse = 1

tcp_fin_timeout – INTEGER

The length of time an orphaned (no longer referenced by any application) connection will remain in the FIN_WAIT_2 state before it is aborted at the local end.

While a perfectly valid “receive only” state for an un-orphaned connection, an orphaned connection in FIN_WAIT_2 state could otherwise wait forever for the remote to close its end of the connection.

Cf. tcp_max_orphans

Default: 60 seconds

Example:

net.ipv4.tcp_fin_timeout = 15

tcp_max_tw_buckets – INTEGER

Maximal number of TIME-WAIT sockets held by the system simultaneously.

If this number is exceeded, the TIME-WAIT socket is immediately destroyed and a warning is printed.

This limit exists only to prevent simple DoS attacks. You _must_ not lower the limit artificially; rather, increase it (probably after adding memory) if network conditions require more than the default value.

Example:

net.ipv4.tcp_max_tw_buckets = 262144

somaxconn – INTEGER

Limit of socket listen() backlog, known in userspace as SOMAXCONN.

Defaults to 128 (4096 since Linux 5.4).

See also tcp_max_syn_backlog for additional tuning for TCP sockets.

Example:

net.core.somaxconn = 20480
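The kernel silently caps the backlog argument an application passes to listen() at net.core.somaxconn, so the effective accept-queue limit is the smaller of the two values; raising somaxconn does not help an application that requests a small backlog, and vice versa. A minimal bash sketch of that rule, with the hypothetical helper name effective_backlog:

```shell
#!/usr/bin/env bash
# Effective listen() backlog: the requested value, capped at somaxconn.
effective_backlog() {
    local requested=$1 somaxconn=$2
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

# An application calling listen(fd, 1024) on a host with somaxconn = 128:
effective_backlog 1024 128
```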

tcp_max_syn_backlog – INTEGER

Maximal number of remembered connection requests that have not yet received an acknowledgment from the connecting client.

The minimal value is 128 for low-memory machines, and it increases in proportion to the machine's memory.

If server suffers from overload, try increasing this number.

Example:

net.ipv4.tcp_max_syn_backlog = 20480

tcp_syncookies – BOOLEAN

Only valid when the kernel was compiled with CONFIG_SYN_COOKIES. Send out syncookies when the SYN backlog queue of a socket overflows.

This protects against the common ‘SYN flood attack’.

Default: 1

Note that syncookies are a fallback facility.

They MUST NOT be used to help highly loaded servers withstand a legitimate connection rate.

If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legitimate connections, you should tune other parameters until the warnings disappear.

See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

Syncookies seriously violate the TCP protocol and do not allow TCP extensions to be used; they can seriously degrade some services (e.g. SMTP relaying), an effect visible not to you but to the clients and relays contacting you.

If you see SYN flood warnings in your logs without actually being flooded, your server is seriously misconfigured.

To test what effect syncookies have on your network connections, set this knob to 2 to enable unconditional generation of syncookies.

tcp_synack_retries – INTEGER

Number of times SYNACKs for a passive TCP connection attempt will be retransmitted.

Should not be higher than 255.

The default value is 5, which corresponds to 31 seconds until the last retransmission with the current initial RTO of 1 second.

With this, the final timeout for a passive TCP connection happens after 63 seconds.
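The 31-second and 63-second figures follow from exponential backoff: the RTO starts at 1 second and doubles after each retransmission, so retransmissions go out at 1, 3, 7, 15 and 31 seconds, and the attempt is dropped one further (doubled) RTO later. This bash sketch reproduces that arithmetic; it ignores the kernel's upper bound on the RTO, which does not come into play for small retry counts, and the function name synack_timeline is hypothetical.

```shell
#!/usr/bin/env bash
# Prints "<time of last retransmission> <time of final timeout>" in seconds
# for a given retry count and initial RTO (default 1 s), doubling the RTO each time.
synack_timeline() {
    local retries=$1 rto=${2:-1} elapsed=0 i
    for (( i = 0; i < retries; i++ )); do
        elapsed=$(( elapsed + rto ))
        rto=$(( rto * 2 ))
    done
    echo "$elapsed $(( elapsed + rto ))"
}

synack_timeline 5   # the default tcp_synack_retries
```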

tcp_abort_on_overflow – BOOLEAN

If the listening service is too slow to accept new connections, reset them.

Default state is FALSE.

This means that if an overflow occurred due to a burst, the connection will recover.

Enable this option _only_ if you are really sure that listening daemon cannot be tuned to accept connections faster.

Enabling this option can harm clients of your server.

ip_local_port_range – 2 INTEGERS

Defines the local port range that is used by TCP and UDP to choose the local port.

The first number is the first, the second the last local port number.

If possible, it is better for these numbers to have different parity (one even and one odd value).

The default values are 32768 and 60999 respectively.

Example:

net.ipv4.ip_local_port_range = 10240 60999
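Widening the range mainly increases the number of ephemeral ports available for outgoing connections. The bash sketch below, with the hypothetical helper name port_range_info, counts the ports in a range (both bounds inclusive) and checks the different-parity recommendation above; both the default range and the one in this article's example satisfy it.

```shell
#!/usr/bin/env bash
# Print the number of ports in [low, high] and whether the bounds differ in parity.
port_range_info() {
    local low=$1 high=$2
    local count=$(( high - low + 1 ))
    if [ $(( (low + high) % 2 )) -eq 1 ]; then
        echo "$count different-parity"
    else
        echo "$count same-parity"
    fi
}

port_range_info 32768 60999   # default range
port_range_info 10240 60999   # range used in this article's example
```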

ip_nonlocal_bind – BOOLEAN

If set, allows processes to bind() to non-local IP addresses, which can be quite useful but may break some applications.

Default: 0

arp_announce – INTEGER

Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on the interface:

0 – (default) Use any local address, configured on any interface

1 – Try to avoid local addresses that are not in the target’s subnet for this interface.

This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface.

When we generate the request we will check all our subnets that include the target IP and will preserve the source address if it is from such subnet.

If there is no such subnet we select source address according to the rules for level 2.

2 – Always use the best local address for this target.

In this mode we ignore the source address in the IP packet and try to select local address that we prefer for talks with the target host.

Such local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address.

If no suitable local address is found, we select the first local address we have on the outgoing interface or on all other interfaces, hoping to receive a reply to our request, sometimes regardless of the source IP address we announce.

The max value from conf/{all,interface}/arp_announce is used.

Increasing the restriction level gives a better chance of receiving an answer from the resolved target, while decreasing it announces more valid sender information.
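Because the kernel uses the larger of conf/all/arp_announce and conf/<interface>/arp_announce, setting only the per-interface value is not enough to lower the level. This is a minimal bash sketch of that rule; the helper name effective_arp_announce is hypothetical, and the interface name eth0 is only an example (the live check is skipped if it does not exist).

```shell
#!/usr/bin/env bash
# Effective arp_announce for an interface: max(all, interface).
effective_arp_announce() {
    local all=$1 iface=$2
    echo $(( all > iface ? all : iface ))
}

# On a live system; eth0 is an example interface name.
conf=/proc/sys/net/ipv4/conf
if [ -r "$conf/all/arp_announce" ] && [ -r "$conf/eth0/arp_announce" ]; then
    effective_arp_announce "$(cat "$conf/all/arp_announce")" "$(cat "$conf/eth0/arp_announce")"
fi
```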

icmp_echo_ignore_all – BOOLEAN

If set non-zero, then the kernel will ignore all ICMP ECHO requests sent to it.

Default: 0

icmp_echo_ignore_broadcasts – BOOLEAN

If set non-zero, then the kernel will ignore all ICMP ECHO and TIMESTAMP requests sent to it via broadcast/multicast.

Default: 1

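Adjust the parameters to suit your server's configuration, append the following entries to /etc/sysctl.conf, and run sysctl -p to apply them. Alternatively, use echo to redirect a value into the corresponding file under /proc/sys for a temporary change.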

[root@z-dig ~]# cat >> /etc/sysctl.conf <<EOF
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.core.rmem_max = 16777216
> net.ipv4.tcp_wmem = 4096 16384 16777216
> net.core.wmem_max = 16777216
> net.ipv4.tcp_timestamps = 0
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_fin_timeout = 15
> net.ipv4.tcp_max_tw_buckets = 262144
> net.core.somaxconn = 20480
> net.ipv4.tcp_max_syn_backlog = 20480
> net.ipv4.ip_local_port_range = 10240 60999
> EOF
[root@z-dig ~]#
[root@z-dig ~]# sysctl -p
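The /proc/sys route mentioned above works because each sysctl key maps to a file path with the dots replaced by slashes. The sketch below, assuming bash, defines a hypothetical helper sysctl_path for that mapping and uses it to print the live values of a few keys from the configuration, which is one way to confirm that sysctl -p took effect. Keys whose components themselves contain a dot (such as VLAN interface names) are not handled.

```shell
#!/usr/bin/env bash
# Map a sysctl key to its file under /proc/sys by replacing every "." with "/".
sysctl_path() {
    echo "/proc/sys/${1//.//}"
}

# Temporary (non-persistent) change via /proc/sys; must be run as root:
#   echo 15 > "$(sysctl_path net.ipv4.tcp_fin_timeout)"

# Print the live value of some keys from the configuration above.
for key in net.ipv4.tcp_fin_timeout net.core.somaxconn net.ipv4.ip_local_port_range; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    fi
done
```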

When reposting, please cite the original link: http://www.z-dig.com/linux-kernel-tuning-network.html


