RHEL4/5/6 : Quick Overview of the Kernel Parameters related to Network Tuning

This article introduces several tunable kernel parameters that can be customized for network tuning in Red Hat Enterprise Linux.
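Before changing anything, it helps to look at the current values. The sketch below is one possible way to read a parameter from Python, assuming the usual Linux layout where a dotted sysctl name maps to a file under /proc/sys (for example, the tcp_* parameters under net.ipv4 and rmem_max, wmem_max and somaxconn under net.core). The helper names are made up for illustration.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")  # standard location of sysctl files on Linux


def sysctl_path(name: str) -> Path:
    """Map a dotted name such as 'net.ipv4.tcp_rmem' to its /proc/sys file."""
    return PROC_SYS.joinpath(*name.split("."))


def parse_value(raw: str):
    """Return an int for scalars, a list of ints for vectors, else the text."""
    fields = raw.split()
    try:
        numbers = [int(field) for field in fields]
    except ValueError:
        return raw.strip()  # e.g. tcp_congestion_control holds a string
    if not numbers:
        return raw.strip()
    return numbers[0] if len(numbers) == 1 else numbers


def read_sysctl(name: str):
    """Read one parameter; return None if this kernel does not expose it."""
    path = sysctl_path(name)
    try:
        return parse_value(path.read_text())
    except OSError:
        return None
```

For example, read_sysctl("net.ipv4.tcp_rmem") would return the three-element min/default/max vector on a system that exposes it, and None elsewhere.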

:::: Socket Layer/ Socket buffers

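tcp_rmem – vector of 3 INTEGERs: min, default, max
min: Minimal size of the receive buffer used by TCP sockets. It is guaranteed to each TCP socket, even under moderate memory pressure.
default: Initial size of the receive buffer used by TCP sockets. This value overrides rmem_default, which is used by other protocols.
max: Maximal size of the receive buffer allowed for automatically selected receive buffers of a TCP socket. This value does not override rmem_max.
tcp_wmem – vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for the send buffer of each TCP socket. Every TCP socket is entitled to it from the moment it is created. Default: 4K
default: Initial size of the send buffer used by TCP sockets. This value overrides wmem_default, which is used by other protocols.
max: Maximal amount of memory allowed for automatically tuned send buffers of a TCP socket. This value does not override wmem_max.
wmem_max – The maximum send socket buffer size in bytes.
rmem_max – The maximum receive socket buffer size in bytes.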
tcp_app_win – INTEGER – Reserve max(window/2^tcp_app_win, mss) of window for application buffer. Value 0 is special, it means that nothing is reserved. Default: 31
tcp_adv_win_scale – INTEGER – Count buffering overhead as bytes/2^tcp_adv_win_scale (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if it is <= 0. Default: 2
somaxconn – INTEGER – Limit of socket listen() backlog, known in userspace as SOMAXCONN. See also tcp_max_syn_backlog for additional tuning for TCP sockets. Default: 128
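The two formulas given for tcp_app_win and tcp_adv_win_scale are easier to follow with numbers. This is a minimal sketch that simply evaluates the expressions as written above using integer shifts; the buffer size, window and MSS values in the usage note are arbitrary examples, and the function names are illustrative only.

```python
def buffering_overhead(buf_bytes: int, adv_win_scale: int) -> int:
    """Bytes counted as overhead, per the tcp_adv_win_scale description."""
    if adv_win_scale > 0:
        return buf_bytes >> adv_win_scale            # bytes / 2^scale
    return buf_bytes - (buf_bytes >> -adv_win_scale)  # bytes - bytes / 2^(-scale)


def app_buffer_reserve(window: int, app_win: int, mss: int) -> int:
    """Part of the window reserved for the application buffer (tcp_app_win)."""
    if app_win == 0:
        return 0                                      # special value: reserve nothing
    return max(window >> app_win, mss)                # max(window / 2^app_win, mss)
```

With a buffer of 87380 bytes and the default scale of 2, a quarter of the buffer (21845 bytes) is counted as overhead. With the default tcp_app_win of 31, window/2^31 is 0 for any realistic window, so the reserve is just one MSS.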

:::: Socket queues

tcp_orphan_retries – INTEGER – How many times to retry before killing a TCP connection that has been closed by our side. The default value of 7 corresponds to ~50sec-16min depending on RTO. If your machine is a heavily loaded web server, consider lowering this value, because such sockets may consume significant resources. Cf. tcp_max_orphans.
tcp_fin_timeout – INTEGER – Time to hold a socket in the FIN-WAIT-2 state if it was closed by our side. The peer can be broken and never close its side, or even die unexpectedly. Default value is 60sec. The usual value in 2.2 kernels was 180 seconds; you may restore it, but remember that even on a lightly loaded web server you risk exhausting memory with a huge number of dead sockets. FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1 sockets because they consume at most 1.5K of memory, but they tend to live longer. Cf. tcp_max_orphans.
tcp_max_tw_buckets – INTEGER – Maximal number of TIME-WAIT sockets held by the system simultaneously. If this number is exceeded, the TIME-WAIT socket is immediately destroyed and a warning is printed. This limit exists only to prevent simple DoS attacks; you must not lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value.
tcp_tw_recycle – BOOLEAN – Enable fast recycling of TIME-WAIT sockets. It should not be changed without the advice or request of technical experts. Default: 0
tcp_max_orphans – INTEGER – Maximal number of TCP sockets not attached to any user file handle that are held by the system. If this number is exceeded, orphaned connections are reset immediately and a warning is printed. This limit exists only to prevent simple DoS attacks; you must not rely on it or lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value, and tune network services to linger and kill such states more aggressively. Remember: each orphan consumes up to ~64K of unswappable memory.
tcp_abort_on_overflow – BOOLEAN – If the listening service is too slow to accept new connections, reset them. Default: FALSE. With the default, a connection that overflowed the queue because of a burst will recover. Enable this option only if you are really sure that the listening daemon cannot be tuned to accept connections faster. Enabling this option can harm clients of your server.
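The per-socket memory figures quoted above (up to ~64K per orphan, at most 1.5K per FIN-WAIT-2 socket) make it easy to estimate a worst case before raising a limit. A rough sketch follows; the socket count of 8192 in the usage note is an arbitrary example and not a documented default.

```python
ORPHAN_BYTES = 64 * 1024   # "up to ~64K" of unswappable memory per orphan
FIN_WAIT2_BYTES = 1536     # "maximum 1.5K" per FIN-WAIT-2 socket


def worst_case_mib(sockets: int, bytes_per_socket: int) -> float:
    """Upper bound on memory, in MiB, held by the given number of sockets."""
    return sockets * bytes_per_socket / (1024 * 1024)
```

Under these figures, 8192 orphans could pin up to 512 MiB of memory, while the same number of FIN-WAIT-2 sockets would hold about 12 MiB.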

:::: TCP parameters

tcp_abc – INTEGER – Controls Appropriate Byte Count (ABC) defined in RFC3465. ABC is a way of increasing congestion window (cwnd) more slowly in response to partial acknowledgments. Default: 0 (off)
Possible values are:
0 : increase cwnd once per acknowledgment (no ABC)
1 : increase cwnd once per acknowledgment of full sized segment
2 : allow increase cwnd by two if acknowledgment is of two segments to compensate for delayed acknowledgments.
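The three tcp_abc modes can be compared with a toy model. The sketch below only mirrors the wording of the list above for a single ACK, counted in segments; it is a simplification for illustration and not the kernel's actual congestion control code. The function name and arguments are hypothetical.

```python
def cwnd_increment(abc_mode: int, acked_bytes: int, mss: int) -> int:
    """Segments added to cwnd for one ACK in this simplified model."""
    if abc_mode == 0:
        return 1                       # every ACK counts, however little it covers
    full_segments = acked_bytes // mss
    if abc_mode == 1:
        return min(full_segments, 1)   # only a full-sized segment earns an increase
    if abc_mode == 2:
        return min(full_segments, 2)   # a delayed ACK covering two segments earns two
    raise ValueError("tcp_abc must be 0, 1 or 2")
```

In this model a partial ACK of 100 bytes (MSS 1460) still grows cwnd with tcp_abc=0, but not with tcp_abc=1 or 2, which is the "more slowly in response to partial acknowledgments" behavior described above.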
tcp_syn_retries – INTEGER – Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value is 5, which corresponds to ~180 seconds.
tcp_synack_retries – INTEGER – Number of times SYNACKs for a passive TCP connection attempt will be retransmitted. Should not be higher than 255. Default value is 5, which corresponds to ~180 seconds.
tcp_keepalive_time – INTEGER – How often TCP sends out keepalive messages when keepalive is enabled. Default: 2 hours.
tcp_keepalive_probes – INTEGER – How many keepalive probes TCP sends out before it decides that the connection is broken. Default: 9
tcp_keepalive_intvl – INTEGER – How frequently the probes are sent out. Multiplied by tcp_keepalive_probes, it gives the time after which a non-responding connection is killed once probing has started. Default value: 75sec, i.e. the connection will be aborted after ~11 minutes of retries.
tcp_retries1 – INTEGER – How many times to retry before deciding that something is wrong and that this suspicion must be reported to the network layer. The minimal RFC value is 3, which is also the default; it corresponds to ~3sec-8min depending on RTO.
tcp_retries2 – INTEGER – How many times to retry before killing a live TCP connection. RFC1122 says that the limit should be longer than 100 sec, which is too small a number. The default value of 15 corresponds to ~13-30min depending on RTO.
tcp_max_syn_backlog – INTEGER – Maximal number of remembered connection requests that have not yet received an acknowledgment from the connecting client. Default value is 1024 for systems with more than 128MB of memory, and 128 for low-memory machines. If the server suffers from overload, try increasing this number.
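The "~11 minutes" figure comes straight from multiplying the two defaults. A small sketch of the arithmetic, using the default values quoted above (the function name is illustrative):

```python
def keepalive_timeline(time_s: int = 7200, probes: int = 9, intvl_s: int = 75):
    """Return (seconds spent probing, seconds from last activity to abort)."""
    probing = probes * intvl_s          # tcp_keepalive_probes * tcp_keepalive_intvl
    return probing, time_s + probing    # idle period first, then the probes
```

With the defaults this gives 675 seconds of probing (11.25 minutes), so a dead peer is detected roughly 2 hours and 11 minutes after the connection went idle.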
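The time ranges quoted for the retry counters follow from exponential backoff: each retransmission waits twice as long as the previous one. The sketch below sums those waits. The concrete numbers are assumptions not stated in this article: an initial RTO of 3 seconds for SYNs (as used by older kernels), and a retransmission timeout bounded between 0.2 and 120 seconds for established connections.

```python
def backoff_total(retries: int, initial_rto: float, max_rto: float = None) -> float:
    """Seconds until giving up: the first wait plus one doubled wait per retry."""
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):
        total += rto
        rto *= 2                        # exponential backoff
        if max_rto is not None:
            rto = min(rto, max_rto)     # the timeout never grows past the cap
    return total
```

Under these assumptions, backoff_total(5, 3.0) gives 189 seconds, in line with the "~180 seconds" quoted for tcp_syn_retries, and backoff_total(15, 0.2, 120.0) gives about 924.6 seconds (roughly 15 minutes), which falls inside the ~13-30min range quoted for tcp_retries2.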
tcp_window_scaling – BOOLEAN –  Enable window scaling as defined in RFC1323.
tcp_timestamps – BOOLEAN –  Enable timestamps as defined in RFC1323.
tcp_sack – BOOLEAN – Enable selective acknowledgments (SACK).
tcp_fack – BOOLEAN – Enable FACK congestion avoidance and fast retransmission. The value is not used, if tcp_sack is not enabled.
tcp_dsack – BOOLEAN – Allows TCP to send “duplicate” SACKs.
tcp_ecn – BOOLEAN – Enable Explicit Congestion Notification in TCP.
tcp_reordering – INTEGER – Maximal reordering of packets in a TCP stream. Default: 3
tcp_low_latency – BOOLEAN – If set, the TCP stack makes decisions that prefer lower latency as opposed to higher throughput. By default, this option is not set, meaning that higher throughput is preferred. An example of an application where this default should be changed would be a Beowulf compute cluster. Default: 0
tcp_tso_win_divisor – INTEGER – This allows control over what percentage of the congestion window can be consumed by a single TSO frame. The setting of this parameter is a choice between burstiness and building larger TSO frames. Default: 3
tcp_frto – BOOLEAN – Enables F-RTO, an enhanced recovery algorithm for TCP retransmission timeouts. It is particularly beneficial in wireless environments where packet loss is typically due to random radio interference rather than intermediate router congestion.
tcp_congestion_control – STRING – Set the congestion control algorithm to be used for new connections. The algorithm “reno” is always available, but additional choices may be available based on kernel configuration.
tcp_workaround_signed_windows – BOOLEAN – If set, assume no receipt of a window scaling option means the remote TCP is broken and treats the window as a signed quantity. If unset, assume the remote TCP is not broken even if we do not receive a window scaling option from them. Default: 0
tcp_slow_start_after_idle – BOOLEAN – If set, provide RFC2861 behavior and time out the congestion window after an idle period. An idle period is defined as the current RTO. If unset, the congestion window will not be timed out after an idle period. Default: 1

:::: UDP parameters

The following parameters affect all UDP traffic on the system:
udp_mem – vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets. Default is calculated at boot time from amount of available memory:
min: Below this number of pages UDP is not bothered about its memory appetite. When amount of memory allocated by UDP exceeds this number, UDP starts to moderate memory usage.
pressure: This value was introduced to follow format of tcp_mem.
max: Number of pages allowed for queueing by all UDP sockets.
udp_rmem_min – INTEGER – Minimal size of the receive buffer used by UDP sockets in moderation. Each UDP socket is able to use this size for receiving data, even if the total pages of UDP sockets exceed udp_mem pressure. The unit is bytes. Default: 4096
udp_wmem_min – INTEGER – Minimal size of the send buffer used by UDP sockets in moderation. Each UDP socket is able to use this size for sending data, even if the total pages of UDP sockets exceed udp_mem pressure. The unit is bytes. Default: 4096
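Note that udp_mem is counted in pages while udp_rmem_min and udp_wmem_min are in bytes, which is easy to mix up. The sketch below converts pages to MiB and classifies a usage level against the min and max thresholds described above. The 4096-byte page size is an assumption (typical for x86), and the sample vector in the usage note is invented for illustration.

```python
def pages_to_mib(pages: int, page_size: int = 4096) -> float:
    """Convert a udp_mem page count to MiB (page_size is an assumption)."""
    return pages * page_size / (1024 * 1024)


def udp_memory_state(allocated_pages: int, udp_mem) -> str:
    """Classify current UDP memory use against a (min, pressure, max) vector."""
    low, _pressure, high = udp_mem
    if allocated_pages <= low:
        return "unrestricted"   # below min: UDP does not moderate its usage
    if allocated_pages < high:
        return "moderating"     # above min: UDP starts to moderate memory usage
    return "at limit"           # max pages allowed for all UDP sockets reached
```

For a hypothetical vector of (65536, 98304, 131072) pages, the thresholds correspond to 256, 384 and 512 MiB with 4096-byte pages.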

:::: Interface Layer

ip_forward – BOOLEAN – Forward packets between interfaces. This variable is special: changing it resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers).
  • 0 : disabled (default)
  • not 0 : enabled
ip_default_ttl – INTEGER – Default value of the TTL (Time To Live) field for outgoing IP packets. Default: 64
ip_no_pmtu_disc – BOOLEAN – Disable Path MTU Discovery. Default: FALSE
min_pmtu – INTEGER – minimum discovered Path MTU. Default: 562
mtu_expires – INTEGER – Time, in seconds, that cached PMTU information is kept.
min_adv_mss – INTEGER – The advertised MSS depends on the first hop route MTU, but will never be lower than this setting.
inet_peer_threshold – INTEGER – The approximate size of the storage. Starting from this threshold entries will be thrown aggressively. This threshold also determines entries’ time-to-live and time intervals between garbage collection passes. More entries, less time-to-live, less GC interval.
inet_peer_minttl – INTEGER – Minimum time-to-live of entries. Should be enough to cover fragment time-to-live on the reassembling side. This minimum time-to-live is guaranteed if the pool size is less than inet_peer_threshold. Measured in jiffies.
inet_peer_maxttl – INTEGER – Maximum time-to-live of entries. Unused entries will expire after this period of time if there is no memory pressure on the pool (i.e. when the number of entries in the pool is very small). Measured in jiffies.
inet_peer_gc_mintime – INTEGER – Minimum interval between garbage collection passes. This interval is in effect under high memory pressure on the pool. Measured in jiffies.
inet_peer_gc_maxtime – INTEGER – Maximum interval between garbage collection passes. This interval is in effect under low (or absent) memory pressure on the pool. Measured in jiffies.

:::: Interface Layer [Hardware interrupts, Rx queues, Rx buffers]

ipfrag_high_thresh – INTEGER – Maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory is allocated for this purpose, the fragment handler will toss packets until ipfrag_low_thresh is reached.
ipfrag_low_thresh – INTEGER – See ipfrag_high_thresh
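The high/low threshold pair works like a hysteresis: nothing is dropped until usage reaches the high mark, and then dropping continues until usage falls to the low mark. The toy model below illustrates that behavior only. Dropping the oldest queue first is an assumption made for the sketch, and the class and method names are hypothetical.

```python
from collections import deque


class FragmentMemory:
    """Toy model of ipfrag_high_thresh / ipfrag_low_thresh accounting."""

    def __init__(self, high_thresh: int, low_thresh: int):
        self.high = high_thresh
        self.low = low_thresh
        self.queues = deque()   # sizes of pending fragment queues, oldest first
        self.used = 0

    def add(self, size: int) -> int:
        """Account for a new fragment queue; return how many queues were dropped."""
        self.queues.append(size)
        self.used += size
        dropped = 0
        if self.used >= self.high:
            # Over the high mark: toss queues until usage is down to the low mark.
            while self.queues and self.used > self.low:
                self.used -= self.queues.popleft()
                dropped += 1
        return dropped
```

With a high mark of 100 and a low mark of 60, adding three 40-byte queues drops nothing for the first two and then drops two queues on the third, leaving 40 bytes in use.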
ipfrag_time – INTEGER – Time in seconds to keep an IP fragment in memory.
ipfrag_secret_interval – INTEGER – Regeneration interval (in seconds) of the hash secret (or lifetime for the hash secret) for IP fragments. Default: 600
ipfrag_max_dist – INTEGER – A non-negative integer value which defines the maximum “disorder” allowed among fragments which share a common IP source address.
Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can result in unnecessarily dropping fragment queues when normal reordering of packets occurs, which could lead to poor application performance. Using a very large value, e.g. 50000, increases the likelihood of incorrectly reassembling IP fragments that originate from different IP datagrams, which could result in data corruption.
Default: 64
medium_id – INTEGER – Integer value used to differentiate the devices by the medium they are attached to. Two devices can have different id values when the broadcast packets are received only on one of them.
proxy_arp – BOOLEAN – Do proxy ARP. proxy_arp for the interface will be enabled if at least one of conf/{all,interface}/proxy_arp is set to TRUE; it will be disabled otherwise.
shared_media – BOOLEAN – Send (router) or accept (host) RFC1620 shared media redirects. Overrides ip_secure_redirects. shared_media for the interface will be enabled if at least one of conf/{all,interface}/shared_media is set to TRUE; it will be disabled otherwise. Default: TRUE
arp_announce – INTEGER – Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on interface.
The max value from conf/{all,interface}/arp_announce is used.
Increasing the restriction level gives a better chance of receiving an answer from the resolved target, while decreasing the level announces more valid sender information.
Possible values are:
  • 0 : (default) Use any local address, configured on any interface
  • 1 : Try to avoid local addresses that are not in the target’s subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When we generate the request we will check all our subnets that include the target IP and will preserve the source address if it is from such subnet. If there is no such subnet we select source address according to the rules for level 2.
  • 2 : Always use the best local address for this target. In this mode we ignore the source address in the IP packet and try to select the local address that we prefer for talking to the target host. Such a local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address. If no suitable local address is found, we select the first local address we have on the outgoing interface or on all other interfaces, in the hope of receiving a reply to our request, which sometimes arrives regardless of the source IP address we announce.
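Several of the per-interface settings above combine the conf/all value with the conf/interface value: proxy_arp and shared_media are enabled if at least one of the two is TRUE, while arp_announce takes the larger of the two. A minimal sketch of those combining rules, with illustrative function names:

```python
def effective_boolean(conf_all: bool, conf_iface: bool) -> bool:
    """proxy_arp / shared_media: enabled if at least one setting is TRUE."""
    return bool(conf_all or conf_iface)


def effective_arp_announce(conf_all: int, conf_iface: int) -> int:
    """arp_announce: the max value from conf/{all,interface} is used."""
    return max(conf_all, conf_iface)
```

One consequence is that setting conf/all/arp_announce to 2 applies level 2 to every interface, and a lower per-interface value cannot relax it.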

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and some time later I realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/
