.. SPDX-License-Identifier: GPL-2.0
===================================
Linux Ethernet Bonding Driver HOWTO
===================================
Latest update: 27 April 2011
Initial release: Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions: 2000/10/03-15:
- Willy Tarreau <willy at meta-x.org>
- Constantine Gavrilov <const-g at xpert.com>
- Chad N. Tindel <ctindel at ieee dot org>
- Janice Girouard <girouard at us dot ibm dot com>
- Jay Vosburgh <fubar at us dot ibm dot com>
Reorganized and updated Feb 2005 by Jay Vosburgh
Added Sysfs information: 2006/04/24
- Mitch Williams <mitch.a.williams at intel.com>
Introduction
============
The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.
Additionally, link integrity monitoring may be performed.
The bonding driver originally came from Donald Becker's
beowulf patches for kernel 2.0. It has changed quite a bit since, and
the original tools from extreme-linux and beowulf sites will not work
with this version of the driver.
For new versions of the driver, updated userspace tools, and
who to ask for help, please follow the links at the end of this file.
.. Table of Contents
1. Bonding Driver Installation
2. Bonding Driver Options
3. Configuring Bonding Devices
3.1 Configuration with Sysconfig Support
3.1.1 Using DHCP with Sysconfig
3.1.2 Configuring Multiple Bonds with Sysconfig
3.2 Configuration with Initscripts Support
3.2.1 Using DHCP with Initscripts
3.2.2 Configuring Multiple Bonds with Initscripts
3.3 Configuring Bonding Manually with Ifenslave
3.3.1 Configuring Multiple Bonds Manually
3.4 Configuring Bonding Manually via Sysfs
3.5 Configuration with Interfaces Support
3.6 Overriding Configuration for Special Cases
3.7 Configuring LACP for 802.3ad mode in a more secure way
4. Querying Bonding Configuration
4.1 Bonding Configuration
4.2 Network Configuration
5. Switch Configuration
6. 802.1q VLAN Support
7. Link Monitoring
7.1 ARP Monitor Operation
7.2 Configuring Multiple ARP Targets
7.3 MII Monitor Operation
8. Potential Trouble Sources
8.1 Adventures in Routing
8.2 Ethernet Device Renaming
8.3 Painfully Slow Or No Failed Link Detection By Miimon
9. SNMP agents
10. Promiscuous mode
11. Configuring Bonding for High Availability
11.1 High Availability in a Single Switch Topology
11.2 High Availability in a Multiple Switch Topology
11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
11.2.2 HA Link Monitoring for Multiple Switch Topology
12. Configuring Bonding for Maximum Throughput
12.1 Maximum Throughput in a Single Switch Topology
12.1.1 MT Bonding Mode Selection for Single Switch Topology
12.1.2 MT Link Monitoring for Single Switch Topology
12.2 Maximum Throughput in a Multiple Switch Topology
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
12.2.2 MT Link Monitoring for Multiple Switch Topology
13. Switch Behavior Issues
13.1 Link Establishment and Failover Delays
13.2 Duplicated Incoming Packets
14. Hardware Specific Considerations
14.1 IBM BladeCenter
15. Frequently Asked Questions
16. Resources and Links
1. Bonding Driver Installation
==============================
Most popular distro kernels ship with the bonding driver
already available as a module. If your distro does not, or you
have need to compile bonding from source (e.g., configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:
1.1 Configure and build the kernel with bonding
-----------------------------------------------
The current version of the bonding driver is available in the
drivers/net/bonding subdirectory of the most recent kernel source
(which is available on http://kernel.org). Most users "rolling their
own" will want to use the most recent kernel from kernel.org.
Configure kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section. It is recommended that you configure the
driver as a module, since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.
Build and install the new kernel and modules.
1.2 Bonding Control Utility
---------------------------
It is recommended to configure bonding via iproute2 (netlink)
or sysfs; the old ifenslave control utility is obsolete.
2. Bonding Driver Options
=========================
Options for the bonding driver are supplied as parameters to the
bonding module at load time, or are specified via sysfs.
Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
``/etc/modprobe.d/*.conf`` configuration files, or in a distro-specific
configuration file (some of which are detailed in the next section).
Details on bonding support for sysfs are provided in the
"Configuring Bonding Manually via Sysfs" section, below.
The available bonding driver parameters are listed below. If a
parameter is not specified, the default value is used. When initially
configuring a bond, it is recommended that "tail -f /var/log/messages" be
run in a separate window to watch for bonding driver error messages.
It is critical that either the miimon or arp_interval and
arp_ip_target parameters be specified, otherwise serious network
degradation will occur during link failures. Very few devices do not
support at least miimon, so there is really no reason not to use it.
Options with textual values will accept either the text name
or, for backwards compatibility, the option value. E.g.,
"mode=802.3ad" and "mode=4" set the same mode.
The parameters are as follows:
active_slave
Specifies the new active slave for modes that support it
(active-backup, balance-alb and balance-tlb). Possible values
are the name of any currently enslaved interface, or an empty
string. If a name is given, the slave and its link must be up in order
to be selected as the new active slave. If an empty string is
specified, the current active slave is cleared, and a new active
slave is selected automatically.
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
The normal value of this option is the name of the currently
active slave, or the empty string if there is no active slave or
the current mode does not use an active slave.
ad_actor_sys_prio
In an AD system, this specifies the system priority. The allowed range
is 1 - 65535. If the value is not specified, it takes 65535 as the
default value.
This parameter has effect only in 802.3ad mode and is available through
the sysfs interface.
ad_actor_system
In an AD system, this specifies the MAC address for the actor in
protocol packet exchanges (LACPDUs). The value cannot be a multicast
address. If the all-zeroes MAC is specified, bonding will internally
use the MAC of the bond itself. It is preferred to have the
local-admin bit set for this MAC, but the driver does not enforce it. If
the value is not given, then the system defaults to using the master's
MAC address as the actor's system address.
This parameter has effect only in 802.3ad mode and is available through
the sysfs interface.
ad_select
Specifies the 802.3ad aggregation selection logic to use. The
possible values and their effects are:
stable or 0
The active aggregator is chosen by largest aggregate
bandwidth.
Reselection of the active aggregator occurs only when all
slaves of the active aggregator are down or the active
aggregator has no slaves.
This is the default value.
bandwidth or 1
The active aggregator is chosen by largest aggregate
bandwidth. Reselection occurs if:
- A slave is added to or removed from the bond
- Any slave's link state changes
- Any slave's 802.3ad association state changes
- The bond's administrative state changes to up
count or 2
The active aggregator is chosen by the largest number of
ports (slaves). Reselection occurs as described under the
"bandwidth" setting, above.
The bandwidth and count selection policies permit failover of
802.3ad aggregations when partial failure of the active aggregator
occurs. This keeps the aggregator with the highest availability
(either in bandwidth or in number of ports) active at all times.
This option was added in bonding version 3.4.0.
ad_user_port_key
In an AD system, the port-key has three parts as shown below:

===== ============
Bits  Use
===== ============
00    Duplex
01-05 Speed
06-15 User-defined
===== ============

This defines the upper 10 bits of the port key. The values can be
from 0 - 1023. If not given, the system defaults to 0.
This parameter has effect only in 802.3ad mode and is available through
the sysfs interface.
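The bit layout in the table above can be sketched in a few lines of Python. This is only an illustration of the packing arithmetic; the helper names are hypothetical, and the speed and duplex fields are treated as opaque bit patterns because their encoding is not described here.

```python
def make_port_key(user_key: int, speed_bits: int = 0, duplex_bit: int = 0) -> int:
    """Pack the three port-key fields (hypothetical helper, not a kernel API)."""
    if not 0 <= user_key <= 1023:
        raise ValueError("ad_user_port_key must be in the range 0 - 1023")
    if not 0 <= speed_bits <= 31:
        raise ValueError("the speed field is 5 bits wide (bits 01-05)")
    if duplex_bit not in (0, 1):
        raise ValueError("the duplex field is a single bit (bit 00)")
    # Bits 06-15: user-defined, bits 01-05: speed, bit 00: duplex.
    return (user_key << 6) | (speed_bits << 1) | duplex_bit


def user_part(port_key: int) -> int:
    """Extract the upper 10 bits (the user-defined part) of a port key."""
    return (port_key >> 6) & 0x3FF
```

For example, a user key of 1023 fills all ten upper bits, so the resulting 16-bit key has only bits 06-15 set when the speed and duplex fields are zero.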
all_slaves_active
Specifies that duplicate frames (received on inactive ports) should be
dropped (0) or delivered (1).
Normally, bonding will drop duplicate frames (received on inactive
ports), which is desirable for most users. However, it is sometimes
useful to allow duplicate frames to be delivered.
The default value is 0 (drop duplicate frames received on inactive
ports).
arp_interval
Specifies the ARP link monitoring frequency in milliseconds.
The ARP monitor works by periodically checking the slave
devices to determine whether they have sent or received
traffic recently (the precise criteria depend upon the
bonding mode, and the state of the slave). Regular traffic is
generated via ARP probes issued for the addresses specified by
the arp_ip_target option.
This behavior can be modified by the arp_validate option,
below.
If ARP monitoring is used in an etherchannel compatible mode
(modes 0 and 2), the switch should be configured in a mode
that evenly distributes packets across all links. If the
switch is configured to distribute the packets in an XOR
fashion, all replies from the ARP targets will be received on
the same link which could cause the other team members to
fail. ARP monitoring should not be used in conjunction with
miimon. A value of 0 disables ARP monitoring. The default
value is 0.
arp_ip_target
Specifies the IP addresses to use as ARP monitoring peers when
arp_interval is > 0. These are the targets of the ARP request
sent to determine the health of the link to the targets.
Specify these values in ddd.ddd.ddd.ddd format. Multiple IP
addresses must be separated by a comma. At least one IP
address must be given for ARP monitoring to function. The
maximum number of targets that can be specified is 16. The
default value is no IP addresses.
ns_ip6_target
Specifies the IPv6 addresses to use as IPv6 monitoring peers when
arp_interval is > 0. These are the targets of the NS request
sent to determine the health of the link to the targets.
Specify these values in ffff:ffff::ffff:ffff format. Multiple IPv6
addresses must be separated by a comma. At least one IPv6
address must be given for NS/NA monitoring to function. The
maximum number of targets that can be specified is 16. The
default value is no IPv6 addresses.
arp_validate
Specifies whether or not ARP probes and replies should be
validated in any mode that supports arp monitoring, or whether
non-ARP traffic should be filtered (disregarded) for link
monitoring purposes.
Possible values are:
none or 0
No validation or filtering is performed.
active or 1
Validation is performed only for the active slave.
backup or 2
Validation is performed only for backup slaves.
all or 3
Validation is performed for all slaves.
filter or 4
Filtering is applied to all slaves. No validation is
performed.
filter_active or 5
Filtering is applied to all slaves, validation is performed
only for the active slave.
filter_backup or 6
Filtering is applied to all slaves, validation is performed
only for backup slaves.
Validation:
Enabling validation causes the ARP monitor to examine the incoming
ARP requests and replies, and only consider a slave to be up if it
is receiving the appropriate ARP traffic.
For an active slave, the validation checks ARP replies to confirm
that they were generated by an arp_ip_target. Since backup slaves
do not typically receive these replies, the validation performed
for backup slaves is on the broadcast ARP request sent out via the
active slave. It is possible that some switch or network
configurations may result in situations wherein the backup slaves
do not receive the ARP requests; in such a situation, validation
of backup slaves must be disabled.
The validation of ARP requests on backup slaves mainly helps
bonding decide which slaves are more likely to work if the
active slave fails; it does not guarantee that the
backup slave will work if it is selected as the next active slave.
Validation is useful in network configurations in which multiple
bonding hosts are concurrently issuing ARPs to one or more targets
beyond a common switch. Should the link between the switch and
target fail (but not the switch itself), the probe traffic
generated by the multiple bonding instances will fool the standard
ARP monitor into considering the links as still up. Use of
validation can resolve this, as the ARP monitor will only consider
ARP requests and replies associated with its own instance of
bonding.
Filtering:
Enabling filtering causes the ARP monitor to only use incoming ARP
packets for link availability purposes. Arriving packets that are
not ARPs are delivered normally, but do not count when determining
if a slave is available.
Filtering operates by only considering the reception of ARP
packets (any ARP packet, regardless of source or destination) when
determining if a slave has received traffic for link availability
purposes.
Filtering is useful in network configurations in which significant
levels of third party broadcast traffic would fool the standard
ARP monitor into considering the links as still up. Use of
filtering can resolve this, as only ARP traffic is considered for
link availability purposes.
This option was added in bonding version 3.1.0.
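The numeric arp_validate values listed above happen to line up with three independent flags (validate active, validate backup, filter). The following Python sketch decodes a value on that reading. Treating the numbers as a bitmask is an observation about the listed values, not something this document states, and the names used are hypothetical.

```python
# Names and numbers exactly as listed under arp_validate.
ARP_VALIDATE_VALUES = {
    "none": 0,
    "active": 1,
    "backup": 2,
    "all": 3,
    "filter": 4,
    "filter_active": 5,
    "filter_backup": 6,
}


def describe_arp_validate(value) -> dict:
    """Decode a name or number into its three behaviours (illustrative only)."""
    number = ARP_VALIDATE_VALUES[value] if isinstance(value, str) else int(value)
    if number not in ARP_VALIDATE_VALUES.values():
        raise ValueError(f"unknown arp_validate value: {value!r}")
    return {
        "validate_active": bool(number & 1),  # assumption: bit 0
        "validate_backup": bool(number & 2),  # assumption: bit 1
        "filter": bool(number & 4),           # assumption: bit 2
    }
```

For instance, "filter_active" decodes to filtering on all slaves with validation only on the active slave, which matches its description above.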
arp_all_targets
Specifies the quantity of arp_ip_targets that must be reachable
in order for the ARP monitor to consider a slave as being up.
This option affects only active-backup mode for slaves with
arp_validate enabled.
Possible values are:
any or 0
consider the slave up only when any of the arp_ip_targets
is reachable
all or 1
consider the slave up only when all of the arp_ip_targets
are reachable
arp_missed_max
Specifies the number of arp_interval monitor checks that must
fail in order for an interface to be marked down by the ARP monitor.
In order to provide orderly failover semantics, backup interfaces
are permitted an extra monitor check (i.e., they must fail
arp_missed_max + 1 times before being marked down).
The default value is 2, and the allowable range is 1 - 255.
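As a rough illustration of the rule above, the Python sketch below counts how many failed monitor checks are needed before an interface is marked down, and estimates the corresponding time. The function names are hypothetical, and the time estimate assumes exactly one monitor check per arp_interval; real detection time may differ.

```python
def checks_before_down(arp_missed_max: int = 2, is_backup: bool = False) -> int:
    """Number of failed ARP monitor checks before a slave is marked down."""
    if not 1 <= arp_missed_max <= 255:
        raise ValueError("arp_missed_max must be in the range 1 - 255")
    # Backup interfaces are permitted one extra monitor check.
    return arp_missed_max + 1 if is_backup else arp_missed_max


def approx_ms_before_down(arp_interval_ms: int, arp_missed_max: int = 2,
                          is_backup: bool = False) -> int:
    """Rough estimate only: assumes one monitor check per arp_interval."""
    return arp_interval_ms * checks_before_down(arp_missed_max, is_backup)
```

With the default arp_missed_max and an arp_interval of 1000, this estimate gives about two intervals for the active slave and three for a backup slave.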
coupled_control
Specifies whether the LACP state machine's MUX in the 802.3ad mode
should have separate Collecting and Distributing states.
This is done by implementing the independent control state machine per
IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control
state machine.
The default value is 1. This setting does not separate the Collecting
and Distributing states, maintaining the bond in coupled control.
downdelay
Specifies the time, in milliseconds, to wait before disabling
a slave after a link failure has been detected. This option
is only valid for the miimon link monitor. The downdelay
value should be a multiple of the miimon value; if not, it
will be rounded down to the nearest multiple. The default
value is 0.
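The rounding rule can be written out as a one-line calculation. The Python sketch below is only an illustration with a hypothetical function name; the same rule is stated for updelay.

```python
def effective_delay(delay_ms: int, miimon_ms: int) -> int:
    """Round a downdelay/updelay value down to a multiple of miimon."""
    if miimon_ms <= 0:
        raise ValueError("downdelay/updelay only apply when miimon is enabled (> 0)")
    if delay_ms < 0:
        raise ValueError("delay must not be negative")
    # Integer division discards the remainder, i.e. rounds down.
    return (delay_ms // miimon_ms) * miimon_ms
```

One consequence worth noting: a delay smaller than miimon rounds down to 0, so a value such as downdelay=50 with miimon=100 behaves like no delay at all under this rule.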
fail_over_mac
Specifies whether active-backup mode should set all slaves to
the same MAC address at enslavement (the traditional
behavior), or, when enabled, perform special handling of the
bond's MAC address in accordance with the selected policy.
Possible values are:
none or 0
This setting disables fail_over_mac, and causes
bonding to set all slaves of an active-backup bond to
the same MAC address at enslavement time. This is the
default.
active or 1
The "active" fail_over_mac policy indicates that the
MAC address of the bond should always be the MAC
address of the currently active slave. The MAC
address of the slaves is not changed; instead, the MAC
address of the bond changes during a failover.
This policy is useful for devices that cannot ever
alter their MAC address, or for devices that refuse
incoming broadcasts with their own source MAC (which
interferes with the ARP monitor).
The down side of this policy is that every device on
the network must be updated via gratuitous ARP,
vs. just updating a switch or set of switches (which
often takes place for any traffic, not just ARP
traffic, if the switch snoops incoming traffic to
update its tables) for the traditional method. If the
gratuitous ARP is lost, communication may be
disrupted.
When this policy is used in conjunction with the mii
monitor, devices which assert link up prior to being
able to actually transmit and receive are particularly
susceptible to loss of the gratuitous ARP, and an
appropriate updelay setting may be required.
follow or 2
The "follow" fail_over_mac policy causes the MAC
address of the bond to be selected normally (normally
the MAC address of the first slave added to the bond).
However, the second and subsequent slaves are not set
to this MAC address while they are in a backup role; a
slave is programmed with the bond's MAC address at
failover time (and the formerly active slave receives
the newly active slave's MAC address).
This policy is useful for multiport devices that
either become confused or incur a performance penalty
when multiple ports are programmed with the same MAC
address.
The default policy is none, unless the first slave cannot
change its MAC address, in which case the active policy is
selected by default.
This option may be modified via sysfs only when no slaves are
present in the bond.
This option was added in bonding version 3.2.0. The "follow"
policy was added in bonding version 3.3.0.
lacp_active
Option specifying whether to send LACPDU frames periodically.
off or 0
LACPDU frames act as "speak when spoken to".
on or 1
LACPDU frames are sent along the configured links
periodically. See lacp_rate for more details.
The default is on.
lacp_rate
Option specifying the rate at which we'll ask our link partner
to transmit LACPDU packets in 802.3ad mode. Possible values
are:
slow or 0
Request partner to transmit LACPDUs every 30 seconds
fast or 1
Request partner to transmit LACPDUs every 1 second
The default is slow.
broadcast_neighbor
Option specifying whether to broadcast ARP/ND packets to all
active slaves. This option has no effect in modes other than
802.3ad mode. The default is off (0).
max_bonds
Specifies the number of bonding devices to create for this
instance of the bonding driver. E.g., if max_bonds is 3, and
the bonding driver is not already loaded, then bond0, bond1
and bond2 will be created. The default value is 1. Specifying
a value of 0 will load bonding, but will not create any devices.
miimon
Specifies the MII link monitoring frequency in milliseconds.
This determines how often the link state of each slave is
inspected for link failures. A value of zero disables MII
link monitoring. A value of 100 is a good starting point.
The use_carrier option, below, affects how the link state is
determined. See the High Availability section for additional
information. The default value is 100 if arp_interval is not
set.
min_links
Specifies the minimum number of links that must be active before
asserting carrier. It is similar to the Cisco EtherChannel min-links
feature. This allows setting the minimum number of member ports that
must be up (link-up state) before marking the bond device as up
(carrier on). This is useful for situations where higher level services
such as clustering want to ensure a minimum number of low bandwidth
links are active before switchover. This option only affects 802.3ad
mode.
The default value is 0. This will cause carrier to be asserted (for
802.3ad mode) whenever there is an active aggregator, regardless of the
number of available links in that aggregator. Note that, because an
aggregator cannot be active without at least one available link,
setting this option to 0 or to 1 has the exact same effect.
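The relationship between min_links and carrier can be summarised in a small Python sketch. It models only the link count of the active aggregator in 802.3ad mode; the function name is hypothetical and nothing else about carrier handling is represented.

```python
def carrier_asserted(available_links: int, min_links: int = 0) -> bool:
    """Whether carrier would be asserted for the given link count (sketch)."""
    if available_links < 0 or min_links < 0:
        raise ValueError("link counts must not be negative")
    # An aggregator cannot be active without at least one available link,
    # so min_links=0 and min_links=1 end up with the same threshold.
    return available_links >= max(min_links, 1)
```

For example, with min_links=2 a bond that has a single available link would not assert carrier, while with min_links left at 0 it would.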
mode
Specifies one of the bonding policies. The default is
balance-rr (round robin). Possible values are:
balance-rr or 0
Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.
active-backup or 1
Active-backup policy: Only one slave in the bond is
active. A different slave becomes active if, and only
if, the active slave fails. The bond's MAC address is
externally visible on only one port (network adapter)
to avoid confusing the switch.
In bonding version 2.6.2 or later, when a failover
occurs in active-backup mode, bonding will issue one
or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master
interface and each VLAN interface configured above
it, provided that the interface has at least one IP
address configured. Gratuitous ARPs issued for VLAN
interfaces are tagged with the appropriate VLAN id.
This mode provides fault tolerance. The primary
option, documented below, affects the behavior of this
mode.
balance-xor or 2
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
MAC address XOR'd with destination MAC address XOR
packet type ID) modulo slave count]. Alternate transmit
policies may be selected via the xmit_hash_policy option,
described below.
This mode provides load balancing and fault tolerance.
broadcast or 3
Broadcast policy: transmits everything on all slave
interfaces. This mode provides fault tolerance.
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates
aggregation groups that share the same speed and
duplex settings. Utilizes all slaves in the active
aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according
to the transmit hash policy, which may be changed from
the default simple XOR policy via the xmit_hash_policy
option, documented below. Note that not all transmit
policies may be 802.3ad compliant, particularly in
regards to the packet mis-ordering requirements of
section 43.2.4 of the 802.3ad standard. Differing
peer implementations will have varying tolerances for
noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link
aggregation.
Most switches will require some type of configuration
to enable 802.3ad mode.
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that
does not require any special switch support.
In tlb_dynamic_lb=1 mode, the outgoing traffic is
distributed according to the current load (computed
relative to the speed) on each slave.
In tlb_dynamic_lb=0 mode, load balancing based on
current load is disabled and the load is distributed
only using the hash distribution.
Incoming traffic is received by the current slave.
If the receiving slave fails, another slave takes over
the MAC address of the failed receiving slave.
Prerequisite:
Ethtool support in the base drivers for retrieving the
speed of each slave.
balance-alb or 6
Adaptive load balancing: includes balance-tlb plus
receive load balancing (rlb) for IPv4 traffic, and
does not require any special switch support. The
receive load balancing is achieved by ARP negotiation.
The bonding driver intercepts the ARP Replies sent by
the local system on their way out and overwrites the
source hardware address with the unique hardware
address of one of the slaves in the bond such that
different peers use different hardware addresses for
the server.
Receive traffic from connections created by the server
is also balanced. When the local system sends an ARP
Request the bonding driver copies and saves the peer's
IP information from the ARP packet. When the ARP
Reply arrives from the peer, its hardware address is
retrieved and the bonding driver initiates an ARP
reply to this peer assigning it to one of the slaves
in the bond. A problematic outcome of using ARP
negotiation for balancing is that each time that an
ARP request is broadcast it uses the hardware address
of the bond. Hence, peers learn the hardware address
of the bond and the balancing of receive traffic
collapses to the current slave. This is handled by
sending updates (ARP Replies) to all the peers with
their individually assigned hardware address such that
the traffic is redistributed. Receive traffic is also
redistributed when a new slave is added to the bond
and when an inactive slave is re-activated. The
receive load is distributed sequentially (round robin)
among the group of highest speed slaves in the bond.
When a link is reconnected or a new slave joins the
bond the receive traffic is redistributed among all
active slaves in the bond by initiating ARP Replies
with the selected MAC address to each of the
clients. The updelay parameter (detailed below) must
be set to a value equal to or greater than the switch's
forwarding delay so that the ARP Replies sent to the
peers will not be blocked by the switch.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed of each slave.
2. Base driver support for setting the hardware
address of a device while it is open. This is
required so that there will always be one slave in the
team using the bond hardware address (the
curr_active_slave) while having a unique hardware
address for each slave in the bond. If the
curr_active_slave fails its hardware address is
swapped with the new curr_active_slave that was
chosen.
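Since options with textual values accept either the name or the number, the mode list above can be captured as a simple lookup table. The Python sketch below normalizes either spelling to the numeric value; the helper name is hypothetical and is not part of any bonding tool.

```python
# Mode names and numbers exactly as listed under the "mode" option.
BONDING_MODES = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}


def normalize_mode(value: str) -> int:
    """Return the numeric mode for a name such as "802.3ad" or a number such as "4"."""
    value = value.strip()
    if value in BONDING_MODES:
        return BONDING_MODES[value]
    number = int(value)  # raises ValueError for unknown text
    if number not in BONDING_MODES.values():
        raise ValueError(f"unknown bonding mode: {value!r}")
    return number
```

With this table, "mode=802.3ad" and "mode=4" resolve to the same value, as described earlier in this section.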
num_grat_arp,
num_unsol_na
Specify the number of peer notifications (gratuitous ARPs and
unsolicited IPv6 Neighbor Advertisements) to be issued after a
failover event. As soon as the link is up on the new slave
(possibly immediately) a peer notification is sent on the
bonding device and each VLAN sub-device. This is repeated at
the rate specified by peer_notif_delay if the number is
greater than 1.
The valid range is 0 - 255; the default value is 1. These options
affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
These options were added for bonding versions 3.3.0 and 3.4.0
respectively.
From Linux 3.0 and bonding version 3.7.1, these notifications
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
packets_per_slave
Specify the number of packets to transmit through a slave before
moving to the next one. When set to 0, a slave is chosen at
random.
The valid range is 0 - 65535; the default value is 1. This option
has effect only in balance-rr mode.
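The effect of packets_per_slave on the transmit order can be illustrated with a small Python generator. This is a simplified model with hypothetical names; it ignores link state changes and does not reproduce how the driver picks a random slave.

```python
import itertools
import random


def rr_schedule(slaves, packets_per_slave=1, rng=None):
    """Yield the slave used for each successive packet (simplified model)."""
    if not 0 <= packets_per_slave <= 65535:
        raise ValueError("packets_per_slave must be in the range 0 - 65535")
    if packets_per_slave == 0:
        # A value of 0 means a slave is chosen at random for every packet.
        rng = rng or random.Random()
        while True:
            yield rng.choice(slaves)
    # Otherwise send packets_per_slave packets on a slave, then move on.
    for slave in itertools.cycle(slaves):
        for _ in range(packets_per_slave):
            yield slave
```

Taking the first six entries of rr_schedule(["eth0", "eth1"], 2) gives two packets on eth0, two on eth1, then two on eth0 again.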
peer_notif_delay
Specify the delay, in milliseconds, between each peer
notification (gratuitous ARP and unsolicited IPv6 Neighbor
Advertisement) when they are issued after a failover event.
This delay should be a multiple of the MII link monitor interval
(miimon).
The valid range is 0 - 300000. The default value is 0, which means
to match the value of the MII link monitor interval.
prio
Slave priority. A higher number means higher priority.
The primary slave has the highest priority. This option also
follows the primary_reselect rules.
This option can only be configured via netlink, and is only valid
for active-backup (1), balance-tlb (5) and balance-alb (6) modes.
The valid value range is that of a signed 32-bit integer.
The default value is 0.
primary
A string (eth0, eth2, etc) specifying which slave is the
primary device. The specified device will always be the
active slave while it is available. Only when the primary is
off-line will alternate devices be used. This is useful when
one slave is preferred over another, e.g., when one slave has
higher throughput than another.
The primary option is only valid for active-backup (1),
balance-tlb (5) and balance-alb (6) modes.
primary_reselect
Specifies the reselection policy for the primary slave. This
affects how the primary slave is chosen to become the active slave
when failure of the active slave or recovery of the primary slave
occurs. This option is designed to prevent flip-flopping between
the primary slave and other slaves. Possible values are:
always or 0 (default)
The primary slave becomes the active slave whenever it
comes back up.
better or 1
The primary slave becomes the active slave when it comes
back up, if the speed and duplex of the primary slave are
better than the speed and duplex of the current active
slave.
failure or 2
The primary slave becomes the active slave only if the
current active slave fails and the primary slave is up.
The primary_reselect setting is ignored in two cases:
If no slaves are active, the first slave to recover is
made the active slave.
When initially enslaved, the primary slave is always made
the active slave.
Changing the primary_reselect policy via sysfs will cause an
immediate selection of the best active slave according to the new
policy. This may or may not result in a change of the active
slave, depending upon the circumstances.
This option was added for bonding version 3.6.0.
tlb_dynamic_lb
Specifies if dynamic shuffling of flows is enabled in tlb
or alb mode. The value has no effect on any other modes.
The default behavior of tlb mode is to shuffle active flows across
slaves based on the load in that interval. This gives nice lb
characteristics but can cause packet reordering. If re-ordering is
a concern use this variable to disable flow shuffling and rely on
load balancing provided solely by the hash distribution.
xmit_hash_policy can be used to select the appropriate hashing for
the setup.
The sysfs entry can be used to change the setting per bond device
and the initial value is derived from the module parameter. The
sysfs entry is allowed to be changed only if the bond device is
down.
The default value is "1", which enables flow shuffling, while the value "0"
disables it. This option was added in bonding driver 3.7.1.
updelay
Specifies the time, in milliseconds, to wait before enabling a
slave after a link recovery has been detected. This option is
only valid for the miimon link monitor. The updelay value
should be a multiple of the miimon value; if not, it will be
rounded down to the nearest multiple. The default value is 0.
use_carrier
Specifies whether or not miimon should use MII or ETHTOOL
ioctls vs. netif_carrier_ok() to determine the link
status. The MII or ETHTOOL ioctls are less efficient and
utilize a deprecated calling sequence within the kernel. The
netif_carrier_ok() relies on the device driver to maintain its
state with netif_carrier_on/off; at this writing, most, but
not all, device drivers support this facility.
If bonding insists that the link is up when it should not be,
it may be that your network device driver does not support
netif_carrier_on/off. The default state for netif_carrier is
"carrier on," so if a driver does not support netif_carrier,
it will appear as if the link is always up. In this case,
setting use_carrier to 0 will cause bonding to revert to the
MII / ETHTOOL ioctl method to determine the link state.
A value of 1 enables the use of netif_carrier_ok(), a value of
0 will use the deprecated MII / ETHTOOL ioctls. The default
value is 1.
xmit_hash_policy
Selects the transmit hash policy to use for slave selection in
balance-xor, 802.3ad, and tlb modes. Possible values are:
layer2
Uses XOR of hardware MAC addresses and packet type ID
field to generate the hash. The formula is
hash = source MAC[5] XOR destination MAC[5] XOR packet type ID
slave number = hash modulo slave count
This algorithm will place all traffic to a particular
network peer on the same slave.
This algorithm is 802.3ad compliant.
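The layer2 formula can be tried out with a short Python sketch. It follows the formula as written above; the function name is hypothetical, MAC[5] is taken to be the last octet of the address, and the packet type ID is assumed to be the EtherType as a plain integer, so results may not match a running kernel bit for bit.

```python
def layer2_slave(src_mac: str, dst_mac: str, packet_type: int, slave_count: int) -> int:
    """Pick a slave index using the layer2 formula (illustrative sketch)."""
    if slave_count <= 0:
        raise ValueError("slave_count must be positive")
    # MAC[5] is the sixth (last) octet of a colon-separated MAC address.
    src_last = int(src_mac.split(":")[5], 16)
    dst_last = int(dst_mac.split(":")[5], 16)
    hash_value = src_last ^ dst_last ^ packet_type
    return hash_value % slave_count
```

Because only the MAC addresses and packet type enter the hash, every packet of the same type exchanged with a given peer maps to the same slave, regardless of IP addresses or ports.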
layer2+3
This policy uses a combination of layer2 and layer3
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
generate the hash. The formula is
hash = source MAC[5] XOR destination MAC[5] XOR packet type ID
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
And then hash is reduced modulo slave count.
If the protocol is IPv6 then the source and destination
addresses are first hashed using ipv6_addr_hash.
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
the formula is the same as for the layer2 transmit
hash policy.
This policy is intended to provide a more balanced
distribution of traffic than layer2 alone, especially
in environments where a layer3 gateway device is
required to reach most destinations.
This algorithm is 802.3ad compliant.
layer3+4
This policy uses upper layer protocol information,
when available, to generate the hash. This allows for
traffic to a particular network peer to span multiple
slaves, although a single connection will not span
multiple slaves.
The formula for unfragmented TCP and UDP packets is
hash = source port, destination port (as in the header)
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
hash = hash RSHIFT 1
And then hash is reduced modulo slave count.
If the protocol is IPv6 then the source and destination
addresses are first hashed using ipv6_addr_hash.
For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
information is omitted. For non-IP traffic, the
formula is the same as for the layer2 transmit hash
policy.
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
striped across two interfaces. This may result in out
of order delivery. Most traffic types will not meet
these criteria, as TCP rarely fragments traffic, and
most UDP traffic is not involved in extended
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
encap2+3
This policy uses the same formula as layer2+3 but it
relies on skb_flow_dissect to obtain the header fields
which might result in the use of inner headers if an
encapsulation protocol is used. For example this will
improve the performance for tunnel users because the
packets will be distributed according to the encapsulated
flows.
encap3+4
This policy uses the same formula as layer3+4 but it
relies on skb_flow_dissect to obtain the header fields
which might result in the use of inner headers if an
encapsulation protocol is used. For example this will
improve the performance for tunnel users because the
packets will be distributed according to the encapsulated
flows.
vlan+srcmac
This policy uses a very rudimentary vlan ID and source mac
hash to load-balance traffic per-vlan, with failover
should one leg fail. The intended use case is for a bond
shared by multiple virtual machines, all configured to
use their own vlan, to give lacp-like functionality
without requiring lacp-capable switching hardware.
The formula for the hash is simply
hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
layer2+3 value was added for bonding version 3.2.2.
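The layer2, layer2+3 and layer3+4 formulas above can be tried out with shell
arithmetic. The following is a sketch that follows the formulas as written in
this document, not the driver source. All function names are hypothetical.
It handles IPv4 only, treats each address as a big-endian 32-bit integer, and
packs the source port into the upper 16 bits; these are assumptions of the
sketch, so the slave it selects may differ from what a running bond selects.

```shell
# Convert a dotted-quad IPv4 address to an integer (big-endian view).
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# layer2: last octet of each MAC address XOR the packet type ID.
# usage: layer2_hash SRC_MAC DST_MAC TYPE_ID
layer2_hash() {
    echo $(( 0x${1##*:} ^ 0x${2##*:} ^ $3 ))
}

# Folding steps shared by layer2+3 and layer3+4.
fold_hash() {
    h=$1
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo "$h"
}

# layer2+3 for IPv4 traffic.
# usage: layer23_hash SRC_MAC DST_MAC TYPE_ID SRC_IP DST_IP
layer23_hash() {
    h=$(layer2_hash "$1" "$2" "$3")
    h=$(( h ^ $(ip_to_int "$4") ^ $(ip_to_int "$5") ))
    fold_hash "$h"
}

# layer3+4 for unfragmented TCP or UDP over IPv4.
# usage: layer34_hash SRC_IP DST_IP SRC_PORT DST_PORT
layer34_hash() {
    # Assumption: source port in the upper 16 bits, destination in the lower.
    h=$(( ($3 << 16) | $4 ))
    h=$(( h ^ $(ip_to_int "$1") ^ $(ip_to_int "$2") ))
    h=$(fold_hash "$h")
    echo $(( h >> 1 ))
}

# Reduce a hash modulo the slave count.
# usage: slave_index HASH SLAVE_COUNT
slave_index() {
    echo $(( $1 % $2 ))
}
```

For example, ``slave_index "$(layer2_hash 00:1a:a0:12:8f:cb 00:1a:a0:12:8f:cc 0x0800)" 2``
prints 1: the last MAC octets XOR to 0x07, the type ID raises that to 0x807,
and 0x807 modulo 2 is 1. Changing only the ports passed to layer34_hash can
change the selected slave, which is how layer3+4 spreads traffic to a single
peer across several slaves.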
resend_igmp
Specifies the number of IGMP membership reports to be issued after
a failover event. One membership report is issued immediately after
the failover; subsequent reports are sent at 200ms intervals.
The valid range is 0 - 255; the default value is 1. A value of 0
prevents the IGMP membership report from being issued in response
to the failover event.
This option is useful for bonding modes balance-rr (0), active-backup
(1), balance-tlb (5) and balance-alb (6), in which a failover can
switch the IGMP traffic from one slave to another. Therefore a fresh
IGMP report must be issued to cause the switch to forward the incoming
IGMP traffic over the newly selected slave.
This option was added for bonding version 3.7.0.
lp_interval
Specifies the number of seconds between instances where the bonding
driver sends learning packets to each slave's peer switch.
The valid range is 1 - 0x7fffffff; the default value is 1. This option
has effect only in balance-tlb and balance-alb modes.
3. Configuring Bonding Devices
==============================
You can configure bonding using either your distro's network
initialization scripts, or manually using either iproute2 or the
sysfs interface. Distros generally use one of three packages for the
network initialization scripts: initscripts, sysconfig or interfaces.
Recent versions of these packages have support for bonding, while older
versions do not.
We will first describe the options for configuring bonding for
distros using versions of initscripts, sysconfig and interfaces with full
or partial support for bonding, then provide information on enabling
bonding without support from the network initialization scripts (i.e.,
older versions of initscripts or sysconfig).
If you're unsure whether your distro uses sysconfig,
initscripts or interfaces, or don't know if it's new enough, have no fear.
Determining this is fairly straightforward.
First, look for a file called interfaces in the /etc/network directory.
If this file is present on your system, then your system uses interfaces. See
Configuration with Interfaces Support.
Otherwise, issue the command::
$ rpm -qf /sbin/ifup
It will respond with a line of text starting with either
"initscripts" or "sysconfig," followed by some numbers. This is the
package that provides your network initialization scripts.
Next, to determine if your installation supports bonding,
issue the command::
$ grep ifenslave /sbin/ifup
If this returns any matches, then your initscripts or
sysconfig has support for bonding.
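The checks above can be combined into a small script. This is a sketch only:
the function names and the optional root prefix argument (useful for
inspecting an unpacked image instead of the running system) are hypothetical
and do not come from any distro tooling.

```shell
# Print "interfaces", the package owning /sbin/ifup, or "unknown".
# usage: detect_netconfig [ROOT_PREFIX]
detect_netconfig() {
    root=${1:-}
    if [ -f "$root/etc/network/interfaces" ]; then
        echo interfaces
    elif command -v rpm >/dev/null 2>&1; then
        # Expected to print a package name starting with "initscripts"
        # or "sysconfig"; this always queries the running system.
        rpm -qf /sbin/ifup
    else
        echo unknown
    fi
}

# Succeed if the ifup script mentions ifenslave.
# usage: ifup_supports_bonding [ROOT_PREFIX]
ifup_supports_bonding() {
    grep -q ifenslave "${1:-}/sbin/ifup"
}
```

Run ``detect_netconfig`` first; if it reports initscripts or sysconfig, run
``ifup_supports_bonding`` to see whether that version knows about bonding.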
3.1 Configuration with Sysconfig Support
----------------------------------------
This section applies to distros using a version of sysconfig
with bonding support, for example, SuSE Linux Enterprise Server 9.
SuSE SLES 9's networking configuration system does support
bonding; however, at this writing, the YaST system configuration
front end does not provide any means to work with bonding devices.
Bonding devices can nevertheless be managed by hand, as follows.
First, if they have not already been configured, configure the
slave devices. On SLES 9, this is most easily done by running the
yast2 sysconfig configuration utility. The goal is to create an
ifcfg-id file for each slave device. The simplest way to accomplish
this is to configure the devices for DHCP (this is only to get the
ifcfg-id file created; see below for some issues with DHCP). The
name of the configuration file for each device will be of the form::
ifcfg-id-xx:xx:xx:xx:xx:xx
Where the "xx" portion will be replaced with the digits from
the device's permanent MAC address.
Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
created, it is necessary to edit the configuration files for the slave
devices (the MAC addresses correspond to those of the slave devices).
Before editing, the file will contain multiple lines, and will look
something like this::
BOOTPROTO='dhcp'
STARTMODE='on'
USERCTL='no'
UNIQUE='XNzu.WeZGOGF+4wE'
_nm_name='bus-pci-0001:61:01.0'
Change the BOOTPROTO and STARTMODE lines to the following::
BOOTPROTO='none'
STARTMODE='off'
Do not alter the UNIQUE or _nm_name lines. Remove any other
lines (USERCTL, etc).
Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
it's time to create the configuration file for the bonding device
itself. This file is named ifcfg-bondX, where X is the number of the
bonding device to create, starting at 0. The first such file is
ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig
network configuration system will correctly start multiple instances
of bonding.
The contents of the ifcfg-bondX file are as follows::
BOOTPROTO="static"
BROADCAST="10.0.2.255"
IPADDR="10.0.2.10"
NETMASK="255.255.255.0"
NETWORK="10.0.2.0"
REMOTE_IPADDR=""
STARTMODE="onboot"
BONDING_MASTER="yes"
BONDING_MODULE_OPTS="mode=active-backup miimon=100"
BONDING_SLAVE0="eth0"
BONDING_SLAVE1="bus-pci-0000:06:08.1"
Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
values with the appropriate values for your network.
The STARTMODE specifies when the device is brought online.
The possible values are:
======== ======================================================
onboot The device is started at boot time. If you're not
sure, this is probably what you want.
manual The device is started only when ifup is called
manually. Bonding devices may be configured this
way if you do not wish them to start automatically
at boot for some reason.
hotplug The device is started by a hotplug event. This is not
a valid choice for a bonding device.
off or The device configuration is ignored.
ignore
======== ======================================================
The line BONDING_MASTER='yes' indicates that the device is a
bonding master device. The only useful value is "yes."
The contents of BONDING_MODULE_OPTS are supplied to the
instance of the bonding module for this device. Specify the options
for the bonding mode, link monitoring, and so on here. Do not include
the max_bonds bonding parameter; this will confuse the configuration
system if you have multiple bonding devices.
Finally, supply one BONDING_SLAVEn="slave device" for each
slave, where "n" is an increasing value, one for each slave. The
"slave device" is either an interface name, e.g., "eth0", or a device
specifier for the network device. The interface name is easier to
find, but the ethN names are subject to change at boot time if, e.g.,
a device early in the sequence has failed. The device specifiers
(bus-pci-0000:06:08.1 in the example above) specify the physical
network device, and will not change unless the device's bus location
changes (for example, it is moved from one PCI slot to another). The
example above uses one of each type for demonstration purposes; most
configurations will choose one or the other for all slave devices.
When all configuration files have been modified or created,
networking must be restarted for the configuration changes to take
effect. This can be accomplished via the following::
# /etc/init.d/network restart
Note that the network control script (/sbin/ifdown) will
remove the bonding module as part of the network shutdown processing,
so it is not necessary to remove the module by hand if, e.g., the
module parameters have changed.
Also, at this writing, YaST/YaST2 will not manage bonding
devices (it does not show bonding interfaces in its list of network
devices). It is necessary to edit the configuration file by hand to
change the bonding configuration.
Additional general options and details of the ifcfg file
format can be found in an example ifcfg template file::
/etc/sysconfig/network/ifcfg.template
Note that the template does not document the various ``BONDING_*``
settings described above, but does describe many of the other options.
3.1.1 Using DHCP with Sysconfig
-------------------------------
Under sysconfig, configuring a device with BOOTPROTO='dhcp'
will cause it to query DHCP for its IP address information. At this
writing, this does not function for bonding devices; the scripts
attempt to obtain the device address from DHCP prior to adding any of
the slave devices. Without active slaves, the DHCP requests are not
sent to the network.
3.1.2 Configuring Multiple Bonds with Sysconfig
-----------------------------------------------
The sysconfig network initialization system is capable of
handling multiple bonding devices. All that is necessary is for each
bonding instance to have an appropriately configured ifcfg-bondX file
(as described above). Do not specify the "max_bonds" parameter to any
instance of bonding, as this will confuse sysconfig. If you require
multiple bonding devices with identical parameters, create multiple
ifcfg-bondX files.
Because the sysconfig scripts supply the bonding module
options in the ifcfg-bondX file, it is not necessary to add them to
the system ``/etc/modprobe.d/*.conf`` configuration files.
3.2 Configuration with Initscripts Support
------------------------------------------
This section applies to distros using a recent version of
initscripts with bonding support, for example, Red Hat Enterprise Linux
version 3 or later, Fedora, etc. On these systems, the network
initialization scripts have knowledge of bonding, and can be configured to
control bonding devices. Note that older versions of the initscripts
package have lower levels of support for bonding; this will be noted where
applicable.
These distros will not automatically load the network adapter
driver unless the ethX device is configured with an IP address.
Because of this constraint, users must manually configure a
network-script file for all physical adapters that will be members of
a bondX link. Network script files are located in the directory::
/etc/sysconfig/network-scripts
The file name must be prefixed with "ifcfg-eth" and suffixed
with the adapter's physical adapter number. For example, the script
for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
Place the following text in the file::
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
The DEVICE= line will be different for every ethX device and
must correspond with the name of the file, i.e., ifcfg-eth1 must have
a device line of DEVICE=eth1. The setting of the MASTER= line will
also depend on the final bonding interface name chosen for your bond.
As with other network devices, these typically start at 0, and go up
one for each device, i.e., the first bonding instance is bond0, the
second is bond1, and so on.
Next, create a bond network script. The file name for this
script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is
the number of the bond. For bond0 the file is named "ifcfg-bond0",
for bond1 it is named "ifcfg-bond1", and so on. Within that file,
place the following text::
DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
Be sure to change the networking specific lines (IPADDR,
NETMASK, NETWORK and BROADCAST) to match your network configuration.
For later versions of initscripts, such as that found with Fedora
7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible,
and, indeed, preferable, to specify the bonding options in the ifcfg-bond0
file, e.g. a line of the format::
BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"
will configure the bond with the specified options. The options
specified in BONDING_OPTS are identical to the bonding module parameters
except for the arp_ip_target field when using versions of initscripts older
than 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2). When
using older versions each target should be included as a separate option and
should be preceded by a '+' to indicate it should be added to the list of
queried targets, e.g.,::
arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
is the proper syntax to specify multiple targets. When specifying
options via BONDING_OPTS, it is not necessary to edit
``/etc/modprobe.d/*.conf``.
For even older versions of initscripts that do not support
BONDING_OPTS, it is necessary to edit /etc/modprobe.d/*.conf (depending upon
your distro) to load the bonding module with your desired options when the
bond0 interface is brought up. The following lines in /etc/modprobe.d/*.conf
will load the bonding module, and select its options::
alias bond0 bonding
options bond0 mode=balance-alb miimon=100
Replace the sample parameters with the appropriate set of
options for your configuration.
Finally, run "/etc/rc.d/init.d/network restart" as root. This
will restart the networking subsystem, and your bond link should now be
up and running.
3.2.1 Using DHCP with Initscripts
---------------------------------
Recent versions of initscripts (the versions supplied with Fedora
Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
work) have support for assigning IP information to bonding devices via
DHCP.
To configure bonding for DHCP, configure it as described
above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
and add a line consisting of "TYPE=Bonding". Note that the TYPE value
is case sensitive.
3.2.2 Configuring Multiple Bonds with Initscripts
-------------------------------------------------
Initscripts packages that are included with Fedora 7 and Red Hat
Enterprise Linux 5 support multiple bonding interfaces by simply
specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
number of the bond. This support requires sysfs support in the kernel,
and a bonding driver of version 3.0.0 or later. Other configurations may
not support this method for specifying multiple bonding interfaces; for
those instances, see the "Configuring Multiple Bonds Manually" section,
below.
3.3 Configuring Bonding Manually with iproute2
-----------------------------------------------
This section applies to distros whose network initialization
scripts (the sysconfig or initscripts package) do not have specific
knowledge of bonding. One such distro is SuSE Linux Enterprise Server
version 8.
The general method for these systems is to place the bonding
module parameters into a config file in /etc/modprobe.d/ (as
appropriate for the installed distro), then add modprobe and/or
`ip link` commands to the system's global init script. The name of
the global init script differs; for sysconfig, it is
/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
For example, if you wanted to make a simple bond of two e100
devices (presumed to be eth0 and eth1), and have it persist across
reboots, edit the appropriate file (/etc/init.d/boot.local or
/etc/rc.d/rc.local), and add the following::
modprobe bonding mode=balance-alb miimon=100
modprobe e100
ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
ip link set eth0 master bond0
ip link set eth1 master bond0
Replace the example bonding module parameters and bond0
network configuration (IP address, netmask, etc) with the appropriate
values for your configuration.
Unfortunately, this method will not provide support for the
ifup and ifdown scripts on the bond devices. To reload the bonding
configuration, it is necessary to run the initialization script, e.g.,::
# /etc/init.d/boot.local
or::
# /etc/rc.d/rc.local
It may be desirable in such a case to create a separate script
which only initializes the bonding configuration, then call that
separate script from within boot.local. This allows for bonding to be
enabled without re-running the entire global init script.
To shut down the bonding devices, it is necessary to first
mark the bonding device itself as being down, then remove the
appropriate device driver modules. For our example above, you can do
the following::
# ifconfig bond0 down
# rmmod bonding
# rmmod e100
Again, for convenience, it may be desirable to create a script
with these commands.
3.3.1 Configuring Multiple Bonds Manually
-----------------------------------------
This section contains information on configuring multiple
bonding devices with differing options for those systems whose network
initialization scripts lack support for configuring multiple bonds.
If you require multiple bonding devices, but all with the same
options, you may wish to use the "max_bonds" module parameter,
documented above.
To create multiple bonding devices with differing options, it is
preferable to use bonding parameters exported by sysfs, documented in the
section below.
For versions of bonding without sysfs support, the only means to
provide multiple instances of bonding with differing options is to load
the bonding driver multiple times. Note that current versions of the
sysconfig network initialization scripts handle this automatically; if
your distro uses these scripts, no special action is needed. See the
section Configuring Bonding Devices, above, if you're not sure about your
network initialization scripts.
To load multiple instances of the module, it is necessary to
specify a different name for each instance (the module loading system
requires that every loaded module, even multiple instances of the same
module, have a unique name). This is accomplished by supplying multiple
sets of bonding options in ``/etc/modprobe.d/*.conf``, for example::
alias bond0 bonding
options bond0 -o bond0 mode=balance-rr miimon=100
alias bond1 bonding
options bond1 -o bond1 mode=balance-alb miimon=50
will load the bonding module two times. The first instance is
named "bond0" and creates the bond0 device in balance-rr mode with an
miimon of 100. The second instance is named "bond1" and creates the
bond1 device in balance-alb mode with an miimon of 50.
In some circumstances (typically with older distributions),
the above does not work, and the second bonding instance never sees
its options. In that case, the second options line can be substituted
as follows::
install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
mode=balance-alb miimon=50
This may be repeated any number of times, specifying a new and
unique name in place of bond1 for each subsequent instance.
It has been observed that some Red Hat supplied kernels are unable
to rename modules at load time (the "-o bond1" part). Attempts to pass
that option to modprobe will produce an "Operation not permitted" error.
This has been reported on some Fedora Core kernels, and has been seen on
RHEL 4 as well. On kernels exhibiting this problem, it will be impossible
to configure multiple bonds with differing parameters (as they are older
kernels, and also lack sysfs support).
3.4 Configuring Bonding Manually via Sysfs
------------------------------------------
Starting with version 3.0.0, Channel Bonding may be configured
via the sysfs interface. This interface allows dynamic configuration
of all bonds in the system without unloading the module. It also
allows for adding and removing bonds at runtime. Ifenslave is no
longer required, though it is still supported.
Use of the sysfs interface allows you to use multiple bonds
with different configurations without having to reload the module.
It also allows you to use multiple, differently configured bonds when
bonding is compiled into the kernel.
You must have the sysfs filesystem mounted to configure
bonding this way. The examples in this document assume that you
are using the standard mount point for sysfs, i.e., /sys. If your
sysfs filesystem is mounted elsewhere, you will need to adjust the
example paths accordingly.
Creating and Destroying Bonds
-----------------------------
To add a new bond foo::
# echo +foo > /sys/class/net/bonding_masters
To remove an existing bond bar::
# echo -bar > /sys/class/net/bonding_masters
To show all existing bonds::
# cat /sys/class/net/bonding_masters
.. note::
Due to the 4K size limitation of sysfs files, this list may be
truncated if you have more than a few hundred bonds. This is unlikely
to occur under normal operating conditions.
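In scripts it is often convenient to test whether a bond already exists before
creating it. The helpers below are a sketch built only on the bonding_masters
file described above. The function names and the SYSFS variable are
hypothetical; SYSFS merely makes the mount point adjustable for systems where
sysfs is not mounted at /sys.

```shell
# Mount point of sysfs; override if it is mounted elsewhere.
: "${SYSFS:=/sys}"

# Print one bond name per line, or nothing if the file is absent
# (for example, when the bonding module is not loaded).
list_bonds() {
    masters="$SYSFS/class/net/bonding_masters"
    [ -r "$masters" ] || return 0
    # The file holds a space-separated list of names.
    tr -s ' ' '\n' < "$masters" | sed '/^$/d'
}

# Succeed if the named bond is listed.
bond_exists() {
    list_bonds | grep -qxF -- "$1"
}

# Create the bond only if it is not already present.
ensure_bond() {
    bond_exists "$1" || echo "+$1" > "$SYSFS/class/net/bonding_masters"
}
```

For example, ``ensure_bond foo`` can be run repeatedly from an init script
without attempting to add foo a second time.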
Adding and Removing Slaves
--------------------------
Interfaces may be enslaved to a bond using the file
/sys/class/net/<bond>/bonding/slaves. The semantics for this file
are the same as for the bonding_masters file.
To enslave interface eth0 to bond bond0::
# ifconfig bond0 up
# echo +eth0 > /sys/class/net/bond0/bonding/slaves
To free slave eth0 from bond bond0::
# echo -eth0 > /sys/class/net/bond0/bonding/slaves
When an interface is enslaved to a bond, symlinks between the
two are created in the sysfs filesystem. In this case, you would get
/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
/sys/class/net/eth0/master pointing to /sys/class/net/bond0.
This means that you can tell quickly whether or not an
interface is enslaved by looking for the master symlink. Thus::
# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
will free eth0 from whatever bond it is enslaved to, regardless of
the name of the bond interface.
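The master symlink lends itself to small helper functions. The following is a
sketch with hypothetical names (master_of, release_iface, SYSFS). It assumes
that any master symlink found belongs to a bond; other kinds of master
devices are not handled.

```shell
# Mount point of sysfs; override if it is mounted elsewhere.
: "${SYSFS:=/sys}"

# Print the name of the device that IFACE is enslaved to.
# Fails if IFACE has no master symlink.
master_of() {
    link="$SYSFS/class/net/$1/master"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Free IFACE from whatever bond holds it; do nothing if it is not enslaved.
release_iface() {
    master_of "$1" >/dev/null || return 0
    echo "-$1" > "$SYSFS/class/net/$1/master/bonding/slaves"
}
```

``master_of eth0`` prints the bond name when eth0 is enslaved and returns a
non-zero status otherwise, so it can be used directly in an ``if`` test.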
Changing a Bond's Configuration
-------------------------------
Each bond may be configured individually by manipulating the
files located in /sys/class/net/<bond name>/bonding.
The names of these files correspond directly with the command-
line parameters described elsewhere in this file, and, with the
exception of arp_ip_target, they accept the same values. To see the
current setting, simply cat the appropriate file.
A few examples will be given here; for specific usage
guidelines for each parameter, see the appropriate section in this
document.
To configure bond0 for balance-alb mode::
# ifconfig bond0 down
# echo 6 > /sys/class/net/bond0/bonding/mode
- or -
# echo balance-alb > /sys/class/net/bond0/bonding/mode
.. note::
The bond interface must be down before the mode can be changed.
To enable MII monitoring on bond0 with a 1 second interval::
# echo 1000 > /sys/class/net/bond0/bonding/miimon
.. note::
If ARP monitoring is enabled, it will be disabled when MII
monitoring is enabled, and vice-versa.
To add ARP targets::
# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
.. note::
Up to 16 target addresses may be specified.
To remove an ARP target::
# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
To configure the interval between learning packet transmits::
# echo 12 > /sys/class/net/bond0/bonding/lp_interval
.. note::
The lp_interval is the number of seconds between instances where
the bonding driver sends learning packets to each slave's peer switch. The
default interval is 1 second.
Example Configuration
---------------------
We begin with the same example that is shown in section 3.3,
executed with sysfs, and without using ifenslave.
To make a simple bond of two e100 devices (presumed to be eth0
and eth1), and have it persist across reboots, edit the appropriate
file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
following::
modprobe bonding
modprobe e100
echo balance-alb > /sys/class/net/bond0/bonding/mode
ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
echo 100 > /sys/class/net/bond0/bonding/miimon
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves
To add a second bond, with two e1000 interfaces in
active-backup mode, using ARP monitoring, add the following lines to
your init script::
modprobe e1000
echo +bond1 > /sys/class/net/bonding_masters
echo active-backup > /sys/class/net/bond1/bonding/mode
ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target
echo 2000 > /sys/class/net/bond1/bonding/arp_interval
echo +eth2 > /sys/class/net/bond1/bonding/slaves
echo +eth3 > /sys/class/net/bond1/bonding/slaves
3.5 Configuration with Interfaces Support
-----------------------------------------
This section applies to distros which use /etc/network/interfaces file
to describe network interface configuration, most notably Debian and its
derivatives.
The ifup and ifdown commands on Debian don't support bonding out of
the box. The ifenslave-2.6 package should be installed to provide bonding
support. Once installed, this package will provide ``bond-*`` options
for use in /etc/network/interfaces.
Note that ifenslave-2.6 package will load the bonding module and use
the ifenslave command when appropriate.
Example Configurations
----------------------
In /etc/network/interfaces, the following stanza will configure bond0, in
active-backup mode, with eth0 and eth1 as slaves::
auto bond0
iface bond0 inet dhcp
bond-slaves eth0 eth1
bond-mode active-backup
bond-miimon 100
bond-primary eth0 eth1
If the above configuration doesn't work, you might have a system using
upstart for system startup. This is most notably true for recent
Ubuntu versions. The following stanza in /etc/network/interfaces will
produce the same result on those systems::
auto bond0
iface bond0 inet dhcp
bond-slaves none
bond-mode active-backup
bond-miimon 100
auto eth0
iface eth0 inet manual
bond-master bond0
bond-primary eth0 eth1
auto eth1
iface eth1 inet manual
bond-master bond0
bond-primary eth0 eth1
For a full list of ``bond-*`` supported options in /etc/network/interfaces and
some more advanced examples tailored to your particular distro, see the files in
/usr/share/doc/ifenslave-2.6.
3.6 Overriding Configuration for Special Cases
----------------------------------------------
When using the bonding driver, the physical port which transmits a frame is
typically selected by the bonding driver, and is not relevant to the user or
system administrator. The output port is simply selected using the policies of
the selected bonding mode. On occasion however, it is helpful to direct certain
classes of traffic to certain physical interfaces on output to implement
slightly more complex policies. For example, to reach a web server over a
bonded interface in which eth0 connects to a private network, while eth1
connects via a public network, it may be desirous to bias the bond to send said
traffic over eth0 first, using eth1 only as a fall back, while all other traffic
can safely be sent over either interface. Such configurations may be achieved
using the traffic control utilities inherent in linux.
By default the bonding driver is multiqueue aware and 16 queues are created
when the driver initializes (see Documentation/networking/multiqueue.rst
for details). If more or fewer queues are desired the module parameter
tx_queues can be used to change this value. There is no sysfs parameter
available as the allocation is done at module init time.
The output of the file /proc/net/bonding/bondX has changed so the output Queue
ID is now printed for each slave::
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1a:a0:12:8f:cb
Slave queue ID: 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1a:a0:12:8f:cc
Slave queue ID: 2
The queue_id for a slave can be set using the command::
# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
Any interface that needs a queue_id set should set it with multiple calls
like the one above until proper priorities are set for all interfaces. On
distributions that allow configuration via initscripts, multiple 'queue_id'
arguments can be added to BONDING_OPTS to set all needed slave queues.
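The repeated writes can be wrapped in a loop. This is a sketch only; the
function name set_queue_ids and the SYSFS variable are hypothetical, and the
only interface used is the queue_id file shown above.

```shell
# Mount point of sysfs; override if it is mounted elsewhere.
: "${SYSFS:=/sys}"

# usage: set_queue_ids BOND IFACE:QUEUE [IFACE:QUEUE ...]
set_queue_ids() {
    bond=$1
    shift
    for spec in "$@"; do
        # Reject arguments that are not of the form IFACE:QUEUE.
        case $spec in
            ?*:?*) ;;
            *) echo "bad queue spec: $spec" >&2; return 1 ;;
        esac
        # One write per slave, matching the single command shown above.
        echo "$spec" > "$SYSFS/class/net/$bond/bonding/queue_id" || return 1
    done
}
```

For example, ``set_queue_ids bond0 eth0:1 eth1:2`` performs two writes, one
for each slave.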
These queue IDs can be used in conjunction with the tc utility to configure
a multiqueue qdisc and filters to bias certain traffic to transmit on certain
slave devices. For instance, say we wanted, in the above configuration to
force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
device. The following commands would accomplish this::
# tc qdisc add dev bond0 handle 1 root multiq
# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip \
dst 192.168.1.100 action skbedit queue_mapping 2
These commands tell the kernel to attach a multiqueue queue discipline to the
bond0 interface and filter traffic enqueued to it, such that packets with a dst
ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
This value is then passed into the driver, causing the normal output path