This diff basically provides the same functionality as ARP load balancing, without its limitations. It can also share traffic that comes across routers and it works with IPv6.

The basic concept is that we use the shared medium properties of the ethernet to direct the traffic to all carp nodes. Each one hashes the source and destination IP at ip_input and decides, based on the status of the carp load sharing group, whether a packet should be accepted or just dropped silently.

Please test this patch with -current, if you're using carp(4). The manpage should guide you, if you want to try load sharing as well. If something is unclear, please let me know so that I can improve it.

Marco

Index: share/man/man4/carp.4
===================================================================
RCS file: /cvs/src/share/man/man4/carp.4,v
retrieving revision 1.23
diff -p -u -r1.23 carp.4
--- share/man/man4/carp.4	15 Jun 2006 08:55:39 -0000	1.23
+++ share/man/man4/carp.4	5 Feb 2007 09:02:18 -0000
@@ -118,10 +118,57 @@ Disabled by default.
 Balance local traffic using ARP.
 Disabled by default.
 .El
-.Sh ARP LEVEL LOAD BALANCING
+.Sh LOAD BALANCING
 .Nm
-has limited abilities for load balancing the incoming connections
-between hosts in an Ethernet network.
+provides two mechanisms to load balance incoming traffic
+over a group of
+.Nm
+hosts:
+ARP balancing and IP balancing.
+.Pp
+Which one to use mainly depends on the network environment
+.Nm
+is being used in.
+ARP balancing has limited abilities for load balancing the
+incoming connections between hosts in an Ethernet network.
+It only works for clients in the local network, because
+ARP balancing spreads the load by varying ARP replies
+based on the source IP address of the host sending the query.
+Therefore it cannot balance traffic that crosses a router, because the
+router itself will always be balanced to the same virtual host.
+.Pp
+IP balancing is not dependent on ARP and therefore also works
+for traffic that comes over a router.
+This method should work in all environments and can
+also provide more fine grained load balancing than ARP balancing.
+The downside of IP balancing is that it requires the traffic
+that is destined towards the load balanced IP addresses
+to be received by all
+.Nm
+hosts.
+While this is always the case when connected to a hub,
+it has to play some tricks in switched networks, which
+will result in a higher network load.
+.Pp
+A rule of thumb might be to use ARP balancing if there
+are many hosts on the same network segment and
+to use IP balancing for all other cases.
+.Pp
+The configuration of ARP and IP load balancing is quite similar:
+a load balancing group is created out of multiple
+.Nm
+interfaces by configuring them with the same IP addresses,
+but to different VHIDs.
+All
+.Nm
+nodes in the cluster are configured identically, except
+for a different
+.Cm advskew
+to control which interfaces on a host will be the designated master.
+See the
+.Sx EXAMPLES
+section for a practical example of load balancing.
+.Ss ARP BALANCING
 For load balancing, several
 .Nm
 interfaces are configured to the same IP address, but to different VHIDs.
@@ -132,9 +179,6 @@ If the corresponding
 .Nm
 interface is in master state, the ARP request will be answered, otherwise
 it will be ignored.
-See the
-.Sx EXAMPLES
-section for a practical example of load balancing.
 .Pp
 The ARP load balancing has some limitations.
 Firstly, ARP balancing only works on the local network segment.
@@ -153,6 +197,65 @@ This requires multiple CARP groups with
 .Em different
 IP addresses on the outgoing interface, configured so that each host is the
 master of one group.
+.Ss IP BALANCING
+IP load balancing works by utilizing the network itself to distribute
+incoming traffic to all
+.Nm
+nodes in the cluster.
+Each packet is filtered on the incoming
+.Nm
+interface so that only one node in the cluster accepts the
+packet.
+All the other nodes will just silently drop it.
+The filtering function uses a hash over the source and destination
+address of the IPv4 or IPv6 packet and compares the result against the
+state of the
+.Nm
+load balancing group.
+.Pp
+A load balancing group consists of two or more
+.Nm
+interfaces per host which are configured with common IP addresses
+but different VHIDs.
+IP balancing is activated by setting the
+.Cm link0
+flag on the first interface of the group.
+In most cases it is recommended to also enable the
+.Cm link1
+flag.
+This flag enables the stealth mode on the interface.
+In this mode
+.Nm
+never sends packets with its virtual MAC address as source.
+This is necessary to receive incoming traffic on all hosts in switched networks.
+Stealth mode prevents a switch from learning the virtual MAC
+address, so that it has to flood the traffic to all its ports.
+The
+.Cm link1
+flag can be avoided
+only if using a hub or if the switch ports that are connected
+to the cluster nodes can be configured into some sort of monitoring mode.
+Please note that activating stealth mode on a
+.Nm
+interface that has already been running might not work instantly.
+As a workaround the VHID can be changed to a previously unused
+one, or just wait until the MAC table entry in the switch times out.
+
+Some Layer-3 switches do port learning based on ARP packets.
+Therefore the stealth mode cannot hide the virtual MAC address
+from these kind of devices.
+In such cases,
+.Nm
+can be told to use a multicast MAC address by additionally enabling the
+.Cm link2
+flag.
+.Pp
+If IP balancing is being used on a firewall, it is recommended to
+configure the load balancing group in a symmetrical manner.
+This is achieved by prioritizing the interfaces in the same order
+(ascending by VHID) on both sides of the firewall.
+This ensures that packets of one connection will pass in and out
+on the same host and are not routed asymmetrically.
 .Sh EXAMPLES
 For firewalls and routers with multiple interfaces, it is desirable to
 failover all of the
@@ -167,22 +270,14 @@ Assume that host A is the preferred mast
 configured on one physical interface and 192.168.2.y/24 on another.
 This is the setup for host A:
 .Bd -literal -offset indent
-# ifconfig carp0 create
-# ifconfig carp0 vhid 1 pass mekmitasdigoat 192.168.1.1 \e
-	netmask 255.255.255.0
-# ifconfig carp1 create
-# ifconfig carp1 vhid 2 pass mekmitasdigoat 192.168.2.1 \e
-	netmask 255.255.255.0
+# ifconfig carp0 192.168.1.1 vhid 1
+# ifconfig carp1 192.168.2.1 vhid 2
 .Ed
 .Pp
 The setup for host B is identical, but it has a higher advskew:
 .Bd -literal -offset indent
-# ifconfig carp0 create
-# ifconfig carp0 vhid 1 advskew 100 pass mekmitasdigoat \e
-	192.168.1.1 netmask 255.255.255.0
-# ifconfig carp1 create
-# ifconfig carp1 vhid 2 advskew 100 pass mekmitasdigoat \e
-	192.168.2.1 netmask 255.255.255.0
+# ifconfig carp0 192.168.1.1 vhid 1 advskew 100
+# ifconfig carp1 192.168.2.1 vhid 2 advskew 100
 .Ed
 .Pp
 Because of the preempt option, when one of the physical interfaces of
@@ -191,10 +286,9 @@ host A fails, advskew is adjusted to 240
 interfaces.
 This will cause host B to preempt on both interfaces instead of
 just the failed one.
-.Pp
-In order to set up an ARP balanced virtual host, it is necessary to configure
-one virtual host for each physical host which would respond to ARP requests
-and thus handle the traffic.
+.Ss LOAD BALANCING
+In order to set up an load balanced virtual host, it is necessary to configure
+one virtual host for each physical host.
 In the following example, two virtual hosts are configured on two hosts to
 provide balancing and failover for the IP address 192.168.1.10.
 .Pp
@@ -204,40 +298,35 @@ interfaces on Host A are configured.
 The
 .Cm advskew
 of 100 on the second virtual host means that its advertisements will be sent
-out slightly less frequently.
+out slightly less frequently and will therefore become the designated backup.
 .Bd -literal -offset indent
-# ifconfig carp0 create
-# ifconfig carp0 vhid 1 pass mekmitasdigoat 192.168.1.10 \e
-	netmask 255.255.255.0
-# ifconfig carp1 create
-# ifconfig carp1 vhid 2 advskew 100 pass mekmitasdigoat \e
-	192.168.1.10 netmask 255.255.255.0
+# ifconfig carp0 192.168.1.10 vhid 1
+# ifconfig carp1 192.168.1.10 vhid 2 advskew 100
 .Ed
 .Pp
 The configuration for host B is identical, except the skew is on
 virtual host 1 rather than virtual host 2.
 .Bd -literal -offset indent
-# ifconfig carp0 create
-# ifconfig carp0 vhid 1 advskew 100 pass mekmitasdigoat \e
-	192.168.1.10 netmask 255.255.255.0
-# ifconfig carp1 create
-# ifconfig carp1 vhid 2 pass mekmitasdigoat 192.168.1.10 \e
-	netmask 255.255.255.0
+# ifconfig carp0 192.168.1.10 vhid 1 advskew 100
+# ifconfig carp1 192.168.1.10 vhid 2
 .Ed
 .Pp
-Finally, the ARP balancing feature must be enabled on both hosts:
+If ARP balancing is being used, it must be enabled on both hosts:
 .Pp
 .Dl # sysctl net.inet.carp.arpbalance=1
 .Pp
-When the hosts receive an ARP request for 192.168.1.10, the source IP address
-of the request is used to compute which virtual host should answer the request.
-The host which is master of the selected virtual host will reply to the
-request, the other(s) will ignore it.
-.Pp
-This way, locally connected systems will receive different ARP replies and
-subsequent IP traffic will be balanced among the hosts.
-If one of the hosts fails, the other will take over the virtual MAC address,
-and begin answering ARP requests on its behalf.
+If IP balancing is being used, instead enable the
+.Cm link0
+and
+.Cm link1
+flags on the first interface of the load balancing group on both hosts:
+.Bd -literal -offset indent
+A# ifconfig carp0 192.168.1.10 vhid 1 link0 link1
+A# ifconfig carp1 192.168.1.10 vhid 2 advskew 100
+.Pp
+B# ifconfig carp0 192.168.1.10 vhid 1 advskew 100 link0 link1
+B# ifconfig carp1 192.168.1.10 vhid 2
+.Ed
 .Sh SEE ALSO
 .Xr sysctl 3 ,
 .Xr inet 4 ,
Index: sys/net/if.c
===================================================================
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.153
diff -p -u -r1.153 if.c
--- sys/net/if.c	3 Dec 2006 13:41:19 -0000	1.153
+++ sys/net/if.c	21 Jan 2007 23:04:47 -0000
@@ -1329,6 +1329,7 @@ ifioctl(struct socket *so, u_long cmd, c
 			return (EINVAL);
 		switch (ifp->if_type) {
 		case IFT_ETHER:
+		case IFT_CARP:
 		case IFT_FDDI:
 		case IFT_XETHER:
 		case IFT_ISO88025:
@@ -1345,11 +1346,14 @@ ifioctl(struct socket *so, u_long cmd, c
 			return (ENODEV);
 		}
 		if (ifp->if_flags & IFF_UP) {
+			struct ifreq ifrq;
 			int s = splnet();
 			ifp->if_flags &= ~IFF_UP;
-			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifr);
+			ifrq.ifr_flags = ifp->if_flags;
+			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
 			ifp->if_flags |= IFF_UP;
-			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifr);
+			ifrq.ifr_flags = ifp->if_flags;
+			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifrq);
 			splx(s);
 			TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
 				if (ifa->ifa_addr != NULL &&
Index: sys/net/if_ethersubr.c
===================================================================
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.105
diff -p -u -r1.105 if_ethersubr.c
--- sys/net/if_ethersubr.c	7 Dec 2006 18:15:29 -0000	1.105
+++ sys/net/if_ethersubr.c	13 Dec 2006 14:16:49 -0000
@@ -401,7 +401,8 @@ ether_output(ifp0, m0, dst, rt0)
 		    sizeof(eh->ether_shost));
 
 #if NCARP > 0
-	if (ifp0 != ifp && ifp0->if_type == IFT_CARP) {
+	if (ifp0 != ifp && ifp0->if_type == IFT_CARP &&
+	    !(ifp0->if_flags & IFF_LINK1)) {
 		bcopy((caddr_t)((struct arpcom *)ifp0)->ac_enaddr,
 		    (caddr_t)eh->ether_shost, sizeof(eh->ether_shost));
 	}
@@ -595,10 +596,19 @@ ether_input(ifp, eh, m)
 #endif /* NVLAN > 0 */
 
 #if NCARP > 0
-	if (ifp->if_carp && ifp->if_type != IFT_CARP &&
-	    (carp_input(m, (u_int8_t *)&eh->ether_shost,
-	    (u_int8_t *)&eh->ether_dhost, eh->ether_type) == 0))
-		return;
+	if (ifp->if_carp) {
+		if (ifp->if_type != IFT_CARP &&
+		    (carp_input(m, (u_int8_t *)&eh->ether_shost,
+		    (u_int8_t *)&eh->ether_dhost, eh->ether_type) == 0))
+			return;
+		/* Always clear multicast flags if received on a carp address */
+		else if (ifp->if_type == IFT_CARP &&
+		    ifp->if_flags & IFF_LINK2 &&
+		    m->m_flags & (M_BCAST|M_MCAST) &&
+		    !bcmp(((struct arpcom *)ifp)->ac_enaddr,
+		    (caddr_t)eh->ether_dhost, ETHER_ADDR_LEN))
+			m->m_flags &= ~(M_BCAST|M_MCAST);
+	}
 #endif /* NCARP > 0 */
 
 	ac = (struct arpcom *)ifp;
Index: sys/netinet/if_ether.c
===================================================================
RCS file: /cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.65
diff -p -u -r1.65 if_ether.c
--- sys/netinet/if_ether.c	21 Aug 2006 21:36:53 -0000	1.65
+++ sys/netinet/if_ether.c	14 Nov 2006 07:10:40 -0000
@@ -720,7 +720,14 @@ reply:
 	ea->arp_pro = htons(ETHERTYPE_IP); /* let's be sure! */
 	eh = (struct ether_header *)sa.sa_data;
 	bcopy(ea->arp_tha, eh->ether_dhost, sizeof(eh->ether_dhost));
-	bcopy(enaddr, eh->ether_shost, sizeof(eh->ether_shost));
+#if NCARP > 0
+	if (ac->ac_if.if_type == IFT_CARP && ac->ac_if.if_flags & IFF_LINK1)
+		bcopy(((struct arpcom *)ac->ac_if.if_carpdev)->ac_enaddr,
+		    eh->ether_shost, sizeof(eh->ether_shost));
+	else
+#endif
+		bcopy(enaddr, eh->ether_shost, sizeof(eh->ether_shost));
+
 	eh->ether_type = htons(ETHERTYPE_ARP);
 	sa.sa_family = pseudo_AF_HDRCMPLT;
 	sa.sa_len = sizeof(sa);
Index: sys/netinet/ip_carp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.132
diff -p -u -r1.132 ip_carp.c
--- sys/netinet/ip_carp.c	13 Dec 2006 09:01:59 -0000	1.132
+++ sys/netinet/ip_carp.c	5 Feb 2007 09:06:27 -0000
@@ -126,6 +126,8 @@ struct carp_softc {
 	int sc_sendad_success;
 #define CARP_SENDAD_MIN_SUCCESS 3
 
+	char sc_carplladdr[ETHER_ADDR_LEN];
+	char sc_curlladdr[ETHER_ADDR_LEN];
 	int sc_vhid;
 	int sc_advskew;
 	int sc_naddrs;
@@ -141,6 +143,8 @@ struct carp_softc {
 	SHA1_CTX sc_sha1[HMAC_MAX];
 	u_int32_t sc_hashkey[2];
+	u_int32_t sc_lsmask;	/* load sharing mask */
+	int sc_lscount;		/* # load sharing interfaces (max 32) */
 
 	struct timeout sc_ad_tmo;	/* advertisement timeout */
 	struct timeout sc_md_tmo;	/* master down timeout */
@@ -192,9 +196,8 @@ void carp_ifgroup_ioctl(struct ifnet *,
 void	carp_start(struct ifnet *);
 void	carp_setrun(struct carp_softc *, sa_family_t);
 void	carp_set_state(struct carp_softc *, int);
-int	carp_addrcount(struct carp_if *, struct in_ifaddr *, int);
-enum	{ CARP_COUNT_MASTER, CARP_COUNT_RUNNING };
-
+int	carp_addrcount(struct carp_if *, struct ifaddr *, int);
+enum	{ CARP_COUNT_MASTER, CARP_COUNT_RUNNING, CARP_COUNT_LINK0 };
 void	carp_multicast_cleanup(struct carp_softc *);
 int	carp_set_ifp(struct carp_softc *, struct ifnet *);
 void	carp_set_enaddr(struct carp_softc *);
@@ -213,6 +216,7 @@ int carp_ether_addmulti(struct carp_soft
 int	carp_ether_delmulti(struct carp_softc *, struct ifreq *);
 void	carp_ether_purgemulti(struct carp_softc *);
 int	carp_group_demote_count(struct carp_softc *);
+void	carp_update_lsmask(struct carp_softc *);
 
 struct if_clone carp_cloner =
     IF_CLONE_INITIALIZER("carp", carp_clone_create, carp_clone_destroy);
@@ -265,6 +269,10 @@ carp_hmac_prepare_ctx(struct carp_softc
 	sc->sc_hashkey[1] = kmd[2] ^ kmd[3];
 
 	/* the rest of the precomputation */
+	if (bcmp(sc->sc_ac.ac_enaddr, &sc->sc_carplladdr, ETHER_ADDR_LEN) != 0)
+		SHA1Update(&sc->sc_sha1[ctx], sc->sc_ac.ac_enaddr,
+		    ETHER_ADDR_LEN);
+
 	SHA1Update(&sc->sc_sha1[ctx], (void *)&vhid, sizeof(vhid));
 
 	/* Hash the addresses from smallest to largest, not interface order */
@@ -379,7 +387,7 @@ carp_setroute(struct carp_softc *sc, int
 		    sc->sc_carpdev->if_carp != NULL) {
 			count = carp_addrcount(
 			    (struct carp_if *)sc->sc_carpdev->if_carp,
-			    ifatoia(ifa), CARP_COUNT_MASTER);
+			    ifa, CARP_COUNT_MASTER);
 			if ((cmd == RTM_ADD && count != 1) ||
 			    (cmd == RTM_DELETE && count != 0))
 				continue;
@@ -832,7 +840,6 @@ carp_clone_create(ifc, unit)
 	if_attach(ifp);
 
 	if_alloc_sadl(ifp);
-	carp_set_enaddr(sc);
 	LIST_INIT(&sc->sc_ac.ac_multiaddrs);
 #if NBPFILTER > 0
 	bpfattach(&ifp->if_bpf, ifp, DLT_EN10MB, ETHER_HDR_LEN);
@@ -1175,6 +1182,10 @@ carp_send_arp(struct carp_softc *sc)
 		if (ifa->ifa_addr->sa_family != AF_INET)
 			continue;
 
+		if (carp_addrcount((struct carp_if *)sc->sc_carpdev->if_carp,
+		    ifa, CARP_COUNT_LINK0))
+			continue;
+
 		in = ifatoia(ifa)->ia_addr.sin_addr.s_addr;
 		arprequest(sc->sc_carpdev, &in, &in, sc->sc_ac.ac_enaddr);
 		DELAY(1000);	/* XXX */
@@ -1242,26 +1253,131 @@ carp_hash(struct carp_softc *sc, u_char
 }
 
 int
-carp_addrcount(struct carp_if *cif, struct in_ifaddr *ia, int type)
+carp_addrcount(struct carp_if *cif, struct ifaddr *ifa0, int type)
 {
 	struct carp_softc *vh;
 	struct ifaddr *ifa;
 	int count = 0;
 
 	TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list) {
-		if ((type == CARP_COUNT_RUNNING &&
-		    (vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) ==
-		    (IFF_UP|IFF_RUNNING)) ||
-		    (type == CARP_COUNT_MASTER && vh->sc_state == MASTER)) {
+		switch (type) {
+		case CARP_COUNT_RUNNING:
+			if ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) !=
+			    (IFF_UP|IFF_RUNNING))
+				continue;
+			break;
+		case CARP_COUNT_MASTER:
+			if (vh->sc_state != MASTER)
+				continue;
+			break;
+		case CARP_COUNT_LINK0:
+			if (!(vh->sc_if.if_flags & IFF_LINK0) ||
+			    (vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) !=
+			    (IFF_UP|IFF_RUNNING))
+				continue;
+			break;
+		}
+		TAILQ_FOREACH(ifa, &vh->sc_if.if_addrlist, ifa_list) {
+			if (ifa->ifa_addr->sa_family == AF_INET &&
+			    ifa0->ifa_addr->sa_family == AF_INET &&
+			    ifatoia(ifa0)->ia_addr.sin_addr.s_addr ==
+			    ifatoia(ifa)->ia_addr.sin_addr.s_addr)
+				count++;
+#ifdef INET6
+			if (ifa->ifa_addr->sa_family == AF_INET6 &&
+			    ifa0->ifa_addr->sa_family == AF_INET6 &&
+			    IN6_ARE_ADDR_EQUAL(IFA_IN6(ifa0), IFA_IN6(ifa)))
+				count++;
+#endif
+		}
+	}
+	return (count);
+}
+
+void
+carp_update_lsmask(struct carp_softc *sc)
+{
+	struct carp_softc *curvh, *vh, *sc0 = NULL;
+	struct carp_if *cif;
+	struct ifaddr *ifa, *ifa0 = NULL;
+	int cur, last, count, found;
+
+	if (!sc->sc_carpdev)
+		return;
+	cif = (struct carp_if *)sc->sc_carpdev->if_carp;
+
+	/*
+	 * Take the first IPv4 address from the LINK0 carp interface
+	 * to determine the load sharing group.
+	 * Fallback on the first IPv6 address.
+	 */
+	TAILQ_FOREACH(sc0, &cif->vhif_vrs, sc_list)
+		if (sc0->sc_if.if_flags & IFF_LINK0)
+			break;
+	if (sc0 == NULL)
+		return;
+
+	TAILQ_FOREACH(ifa0, &sc0->sc_if.if_addrlist, ifa_list)
+		if (ifa0->ifa_addr->sa_family == AF_INET)
+			break;
+#ifdef INET6
+	if (ifa0 == NULL)
+		TAILQ_FOREACH(ifa0, &sc0->sc_if.if_addrlist, ifa_list)
+			if (ifa0->ifa_addr->sa_family == AF_INET6 &&
+			    !IN6_IS_ADDR_LINKLOCAL(IFA_IN6(ifa0)))
+				break;
+#endif
+	if (ifa0 == NULL)
+		return;
+	/*
+	 * Calculate the load sharing mask w/ all carp interfaces
+	 * that share the first address of the LINK0 interface.
+	 * Sort by virtual host ID.
+	 */
+	sc0->sc_lsmask = 0;
+	cur = 0;
+	curvh = NULL;
+	count = 0;
+	do {
+		found = 0;
+		last = cur;
+		cur = 255;
+		TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list) {
+			if ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) !=
+			    (IFF_UP|IFF_RUNNING))
+				continue;
 			TAILQ_FOREACH(ifa, &vh->sc_if.if_addrlist, ifa_list) {
 				if (ifa->ifa_addr->sa_family == AF_INET &&
-				    ia->ia_addr.sin_addr.s_addr ==
+				    ifa0->ifa_addr->sa_family == AF_INET &&
+				    ifatoia(ifa0)->ia_addr.sin_addr.s_addr ==
 				    ifatoia(ifa)->ia_addr.sin_addr.s_addr)
-					count++;
+					break;
+#ifdef INET6
+				if (ifa->ifa_addr->sa_family == AF_INET6 &&
+				    ifa0->ifa_addr->sa_family == AF_INET6 &&
+				    IN6_ARE_ADDR_EQUAL(IFA_IN6(ifa0), IFA_IN6(ifa)))
+					break;
+#endif
+			}
+			if (ifa && vh->sc_vhid > last && vh->sc_vhid < cur) {
+				cur = vh->sc_vhid;
+				curvh = vh;
+				found++;
 			}
 		}
-	}
-	return (count);
+		if (found) {
+			if (curvh->sc_state == MASTER &&
+			    count < sizeof(sc0->sc_lsmask) * 8)
+				sc0->sc_lsmask |= 1 << count;
+			count++;
+		}
+	} while (found);
+
+	sc0->sc_lscount = count;
+	if (count == 0)
+		return;
+
+	CARP_LOG(sc, ("carp_update_lsmask: %x", sc0->sc_lsmask))
 }
 
 int
@@ -1270,6 +1386,15 @@ carp_iamatch(struct in_ifaddr *ia, u_cha
 {
 	struct carp_softc *sc = ia->ia_ifp->if_softc;
 
+	/*
+	 * If the asked address is found on a LINK0 interface
+	 * don't answer the arp reply unless we are MASTER on it.
+	 */
+	if (!(sc->sc_if.if_flags & IFF_LINK0) && sc->sc_carpdev &&
+	    carp_addrcount((struct carp_if *)sc->sc_carpdev->if_carp,
+	    (struct ifaddr *)ia, CARP_COUNT_LINK0))
+		return (0);
+
 	if (carp_opts[CARPCTL_ARPBALANCE]) {
 		/*
 		 * We use the source ip to decide which virtual host should
@@ -1278,11 +1403,11 @@ carp_iamatch(struct in_ifaddr *ia, u_cha
 		 * the floor.
 		 */
 
-		/* Count the elegible carp interfaces with this address */
+		/* Count the eligible carp interfaces with this address */
 		if (*count == 0)
 			*count = carp_addrcount(
 			    (struct carp_if *)ia->ia_ifp->if_carpdev->if_carp,
-			    ia, CARP_COUNT_RUNNING);
+			    (struct ifaddr *)ia, CARP_COUNT_RUNNING);
 
 		/* This should never happen, but... */
 		if (*count == 0)
@@ -1301,24 +1426,24 @@ carp_iamatch(struct in_ifaddr *ia, u_cha
 }
 
 #ifdef INET6
-struct ifaddr *
-carp_iamatch6(void *v, struct in6_addr *taddr)
+int
+carp_iamatch6(struct ifnet *ifp, struct ifaddr *ifa)
 {
-	struct carp_if *cif = v;
-	struct carp_softc *vh;
-	struct ifaddr *ifa;
+	struct carp_softc *sc = ifp->if_softc;
 
-	TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list) {
-		TAILQ_FOREACH(ifa, &vh->sc_if.if_addrlist, ifa_list) {
-			if (IN6_ARE_ADDR_EQUAL(taddr,
-			    &ifatoia6(ifa)->ia_addr.sin6_addr) &&
-			    ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) ==
-			    (IFF_UP|IFF_RUNNING)) && vh->sc_state == MASTER)
-				return (ifa);
-		}
-	}
+	/*
+	 * If the asked address is found on a LINK0 interface
+	 * don't answer the arp request unless we are MASTER on it.
+	 */
+	if (!(sc->sc_if.if_flags & IFF_LINK0) && sc->sc_carpdev &&
+	    carp_addrcount((struct carp_if *)sc->sc_carpdev->if_carp,
+	    ifa, CARP_COUNT_LINK0))
+		return (0);
 
-	return (NULL);
+	if (sc->sc_state == MASTER)
+		return (1);
+
+	return (0);
 }
 #endif /* INET6 */
 
@@ -1334,28 +1459,14 @@ carp_ourether(void *v, struct ether_head
 	else
 		ena = (u_int8_t *)&eh->ether_dhost;
 
-	switch (iftype) {
-	case IFT_ETHER:
-	case IFT_FDDI:
-		if (ena[0] || ena[1] || ena[2] != 0x5e || ena[3] || ena[4] != 1)
-			return (NULL);
-		break;
-	case IFT_ISO88025:
-		if (ena[0] != 3 || ena[1] || ena[4] || ena[5])
-			return (NULL);
-		break;
-	default:
-		return (NULL);
-		break;
-	}
-
-	TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list)
-		if ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) ==
-		    (IFF_UP|IFF_RUNNING) && vh->sc_state == MASTER &&
-		    !bcmp(ena, vh->sc_ac.ac_enaddr,
-		    ETHER_ADDR_LEN))
+	TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list) {
+		if ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) !=
+		    (IFF_UP|IFF_RUNNING))
+			continue;
+		if ((vh->sc_state == MASTER || vh->sc_if.if_flags & IFF_LINK0)
+		    && !bcmp(ena, vh->sc_ac.ac_enaddr, ETHER_ADDR_LEN))
 			return (&vh->sc_if);
-
+	}
 	return (NULL);
 }
 
@@ -1370,7 +1481,9 @@ carp_input(struct mbuf *m, u_int8_t *sho
 	bcopy(dhost, &eh.ether_dhost, sizeof(eh.ether_dhost));
 	eh.ether_type = etype;
 
-	if (m->m_flags & (M_BCAST|M_MCAST)) {
+	if ((ifp = carp_ourether(cif, &eh, m->m_pkthdr.rcvif->if_type, 0)))
+		;
+	else if (m->m_flags & (M_BCAST|M_MCAST)) {
 		struct carp_softc *vh;
 		struct mbuf *m0;
 
@@ -1388,7 +1501,6 @@ carp_input(struct mbuf *m, u_int8_t *sho
 		return (1);
 	}
 
-	ifp = carp_ourether(cif, &eh, m->m_pkthdr.rcvif->if_type, 0);
 	if (ifp == NULL)
 		return (1);
 
@@ -1405,6 +1517,34 @@ carp_input(struct mbuf *m, u_int8_t *sho
 	return (0);
 }
 
+int
+carp_lsdrop(struct mbuf *m, sa_family_t af, u_int32_t *src, u_int32_t *dst)
+{
+	struct carp_softc *sc = m->m_pkthdr.rcvif->if_softc;
+	int match;
+	u_int32_t fold;
+
+	/* Never drop carp advertisements.
+	 * XXX Bad idea to pass all broadcast / multicast traffic?
+	 */
+	if (m->m_flags & (M_BCAST|M_MCAST))
+		return (0);
+
+	fold = src[0] ^ dst[0];
+#ifdef INET6
+	if (af == AF_INET6) {
+		int i;
+		for (i = 1; i < 4; i++)
+			fold ^= src[i] ^ dst[i];
+	}
+#endif
+	if (sc->sc_lscount == 0) /* just to be safe */
+		return (1);
+	match = (1 << (ntohl(fold) % sc->sc_lscount)) & sc->sc_lsmask;
+
+	return (!match);
+}
+
 void
 carp_master_down(void *v)
 {
@@ -1627,31 +1767,45 @@ carp_set_ifp(struct carp_softc *sc, stru
 void
 carp_set_enaddr(struct carp_softc *sc)
 {
-	if (sc->sc_vhid == -1 || !sc->sc_carpdev) {
-		bzero(&sc->sc_ac.ac_enaddr, sizeof (sc->sc_ac.ac_enaddr));
+	if (sc->sc_vhid != -1 && sc->sc_carpdev) {
 		/* XXX detach ipv6 link-local address? */
-	} else if (sc->sc_carpdev && sc->sc_carpdev->if_type == IFT_ISO88025) {
-		sc->sc_ac.ac_enaddr[0] = 3;
-		sc->sc_ac.ac_enaddr[1] = 0;
-		sc->sc_ac.ac_enaddr[2] = 0x40 >> (sc->sc_vhid - 1);
-		sc->sc_ac.ac_enaddr[3] = 0x40000 >> (sc->sc_vhid - 1);
-		sc->sc_ac.ac_enaddr[4] = 0;
-		sc->sc_ac.ac_enaddr[5] = 0;
-	} else {
-		sc->sc_ac.ac_enaddr[0] = 0;
-		sc->sc_ac.ac_enaddr[1] = 0;
-		sc->sc_ac.ac_enaddr[2] = 0x5e;
-		sc->sc_ac.ac_enaddr[3] = 0;
-		sc->sc_ac.ac_enaddr[4] = 1;
-		sc->sc_ac.ac_enaddr[5] = sc->sc_vhid;
-	}
+		if (sc->sc_carpdev->if_type == IFT_ISO88025) {
+			sc->sc_carplladdr[0] = 3;
+			sc->sc_carplladdr[1] = 0;
+			sc->sc_carplladdr[2] = 0x40 >> (sc->sc_vhid - 1);
+			sc->sc_carplladdr[3] = 0x40000 >> (sc->sc_vhid - 1);
+			sc->sc_carplladdr[4] = 0;
+			sc->sc_carplladdr[5] = 0;
+		} else {
+			if (sc->sc_if.if_flags & IFF_LINK2)
+				sc->sc_carplladdr[0] = 1;
+			else
+				sc->sc_carplladdr[0] = 0;
+			sc->sc_carplladdr[1] = 0;
+			sc->sc_carplladdr[2] = 0x5e;
+			sc->sc_carplladdr[3] = 0;
+			sc->sc_carplladdr[4] = 1;
+			sc->sc_carplladdr[5] = sc->sc_vhid;
+		}
+	} else
+		bzero(&sc->sc_carplladdr, ETHER_ADDR_LEN);
 
-	/* Make sure the enaddr has changed before further twiddling. */
-	if (bcmp(&sc->sc_ac.ac_enaddr, LLADDR(sc->sc_if.if_sadl),
-	    sc->sc_if.if_addrlen) != 0) {
-		bcopy(&sc->sc_ac.ac_enaddr,
-		    LLADDR(sc->sc_if.if_sadl), sc->sc_if.if_addrlen);
+	/*
+	 * Use the carp lladdr if the running one isn't manually set.
+	 * Only compare static parts of the lladdr.
+	 */
+	if ((bcmp(sc->sc_ac.ac_enaddr + 1, sc->sc_carplladdr + 1,
+	    ETHER_ADDR_LEN - 2) == 0) ||
+	    (!sc->sc_ac.ac_enaddr[0] && !sc->sc_ac.ac_enaddr[1] &&
+	    !sc->sc_ac.ac_enaddr[2] && !sc->sc_ac.ac_enaddr[3] &&
+	    !sc->sc_ac.ac_enaddr[4] && !sc->sc_ac.ac_enaddr[5]))
+		bcopy(sc->sc_carplladdr, sc->sc_ac.ac_enaddr, ETHER_ADDR_LEN);
+	/* Make sure the enaddr has changed before further twiddling. */
+	if (bcmp(sc->sc_ac.ac_enaddr, sc->sc_curlladdr, ETHER_ADDR_LEN) != 0) {
+		bcopy(sc->sc_ac.ac_enaddr, LLADDR(sc->sc_if.if_sadl),
+		    ETHER_ADDR_LEN);
+		bcopy(sc->sc_ac.ac_enaddr, sc->sc_curlladdr, ETHER_ADDR_LEN);
 #ifdef INET6
 		/*
 		 * (re)attach a link-local address which matches
@@ -1659,6 +1813,8 @@ carp_set_enaddr(struct carp_softc *sc)
 		 */
 		in6_ifattach_linklocal(&sc->sc_if, NULL);
 #endif
+		carp_set_state(sc, INIT);
+		carp_setrun(sc, 0);
 	}
 }
 
@@ -1933,7 +2089,7 @@ carp_ioctl(struct ifnet *ifp, u_long cmd
 #endif /* INET */
 #ifdef INET6
 		case AF_INET6:
-			sc->sc_if.if_flags|= IFF_UP;
+			sc->sc_if.if_flags |= IFF_UP;
 			error = carp_set_addr6(sc, satosin6(ifa->ifa_addr));
 			break;
 #endif /* INET6 */
@@ -1961,6 +2117,9 @@ carp_ioctl(struct ifnet *ifp, u_long cmd
 			sc->sc_if.if_flags |= IFF_UP;
 			carp_setrun(sc, 0);
 		}
+		carp_set_enaddr(sc);	/* for changes on LINK2 */
+		if (ifr->ifr_flags & IFF_LINK0)
+			carp_update_lsmask(sc);
 		break;
 
 	case SIOCSVH:
@@ -1989,7 +2148,7 @@ carp_ioctl(struct ifnet *ifp, u_long cmd
 				break;
 			}
 		}
-		if (carpr.carpr_vhid > 0) {
+		if (carpr.carpr_vhid > 0 && carpr.carpr_vhid != sc->sc_vhid) {
 			if (carpr.carpr_vhid > 255) {
 				error = EINVAL;
 				break;
@@ -2002,9 +2161,11 @@ carp_ioctl(struct ifnet *ifp, u_long cmd
 					    vr->sc_vhid == carpr.carpr_vhid)
 						return (EINVAL);
 				}
-			sc->sc_vhid = carpr.carpr_vhid;
-			carp_set_enaddr(sc);
-			carp_set_state(sc, INIT);
+			if (carpr.carpr_vhid != sc->sc_vhid) {
+				sc->sc_vhid = carpr.carpr_vhid;
+				carp_set_enaddr(sc);
+				carp_set_state(sc, INIT);
+			}
 			error--;
 		}
 		if (carpr.carpr_advbase > 0 || carpr.carpr_advskew > 0) {
@@ -2060,6 +2221,8 @@ carp_ioctl(struct ifnet *ifp, u_long cmd
 		error = EINVAL;
 	}
 
+	if (bcmp(sc->sc_ac.ac_enaddr, sc->sc_curlladdr, ETHER_ADDR_LEN) != 0)
+		carp_set_enaddr(sc);
 	carp_hmac_prepare(sc);
 	return (error);
 }
@@ -2114,6 +2277,8 @@ carp_set_state(struct carp_softc *sc, in
 		return;
 
 	sc->sc_state = state;
+	carp_update_lsmask(sc);
+
 	switch (state) {
 	case BACKUP:
 		sc->sc_if.if_link_state = LINK_STATE_DOWN;
Index: sys/netinet/ip_carp.h
===================================================================
RCS file: /cvs/src/sys/netinet/ip_carp.h,v
retrieving revision 1.21
diff -p -u -r1.21 ip_carp.h
--- sys/netinet/ip_carp.h	2 Jun 2006 19:53:12 -0000	1.21
+++ sys/netinet/ip_carp.h	14 Nov 2006 07:10:40 -0000
@@ -159,11 +159,12 @@ void carp_group_demote_adj(struct ifne
 int		 carp6_proto_input(struct mbuf **, int *, int);
 int		 carp_iamatch(struct in_ifaddr *, u_char *, u_int32_t *,
 		     u_int32_t);
-struct ifaddr	*carp_iamatch6(void *, struct in6_addr *);
+int		 carp_iamatch6(struct ifnet *, struct ifaddr *);
 struct ifnet	*carp_ourether(void *, struct ether_header *, u_char, int);
 int		 carp_input(struct mbuf *, u_int8_t *, u_int8_t *, u_int16_t);
 int		 carp_output(struct ifnet *, struct mbuf *, struct sockaddr *,
 		     struct rtentry *);
 int		 carp_sysctl(int *, u_int, void *, size_t *, void *, size_t);
+int		 carp_lsdrop(struct mbuf *, sa_family_t, u_int32_t *, u_int32_t *);
 #endif /* _KERNEL */
 #endif /* _NETINET_IP_CARP_H_ */
Index: sys/netinet/ip_icmp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_icmp.c,v
retrieving revision 1.72
diff -p -u -r1.72 ip_icmp.c
--- sys/netinet/ip_icmp.c	3 Jan 2007 18:39:56 -0000	1.72
+++ sys/netinet/ip_icmp.c	16 Jan 2007 21:05:26 -0000
@@ -68,6 +68,8 @@
  * Research Laboratory (NRL).
  */
 
+#include "carp.h"
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/mbuf.h>
@@ -86,6 +88,11 @@
 #include <netinet/ip_var.h>
 #include <netinet/icmp_var.h>
 
+#if NCARP > 0
+#include <net/if_types.h>
+#include <netinet/ip_carp.h>
+#endif
+
 /*
  * ICMP routines: error generation, receive packet processing, and
  * routines to turnaround packets back to the originator, and
@@ -450,6 +457,13 @@ icmp_input(struct mbuf *m, ...)
 			printf("deliver to protocol %d\n", icp->icmp_ip.ip_p);
 #endif
 		icmpsrc.sin_addr = icp->icmp_ip.ip_dst;
+#if NCARP > 0
+		if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+		    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+		    carp_lsdrop(m, AF_INET, &icmpsrc.sin_addr.s_addr,
+		    &ip->ip_dst.s_addr))
+			goto freeit;
+#endif
 		/*
 		 * XXX if the packet contains [IPv4 AH TCP], we can't make a
 		 * notification to TCP layer.
@@ -521,6 +535,13 @@ icmp_input(struct mbuf *m, ...)
 			ip->ip_src = ia->ia_dstaddr.sin_addr;
 		}
 reflect:
+#if NCARP > 0
+		if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+		    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+		    carp_lsdrop(m, AF_INET, &ip->ip_src.s_addr,
+		    &ip->ip_dst.s_addr))
+			goto freeit;
+#endif
 		/* Free packet atttributes */
 		if (m->m_flags & M_PKTHDR)
 			m_tag_delete_chain(m);
@@ -563,6 +584,13 @@ reflect:
 		}
 #endif
 		icmpsrc.sin_addr = icp->icmp_ip.ip_dst;
+#if NCARP > 0
+		if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+		    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+		    carp_lsdrop(m, AF_INET, &icmpsrc.sin_addr.s_addr,
+		    &ip->ip_dst.s_addr))
+			goto freeit;
+#endif
 		rt = NULL;
 		rtredirect(sintosa(&icmpsrc), sintosa(&icmpdst),
 		    (struct sockaddr *)0, RTF_GATEWAY | RTF_HOST,
Index: sys/netinet/ip_input.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.146
diff -p -u -r1.146 ip_input.c
--- sys/netinet/ip_input.c	28 Dec 2006 20:06:10 -0000	1.146
+++ sys/netinet/ip_input.c	16 Jan 2007 21:05:27 -0000
@@ -33,6 +33,7 @@
  */
 
 #include "pf.h"
+#include "carp.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -65,6 +66,11 @@
 #include <netinet/ip_ipsp.h>
 #endif /* IPSEC */
 
+#if NCARP > 0
+#include <net/if_types.h>
+#include <netinet/ip_carp.h>
+#endif
+
 #define IPMTUDISCTIMEOUT (10 * 60)	/* as per RFC 1191 */
 
 struct ipqhead ipq;
@@ -370,6 +376,15 @@ ipv4_input(m)
 		m_adj(m, len - m->m_pkthdr.len);
 	}
 
+#if NCARP > 0
+	if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+	    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+	    ip->ip_p != IPPROTO_ICMP &&
+	    carp_lsdrop(m, AF_INET, &ip->ip_src.s_addr,
+	    &ip->ip_dst.s_addr))
+		goto bad;
+#endif
+
 #if NPF > 0
 	/*
 	 * Packet filter
@@ -462,6 +477,14 @@ ipv4_input(m)
 	    ip->ip_dst.s_addr == INADDR_ANY)
 		goto ours;
 
+#if NCARP > 0
+	if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+	    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+	    ip->ip_p == IPPROTO_ICMP &&
+	    carp_lsdrop(m, AF_INET, &ip->ip_src.s_addr,
+	    &ip->ip_dst.s_addr))
+		goto bad;
+#endif
 	/*
 	 * Not for us; forward if possible and desirable.
 	 */
Index: sys/netinet6/icmp6.c
===================================================================
RCS file: /cvs/src/sys/netinet6/icmp6.c,v
retrieving revision 1.90
diff -p -u -r1.90 icmp6.c
--- sys/netinet6/icmp6.c	9 Dec 2006 01:12:28 -0000	1.90
+++ sys/netinet6/icmp6.c	13 Dec 2006 14:16:53 -0000
@@ -61,6 +61,9 @@
  *	@(#)ip_icmp.c	8.2 (Berkeley) 1/4/94
  */
 
+#include "faith.h"
+#include "carp.h"
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/malloc.h>
@@ -91,7 +94,9 @@
 #include <netinet6/in6_ifattach.h>
 #include <netinet6/ip6protosw.h>
 
-#include "faith.h"
+#if NCARP > 0
+#include <netinet/ip_carp.h>
+#endif
 
 /* inpcb members */
 #define in6pcb		inpcb
@@ -474,6 +479,14 @@ icmp6_input(mp, offp, proto)
 	}
 #endif
 
+#if NCARP > 0
+	if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+	    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+	    icmp6->icmp6_type == ICMP6_ECHO_REQUEST &&
+	    carp_lsdrop(m, AF_INET6, ip6->ip6_src.s6_addr32,
+	    ip6->ip6_dst.s6_addr32))
+		goto freeit;
+#endif
 	icmp6stat.icp6s_inhist[icmp6->icmp6_type]++;
 	switch (icmp6->icmp6_type) {
Index: sys/netinet6/ip6_input.c
===================================================================
RCS file: /cvs/src/sys/netinet6/ip6_input.c,v
retrieving revision 1.74
diff -p -u -r1.74 ip6_input.c
--- sys/netinet6/ip6_input.c	28 Dec 2006 20:08:15 -0000	1.74
+++ sys/netinet6/ip6_input.c	16 Jan 2007 21:05:28 -0000
@@ -62,6 +62,7 @@
  */
 
 #include "pf.h"
+#include "carp.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -109,6 +110,11 @@
 #include <net/pfvar.h>
 #endif
 
+#if NCARP > 0
+#include <netinet/in_var.h>
+#include <netinet/ip_carp.h>
+#endif
+
 extern struct domain inet6domain;
 extern struct ip6protosw inet6sw[];
 
@@ -246,6 +252,14 @@ ip6_input(m)
 		goto bad;
 	}
 
+#if NCARP > 0
+	if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+	    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+	    ip6->ip6_nxt != IPPROTO_ICMPV6 &&
+	    carp_lsdrop(m, AF_INET6, ip6->ip6_src.s6_addr32,
+	    ip6->ip6_dst.s6_addr32))
+		goto bad;
+#endif
 	ip6stat.ip6s_nxthist[ip6->ip6_nxt]++;
 
 	/*
@@ -523,6 +537,14 @@ ip6_input(m)
 	}
 #endif
 
+#if NCARP > 0
+	if (m->m_pkthdr.rcvif->if_type == IFT_CARP &&
+	    m->m_pkthdr.rcvif->if_flags & IFF_LINK0 &&
+	    ip6->ip6_nxt == IPPROTO_ICMPV6 &&
+	    carp_lsdrop(m, AF_INET6, ip6->ip6_src.s6_addr32,
+	    ip6->ip6_dst.s6_addr32))
+		goto bad;
+#endif
 	/*
 	 * Now there is no reason to process the packet if it's not our own
 	 * and we're not a router.
Index: sys/netinet6/nd6_nbr.c
===================================================================
RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
retrieving revision 1.42
diff -p -u -r1.42 nd6_nbr.c
--- sys/netinet6/nd6_nbr.c	17 Nov 2006 01:11:23 -0000	1.42
+++ sys/netinet6/nd6_nbr.c	8 Dec 2006 07:17:15 -0000
@@ -193,10 +193,13 @@ nd6_ns_input(m, off, icmp6len)
 	 */
 	/* (1) and (3) check. */
 #if NCARP > 0
-	if (ifp->if_carp && ifp->if_type != IFT_CARP)
-		ifa = carp_iamatch6(ifp->if_carp, &taddr6);
-	if (!ifa)
+	if (ifp->if_type == IFT_CARP) {
 		ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp, &taddr6);
+		if (ifa && !carp_iamatch6(ifp, ifa))
+			ifa = NULL;
+	} else {
+		ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp, &taddr6);
+	}
 #else
 	ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp, &taddr6);
 #endif
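
For anyone who wants to look at the accept/drop decision in isolation, here is a minimal userland sketch of the arithmetic used by carp_lsdrop() in the patch above. It is only an illustration, not kernel code: struct ls_group and ls_drop() are hypothetical names standing in for the sc_lsmask and sc_lscount fields of carp_softc, and the broadcast/multicast shortcut is left out because there is no mbuf here.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(), htonl() */

/*
 * Hypothetical stand-in for the two carp_softc fields the filter reads.
 * lsmask has bit i set when this host is MASTER for the i-th VHID of the
 * group (VHIDs sorted ascending); lscount is the number of VHIDs found.
 */
struct ls_group {
	uint32_t lsmask;
	int	 lscount;
};

/*
 * Return 1 if this node should drop the packet, 0 if it should accept it.
 * src and dst point to addresses in network byte order; words is 1 for
 * IPv4 and 4 for IPv6, mirroring the XOR fold in the patch.
 */
static int
ls_drop(const struct ls_group *g, const uint32_t *src, const uint32_t *dst,
    int words)
{
	uint32_t fold = 0;
	int i;

	if (g->lscount == 0)	/* no group computed yet: drop, as the patch does */
		return (1);

	for (i = 0; i < words; i++)
		fold ^= src[i] ^ dst[i];

	/* Select one slot of the group and test whether we are MASTER for it. */
	return (((1u << (ntohl(fold) % (uint32_t)g->lscount)) & g->lsmask) == 0);
}
```

With two hosts that are each master of one VHID (masks 0x1 and 0x2, count 2), every source/destination pair selects exactly one slot, so exactly one host accepts the packet. If one host fails and the other becomes master of both VHIDs, its mask becomes 0x3 and it accepts everything.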