[Solved] Proxmox Alpine Linux LXC Container IPv6 Network Failure
Can We Solve This IPv6 Networking Problem?
Our server is an i9-9900K at Hetzner. Eight LXC containers are running various Linux distributions on Proxmox-VE 7.3. Everything seems to work great! . . . Except inside one Alpine Linux container. Somehow, IPv6 doesn't work from within this one container. Nevertheless, IPv6 ping to the container somehow seems to work from the Proxmox node. Amazingly, ping also works to the container even from the Wide Area Network (WAN).
How could such strangeness happen? Let's investigate!
A Closer Look At The Problem
The network configuration for all the containers was added individually by hand in the Proxmox GUI. Proxmox numbers LXC containers beginning with the number "100". The containers are IPv6 only. Each container was given a static IPv6 address ending with the container's Proxmox container number.
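The addressing scheme can be sketched in a few lines of Python. This is only an illustration: it assumes the /64 prefix that shows up in the ping output further down, and the function name address_for_container is made up for the example. One detail worth noticing is that the digits of the container number are written straight into the last group of the address, where they are read as hexadecimal.

```python
import ipaddress


def address_for_container(prefix: str, vmid: int) -> ipaddress.IPv6Address:
    """Return the address in `prefix` whose last group spells out `vmid`."""
    network = ipaddress.IPv6Network(prefix)
    # "::102" means hexadecimal 0x102, so reuse the decimal digits as hex digits.
    suffix = int(str(vmid), 16)
    return network[suffix]


print(address_for_container("2a01:4f8:121:24cc::/64", 102))
```

Generating the addresses this way, instead of typing each one by hand, would make it harder for two containers to end up with the same address.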
The container with the IPv6 network problem is number 102. An example of the working containers is number 106. Number 106 was chosen for comparison because both container 102 and container 106 were built from the same Alpine Linux LXC image. The other working containers are Ubuntu and Debian.
Alpine Linux seems less likely to be causing the IPv6 network failure because container 106, also built from the Alpine image, is functioning just fine. So let's look directly at the misbehaving container, number 102.
Container 102 can ping localhost.
root@Proxmox-VE ~ # lxc-attach -n 102
~ # ping6 -c 2 localhost
PING localhost (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.024 ms
64 bytes from ::1: seq=1 ttl=64 time=0.036 ms
--- localhost ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.024/0.030/0.036 ms
~ #
Container 102 also responds to ping from external WAN.
[opc@instance-20220717-1620 ~]$ ping -c 2 2a01:4f8:121:24cc::102
PING 2a01:4f8:121:24cc::102(2a01:4f8:121:24cc::102) 56 data bytes
64 bytes from 2a01:4f8:121:24cc::102: icmp_seq=1 ttl=51 time=154 ms
64 bytes from 2a01:4f8:121:24cc::102: icmp_seq=2 ttl=51 time=154 ms
--- 2a01:4f8:121:24cc::102 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 153.966/154.042/154.118/0.076 ms
[opc@instance-20220717-1620 ~]$
However, from inside container 102, we can't ping its DNS servers or the WAN. With the DNS servers unreachable, name lookups fail too.
root@Proxmox-VE ~ # lxc-attach -n 102
~ # cat /etc/resolv.conf
# --- BEGIN PVE ---
search metalvps.com
nameserver 2a01:4ff:ff00::add:2
nameserver 2001:470:20::2
# --- END PVE ---
~ # ping6 -c 2 2a01:4ff:ff00::add:2
PING 2a01:4ff:ff00::add:2 (2a01:4ff:ff00::add:2): 56 data bytes
--- 2a01:4ff:ff00::add:2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
~ # ping6 -c 2 2001:470:20::2
PING 2001:470:20::2 (2001:470:20::2): 56 data bytes
--- 2001:470:20::2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
~ # ping6 -c 2 metalvps.com
ping6: bad address 'metalvps.com'
~ # ping6 -c 2 ipv6.google.com
ping6: bad address 'ipv6.google.com'
~ # exit
By contrast, from inside working container 106, we can ping both the DNS servers and the WAN.
root@Proxmox-VE ~ # lxc-attach -n 106
~ # cat /etc/resolv.conf
# --- BEGIN PVE ---
search metalvps.com
nameserver 2a01:4ff:ff00::add:2
nameserver 2001:470:20::2
# --- END PVE ---
~ # ping6 -c 2 2a01:4ff:ff00::add:2
PING 2a01:4ff:ff00::add:2 (2a01:4ff:ff00::add:2): 56 data bytes
64 bytes from 2a01:4ff:ff00::add:2: seq=0 ttl=61 time=0.446 ms
64 bytes from 2a01:4ff:ff00::add:2: seq=1 ttl=61 time=0.274 ms
--- 2a01:4ff:ff00::add:2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.274/0.360/0.446 ms
~ # ping6 -c 2 2001:470:20::2
PING 2001:470:20::2 (2001:470:20::2): 56 data bytes
64 bytes from 2001:470:20::2: seq=0 ttl=57 time=18.516 ms
64 bytes from 2001:470:20::2: seq=1 ttl=57 time=18.552 ms
--- 2001:470:20::2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 18.516/18.534/18.552 ms
~ # ping6 -c 2 metalvps.com
PING metalvps.com (2603:c020:3:a9a9::250): 56 data bytes
64 bytes from 2603:c020:3:a9a9::250: seq=0 ttl=49 time=151.895 ms
64 bytes from 2603:c020:3:a9a9::250: seq=1 ttl=49 time=151.856 ms
--- metalvps.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 151.856/151.875/151.895 ms
~ # ping6 -c 2 ipv6.google.com
PING ipv6.google.com (2a00:1450:4001:830::200e): 56 data bytes
64 bytes from 2a00:1450:4001:830::200e: seq=0 ttl=118 time=5.477 ms
64 bytes from 2a00:1450:4001:830::200e: seq=1 ttl=118 time=5.611 ms
--- ipv6.google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 5.477/5.544/5.611 ms
~ #
Could The Firewall Cause This Problem?
The problem with container 102 probably is not caused by the firewall dropping egress traffic, because there aren't any egress rules in the container's zone and the default egress policy is ACCEPT. Also, the container's IPv6 egress problem still happens when container 102's firewall is turned off entirely.
The ip Command
The ip command from the iproute2 suite provides much information about Linux networking. As the ip(8) man page explains, we can use ip in the following syntax: ip [OPTIONS] OBJECT { COMMAND | help }. In other words, a call to ip has up to three parts: OPTIONS (which may be omitted), an OBJECT, and a COMMAND.
For IPv4, the OPTION is "-family inet", which can be abbreviated to "-f inet" or just "-4". If no family option is given, ip guesses the family from the other arguments. For example, "ip address show" with no family option lists both IPv4 and IPv6 addresses.
For IPv6, we use the OPTION "-family inet6", as in "ip -family inet6". The IPv6 OPTION can be abbreviated to "ip -f inet6" or just "ip -6".
The OBJECT can be, for example, "link", "address", or "route". A "link" is a hardware or virtualized interface connecting our machine to a network. An "address" is a numerical IP address we assigned to a link. A "route" is a routing table entry telling the kernel where to send packets bound for a given destination. "link", "address", and "route" can be abbreviated as "l", "a", and "r", respectively. Longer abbreviations also work. For example, "addr" commonly is used as an abbreviation for "address".
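To make the syntax concrete, here is a rough Python sketch of how a command line such as "ip -6 addr show" breaks down into family, OBJECT, and COMMAND. It is a simplification and not how ip itself is written: only the family options and the three objects discussed here are handled, and the names parse_ip_args, FAMILY_OPTIONS, and OBJECTS are invented for the example.

```python
FAMILY_OPTIONS = {"-4": "inet", "-6": "inet6"}
OBJECTS = ["link", "address", "route"]  # only the three objects discussed here


def parse_ip_args(args):
    """Split e.g. ["-6", "addr", "show"] into (family, object, command words)."""
    family = None  # None means no family option was given
    rest = list(args)
    while rest and rest[0].startswith("-"):
        opt = rest.pop(0)
        if opt in FAMILY_OPTIONS:
            family = FAMILY_OPTIONS[opt]
        elif opt in ("-f", "-family"):
            family = rest.pop(0)
        else:
            raise ValueError(f"option not handled in this sketch: {opt}")
    if not rest:
        raise ValueError("no OBJECT given")
    word = rest.pop(0)
    # An abbreviation matches the first object name that starts with it.
    for name in OBJECTS:
        if name.startswith(word):
            return family, name, rest
    raise ValueError(f"unknown object: {word}")


print(parse_ip_args(["-6", "addr", "show"]))
print(parse_ip_args(["-f", "inet6", "a", "show"]))
```

Both calls describe the same request: family inet6, object address, command show.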
Frequently, when something happens involving networking, it can be very helpful to check the output of the ip command. Let's look with the ip command and see what we can find!
The Error Message
Checking the output of the ip -family inet6 address show command inside container 102 reveals "tentative dadfailed" as follows.
root@Proxmox-VE ~ # lxc-attach -n 102
~ # ip -f inet6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP qlen 1000
inet6 2a01:4f8:121:24cc::102/64 scope global tentative dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d8dd:65ff:fe45:70f2/64 scope link
valid_lft forever preferred_lft forever
~ # exit
root@Proxmox-VE ~ #
Output of the same command from inside container 106 where networking works fine does not include "tentative dadfailed."
root@Proxmox-VE ~ # lxc-attach -n 106
~ # ip -f inet6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if28: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP qlen 1000
inet6 2a01:4f8:121:24cc::106/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::64ce:78ff:fe85:15fd/64 scope link
valid_lft forever preferred_lft forever
~ #
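Spotting the flag by eye is easy with one container, but the same check can be scripted. The sketch below scans text in the format shown above and returns every inet6 address flagged dadfailed. The function name and the sample constant are made up for the example, and the sample is copied from container 102's output.

```python
def dadfailed_addresses(ip_output: str) -> list[str]:
    """Return the inet6 addresses whose flag list contains 'dadfailed'."""
    failed = []
    for line in ip_output.splitlines():
        words = line.split()
        # Address lines look like: inet6 ADDR/LEN scope SCOPE [FLAGS...]
        if len(words) >= 4 and words[0] == "inet6" and "dadfailed" in words[4:]:
            failed.append(words[1])
    return failed


SAMPLE_102 = """\
2: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP qlen 1000
    inet6 2a01:4f8:121:24cc::102/64 scope global tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::d8dd:65ff:fe45:70f2/64 scope link
       valid_lft forever preferred_lft forever
"""

print(dadfailed_addresses(SAMPLE_102))
```

Fed the output from container 106, the same function returns an empty list.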
Dadfailed
Some quick googling found a page called IPv6 'dadfailed' Problem. The problem there involved reassigning an address previously assigned to another machine which failed without the address being deleted. However, down toward the bottom of the page, we find key information: the "dad" in "dadfailed" is "Duplicate Address Detection."
No Mistake With Container 102's Creation
But, even after reading about Duplicate Address Detection, I still did not immediately find the solution. On the dadfailed page that Google showed me, a machine had gone down while its address was still present. I was puzzled about how anything like that could have happened on my clean, new server, where all the containers were working fine except for this single itty bitty networking issue inside one container. Nothing was down. All the containers had their own addresses, and all the containers successfully started and continued running.
I kept looking and looking for some mistake I somehow might have made while setting up container 102. I checked and rechecked the screenshots that recorded container 102's creation. But everything looked okay!
Asking For Help
Next I talked with a well known hosting provider about what was happening. I also wrote up a LES post asking for help with the puzzling issue. It was starting to get late by the time I finished drafting the LES post. I decided to sleep. I deferred posting to LES Help until the next morning.
When I woke up, however, the idea was in my mind that possibly I mistakenly also had assigned the 102 address to another container. Such a mistake didn't seem likely, though, because there were only 8 containers, and the container numbers, 100, 101, 102, 103, 104, 105, 106, and 107 seemed understandable. I decided to check all the other containers besides 102 to make sure no additional container inadvertently and duplicatively also had been assigned the 102 address.
Checking All The Containers
- Container 100
First I checked container 100. Container 100 was the test container that had been checked by @yoursunny and found to be "very strong." Container 100 seemed to have been assigned the correct IP address ending in 100.
- Container 101
Uh! Oh! Here it is! Container 101 got the 102 address!!
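Checking the containers one by one worked with eight of them, but the comparison also can be done mechanically. Here is a small sketch, assuming we already have collected each container's configured address into a dictionary. The function name is invented, the table lists only the four containers whose addresses are stated in this post, and it assumes container 100 uses the same /64 prefix as the others.

```python
import ipaddress
from collections import defaultdict


def find_duplicates(assignments: dict[int, str]) -> dict[str, list[int]]:
    """Map each address used by more than one container to those containers."""
    users = defaultdict(list)
    for vmid, text in assignments.items():
        # Normalise first so different spellings of one address compare equal.
        users[str(ipaddress.IPv6Address(text))].append(vmid)
    return {addr: sorted(ids) for addr, ids in users.items() if len(ids) > 1}


# Only the containers whose addresses are stated in this post.
assignments = {
    100: "2a01:4f8:121:24cc::100",
    101: "2a01:4f8:121:24cc::102",  # the slip found above
    102: "2a01:4f8:121:24cc::102",
    106: "2a01:4f8:121:24cc::106",
}
print(find_duplicates(assignments))
```

Parsing the addresses with the ipaddress module before comparing them means that "::0102" and "::102" count as the same address.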
How Did The Error Happen?
Container 101 was the second container that was created. Obviously it should have an IP address ending with 2, right? Well, not when the first address ended with 0.
What About The Node And WAN Responses To Ping?
How could container 102 respond to a ping from external WAN while seeming unable to initiate a ping to an external WAN destination? Container 101 already had been created with the 102 address. So, the ping responses came from container 101.
How Do We Fix Container 102?
All that's needed to fix container 102 is to change its address to any that is not already in use. I tried the 101 address, which was free because container 101 had been given the 102 address, and that address immediately worked fine.
More Information About Dadfailed
The ip command has a manual page which doesn't mention dadfailed. However, the ip man page refers to the ip-address man page, which does mention dadfailed as a filter for the ip address show command.
ip address show - look at protocol addresses
[ . . . ]
dadfailed
(IPv6 only) only list addresses which have failed duplicate address detection.
-dadfailed
(IPv6 only) only list addresses which have not failed duplicate address detection.
For example, inside the Linux container on my Chromebook:
chronos@penguin:~/log$ ip -6 address show dadfailed
chronos@penguin:~/log$ ip -6 address show -dadfailed
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 fe80::216:3eff:fe43:22fb/64 scope link
valid_lft forever preferred_lft forever
chronos@penguin:~/log$
Dadfailed In The RFCs
Duplicate Address Detection failure is discussed in section 5.4.5 of RFC 4862 which says:
5.4.5. When Duplicate Address Detection Fails
A tentative address that is determined to be a duplicate as described above MUST NOT be assigned to an interface, and the node SHOULD log a system management error.
Further discussion of Duplicate Address Detection occurs in Appendix A of RFC 4862 and in RFC 7527 Enhanced Duplicate Address Detection.
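As background for the RFC text: RFC 4862 has the node probe a tentative address by sending a Neighbor Solicitation to that address's solicited-node multicast group. RFC 4291 forms the group address from the prefix ff02::1:ff00:0/104 plus the low 24 bits of the unicast address. The sketch below computes it. The function name is chosen just for this example.

```python
import ipaddress


def solicited_node_multicast(address: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


print(solicited_node_multicast("2a01:4f8:121:24cc::102"))
```

Any node that already holds the address listens on that group, which is how the existing holder can answer the probe and make the newcomer's address end up as dadfailed.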
Dadfailed In The Logs
Indeed, the node did log a system management error just as RFC 4862 said it should:
root@Proxmox-VE /var/log # zcat messages.2.gz | grep "duplicate address"
Jan 10 03:01:27 Proxmox-VE kernel: [97048.519242] IPv6: eth0: IPv6 duplicate address 2a01:4f8:121:24cc::102 used by 92:17:7a:22:c1:e2 detected!
[ . . . ]
Jan 13 06:44:36 Proxmox-VE kernel: [ 141.636572] IPv6: eth0: IPv6 duplicate address 2a01:4f8:121:24cc::102 used by 92:17:7a:22:c1:e2 detected!
root@Proxmox-VE /var/log #
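The log line carries two useful facts: the duplicated address and the MAC address of the machine already using it. A small sketch for pulling them out follows. The regular expression is written against the exact wording of the two log lines above and may not fit other kernel versions, and the names are invented for the example.

```python
import re

DUPLICATE_RE = re.compile(
    r"IPv6: (?P<iface>\S+): IPv6 duplicate address (?P<addr>\S+) "
    r"used by (?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}) detected!"
)


def parse_duplicate_line(line: str):
    """Return (interface, address, mac) from a duplicate-address log line, or None."""
    match = DUPLICATE_RE.search(line)
    if match is None:
        return None
    return match["iface"], match["addr"], match["mac"]


line = ("Jan 10 03:01:27 Proxmox-VE kernel: [97048.519242] IPv6: eth0: IPv6 duplicate "
        "address 2a01:4f8:121:24cc::102 used by 92:17:7a:22:c1:e2 detected!")
print(parse_duplicate_line(line))
```

Comparing the extracted MAC address with the hardware address of each container's network interface probably would have pointed to the other holder of the address right away.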
Source Code
Recently I started to look at C source code. I know hardly anything about C, so looking at source code always is an adventure.
Above, we saw the ip command print the following "dadfailed" output:
~ # ip -f inet6 addr show
[ . . . ]
2: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP qlen 1000
inet6 2a01:4f8:121:24cc::102/64 scope global tentative dadfailed
valid_lft forever preferred_lft forever
[ . . . ]
Can we find the source code which printed the dadfailed error? Finding the printing code sounds simple enough. Probably the printing code is nowhere near as complicated as the code which checked for and found the dadfailed error, right?
We saw the dadfailed error notification inside an Alpine container. Possibly that error notification came from iproute2 code running inside Busybox inside the Alpine container. Nevertheless, let's start by imagining how we might get the iproute2 source code that Proxmox itself uses. I'm unsure the method here gets the sources correctly, but we can try. First come the Proxmox sources, which seem to include the Debian sources, and then we get what might be the upstream sources.
root@Proxmox-VE ~ # git clone git://git.proxmox.com/git/iproute2.git
Cloning into 'iproute2'...
remote: Enumerating objects: 187, done.
remote: Total 187 (delta 0), reused 0 (delta 0), pack-reused 187
Receiving objects: 100% (187/187), 11.25 MiB | 88.63 MiB/s, done.
Resolving deltas: 100% (83/83), done.
root@Proxmox-VE ~ # cd iproute2/
root@Proxmox-VE ~/iproute2 # ls
debian iproute2 Makefile README
root@Proxmox-VE ~/iproute2 # cat README
We compile our own package to make sure we always have the
latest version and bug fixes.
root@Proxmox-VE ~/iproute2 # ls debian/
changelog copyright iproute2-doc.examples iproute2.links README.Debian source
compat doc iproute2-doc.install iproute2.manpages README.source
control iproute2-doc.docs iproute2.install patches rules
root@Proxmox-VE ~/iproute2 # ls iproute2/ #The iproute2/iproute2 directory is empty.
root@Proxmox-VE ~/iproute2 # git clone git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
Cloning into 'iproute2'...
remote: Enumerating objects: 313, done.
remote: Counting objects: 100% (313/313), done.
remote: Compressing objects: 100% (289/289), done.
remote: Total 33973 (delta 60), reused 37 (delta 20), pack-reused 33660
Receiving objects: 100% (33973/33973), 8.34 MiB | 42.06 MiB/s, done.
Resolving deltas: 100% (25490/25490), done.
root@Proxmox-VE ~/iproute2 #
Let's look inside the iproute2/iproute2 directory and then inside the iproute2/iproute2/ip directory.
root@Proxmox-VE ~/iproute2 # cd iproute2/
root@Proxmox-VE ~/iproute2/iproute2 # ls
bash-completion COPYING doc genl lib misc README tc vdpa
bridge dcb etc include Makefile netem README.devel testsuite
configure devlink examples ip man rdma schema tipc
root@Proxmox-VE ~/iproute2/iproute2 # cd ip
root@Proxmox-VE ~/iproute2/iproute2/ip # ls
ila_common.h iplink.c iplink_vlan.c ipprefix.c link_vti.c
ip6tunnel.c iplink_can.c iplink_vrf.c iproute.c link_xfrm.c
ipaddress.c iplink_dsa.c iplink_vxcan.c iproute_lwtunnel.c Makefile
ipaddrlabel.c iplink_dummy.c iplink_vxlan.c iprule.c nh_common.h
ip.c iplink_geneve.c iplink_wwan.c ipseg6.c routel
ip_common.h iplink_gtp.c iplink_xdp.c ipstats.c rtm_map.c
ipfou.c iplink_hsr.c iplink_xstats.c iptoken.c rtmon.c
ipila.c iplink_ifb.c ipmacsec.c iptunnel.c static-syms.c
ipioam6.c iplink_ipoib.c ipmaddr.c iptuntap.c tcp_metrics.c
ipl2tp.c iplink_ipvlan.c ipmonitor.c ipvrf.c tunnel.c
iplink_amt.c iplink_macvlan.c ipmptcp.c ipxfrm.c tunnel.h
iplink_bareudp.c iplink_netdevsim.c ipmroute.c link_gre6.c xfrm.h
iplink_batadv.c iplink_nlmon.c ipneigh.c link_gre.c xfrm_monitor.c
iplink_bond.c iplink_rmnet.c ipnetconf.c link_ip6tnl.c xfrm_policy.c
iplink_bond_slave.c iplink_team.c ipnetns.c link_iptnl.c xfrm_state.c
iplink_bridge.c iplink_vcan.c ipnexthop.c link_veth.c
iplink_bridge_slave.c iplink_virt_wifi.c ipntable.c link_vti6.c
root@Proxmox-VE ~/iproute2/iproute2/ip #
It was the address OBJECT which drew our interest, so let's look at ipaddress.c.
root@Proxmox-VE ~/iproute2/iproute2/ip # cat -n ipaddress.c | grep dadfailed
68 " [-]tentative | [-]deprecated | [-]dadfailed | temporary |\n"
1411 { .name = "dadfailed", .mask = IFA_F_DADFAILED, .readonly = true, .v6only = true},
root@Proxmox-VE ~/iproute2/iproute2/ip #
Line 68 seems to be part of the usage message, which is printed when ip doesn't understand the arguments given to it on the command line.
If we back up from line 1411 to line 1400, we can see the comment reproduced here, just below. Also, a little further down, at lines 1451 and 1452, there's code involving both "print" and "name." I'm guessing that maybe lines 1451 and 1452 might be responsible for printing "tentative" and "dadfailed."
1400 /* Mapping from argument to address flag mask and attributes */
1401 static const struct ifa_flag_data_t {
[ . . . ]
1411 { .name = "dadfailed", .mask = IFA_F_DADFAILED, .readonly = true, .v6only = true},
[ . . . ]
1414 { .name = "tentative", .mask = IFA_F_TENTATIVE, .readonly = true, .v6only = true},
[ . . . ]
1451 print_string(PRINT_FP, NULL,
1452 "%s ", flag_data->name);
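To test my guess about what these lines do, here is the same idea sketched in Python: a table that maps flag bits to names, and a loop that reports the name of every bit that is set. Only the two entries quoted above are included. The numeric mask values are what I believe linux/if_addr.h defines, so treat them as an assumption, and the Python names are invented for the example.

```python
from typing import NamedTuple

# Mask values believed to match linux/if_addr.h; treat them as an assumption.
IFA_F_DADFAILED = 0x08
IFA_F_TENTATIVE = 0x40


class FlagData(NamedTuple):
    name: str
    mask: int
    readonly: bool
    v6only: bool


# Only the two table entries quoted above; the real table has more rows.
FLAG_TABLE = [
    FlagData("dadfailed", IFA_F_DADFAILED, True, True),
    FlagData("tentative", IFA_F_TENTATIVE, True, True),
]


def flag_names(flags: int) -> list[str]:
    """Return the names of the table entries whose mask bit is set in `flags`."""
    return [entry.name for entry in FLAG_TABLE if flags & entry.mask]


print(flag_names(IFA_F_TENTATIVE | IFA_F_DADFAILED))
```

In this sketch the names come out in table order. Whether the real printing loop in ipaddress.c behaves the same way is still a guess on my part.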
Finally, glancing at the Busybox ip command, the relevant source code seems to be ip.c. The usage message for ip address show is on lines 139 to 149.
139 //usage:#define ipaddr_trivial_usage
140 //usage: "add|del IFADDR dev IFACE | show|flush [dev IFACE] [to PREFIX]"
141 //usage:#define ipaddr_full_usage "\n\n"
142 //usage: "ipaddr add|change|replace|delete dev IFACE [CONFFLAG-LIST] IFADDR\n"
143 //usage: " IFADDR := PREFIX | ADDR peer PREFIX [broadcast ADDR|+|-]\n"
144 //usage: " [anycast ADDR] [label STRING] [scope SCOPE]\n"
145 //usage: " PREFIX := ADDR[/MASK]\n"
146 //usage: " SCOPE := [host|link|global|NUMBER]\n"
147 //usage: " CONFFLAG-LIST := [CONFFLAG-LIST] CONFFLAG\n"
148 //usage: " CONFFLAG := [noprefixroute]\n"
149 //usage: "ipaddr show|flush [dev IFACE] [scope SCOPE] [to PREFIX] [label PATTERN]"
There is more on lines 341 to 347.
341 #if ENABLE_IPADDR
342 int ipaddr_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
343 int ipaddr_main(int argc UNUSED_PARAM, char **argv)
344 {
345 return ip_do(do_ipaddr, argv);
346 }
347 #endif
ip_do is defined on lines 334 to 338.
334 static int ip_do(ip_func_ptr_t ip_func, char **argv)
335 {
336 argv = ip_parse_common_args(argv + 1);
337 return ip_func(argv);
338 }
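The pattern in ip_do is a function passed as an argument: strip the common options, then hand the remaining words to whichever handler was given. A rough Python equivalent is sketched below. The option handling is cut down to -4 and -6, passing the family along as a parameter is my own simplification, and do_ipaddr here is only a stand-in that reports what it was asked to do.

```python
def parse_common_args(argv):
    """Strip leading family options and return (family, remaining words)."""
    family = None
    rest = list(argv)
    while rest and rest[0].startswith("-"):
        opt = rest.pop(0)
        if opt == "-4":
            family = "inet"
        elif opt == "-6":
            family = "inet6"
        else:
            raise ValueError(f"option not handled in this sketch: {opt}")
    return family, rest


def do_ipaddr(family, words):
    """Stand-in for the real handler: just report what it was asked to do."""
    return f"ipaddr family={family} words={words}"


def ip_do(ip_func, argv):
    # argv[0] is the applet name, so skip it, as the C code does with argv + 1.
    family, words = parse_common_args(argv[1:])
    return ip_func(family, words)


print(ip_do(do_ipaddr, ["ipaddr", "-6", "show"]))
```

Because the handler is a parameter, the same ip_do can serve several applets, with each applet's main function passing in its own handler.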
Hopefully I soon will be able to understand more about how the iproute2 and Busybox code works to print the dadfailed error. Probably someone here at LES will be kind enough to provide some hints! Thanks very much! It is a lot of fun to learn a little about dadfailed!
Comments
When I started reading this I thought, how do you know it's container 102 responding and not something else responding to those pings.
Just from looking at the code snippets you posted, it looks like the ip command gets the mask from the kernel and looks it up in the ifa_flag_data_t table to map it to the .name and print that. Have you tried running the command with strace yet? The flag will probably show up there as well... (I don't really have much time right now to look further into it, as I am just preparing to head off to the ISO C++ standards committee meeting in Issaquah next week...)
BTW, I found some reasonably good use for my container: compiling Haiku OS and fixing a few IPv6 bugs in the Haiku OS kernel.
Yep! You were on it right away!
I haven't used strace yet. Time to start soon!
Here's wishing you a fun time in Issaquah!
Thanks for telling me this delightful news! Glad to hear!
MetalVPS
Edit to OP: Add link to ipaddress.c at git.kernel.org
I found a nice article about DAD. Please see IPv6 Duplicate Address Detection, by Enno Rey
Except for the image on the home page, which I think could be removed, I really liked the simple design of Enno's blog, which runs on wordpress.com and has comments.
Here are a few links I found about Enno:
LinkedIn
ERNW | Enno Rey Netzwerke GmbH
Operational Security Considerations for IPv6 Networks RFC 9099
A friend suggested the following regarding ipaddress.c:
1411 and also 1400-1452:
This creates a static array of structures.
The first part defines the structure type itself. It is just a type definition:
The next part creates the static array of those structures and assigns it to "ifa_flag_data[]".
In other words, it creates an array of structures that it uses in other functions for printing/displaying and for comparisons to find which element of the array has a matching name, among other things. The fifth element of that array, ifa_flag_data[4], is a structure of type "ifa_flag_data_t" with the values shown on line 1411: .name = "dadfailed", .mask = IFA_F_DADFAILED, .readonly = true, .v6only = true.