He wanted to verify that the link settings reported by elysium's kernel
matched what the switch it was connected to was reporting.
Connor doesn't like to admit that he uses AOL Instant Messenger, but
everyone at his work uses it, and it makes contacting the network and
development staff really easy. He sent an instant message to the senior
networking engineer, asking him to check the switch. He was afraid of a
possible duplex mismatch. elysium was forced to 100 Mbit full-duplex, as
were all the switches. Mismatches sometimes occur despite this when one
end mistakenly falls back to a slower speed or to half-duplex. He pasted
the ifconfig output into the AIM message window.
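On Solaris, one way to see what the hme driver itself believes about the link is to query it with ndd, then compare that with what the switch port reports. This is a minimal sketch, not the procedure used in this story. It assumes the hme driver exposes link_status (0 = down, 1 = up), link_speed (0 = 10 Mbit, 1 = 100 Mbit) and link_mode (0 = half, 1 = full), and that an interface is selected by setting the driver's instance parameter (so hme1 is instance 1). The helper names describe_link and hme_link_report are hypothetical.

```shell
#!/bin/sh
# Sketch: report what the Solaris hme driver thinks the link is doing,
# for comparison with the switch port's view.
# Assumption: hme exposes link_status (0=down, 1=up),
# link_speed (0=10 Mbit, 1=100 Mbit) and link_mode (0=half, 1=full).

# describe_link SPEED MODE: turn the raw ndd values into text
# (hypothetical helper)
describe_link() {
    case "$1" in
        0) speed="10 Mbit" ;;
        1) speed="100 Mbit" ;;
        *) speed="unknown speed" ;;
    esac
    case "$2" in
        0) mode="half-duplex" ;;
        1) mode="full-duplex" ;;
        *) mode="unknown duplex" ;;
    esac
    echo "$speed $mode"
}

# hme_link_report INSTANCE: query the driver for one hme instance
# (hypothetical helper; must run as root on a Solaris host with hme)
hme_link_report() {
    ndd -set /dev/hme instance "$1" || return 1
    status=$(ndd -get /dev/hme link_status)
    speed=$(ndd -get /dev/hme link_speed)
    mode=$(ndd -get /dev/hme link_mode)
    if [ "$status" != "1" ]; then
        echo "hme$1: link down"
        return 1
    fi
    echo "hme$1: $(describe_link "$speed" "$mode")"
}
```

Running `hme_link_report 1` as root on elysium would then print the driver's view of hme1, which can be pasted alongside the ifconfig output for the network engineer to compare against the switch.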
The network engineer didn't see anything wrong. Connor ran 'ping -s
123.123.123.1 &' to keep a continuous ping running against his default
gateway IP, and started snoop in non-promiscuous mode on the public interface:
# snoop -P -d hme1
He usually uses non-promiscuous mode so that he only sees packets
destined for the host he is on (and broadcast packets, of course). He
knows how to write complex filter expressions when he needs them, but
usually doesn't need to.
Connor saw lots of broadcast packets, but nothing to or from elysium.
Weird. He started pinging elysium from another host on the same segment
in the same datacenter, using another window in his screen session. He
loves the screen program.
Once the second host started pinging, he began seeing ARP requests
asking for elysium's hardware address:
123.123.123.10 -> (broadcast) ARP C Who is 123.123.123.123, elysium ?
...but no replies from elysium. Now he was getting somewhere: for some
reason, things were going wrong at the ARP protocol level.
snoop is a great program because it understands common network
protocols and usually gives meaningful output. It doesn't relieve the
admin of the need to understand how networks work, however, since it
simply gathers information. It is up to the admin to interpret all the
data collected.
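The pattern Connor spotted, ARP requests for elysium with no matching replies, can be pulled out of a longer capture mechanically, although deciding what it means is still the admin's job. This is a minimal sketch assuming snoop's default one-line summaries: requests shaped like the "ARP C Who is" line shown earlier, and replies assumed to look like "ARP R <address>, <name> is <ethernet address>". The function name unanswered_arp is hypothetical.

```shell
#!/bin/sh
# unanswered_arp: read snoop summary lines on stdin and print each IP
# address that was asked about ("ARP C Who is <ip>") but never answered
# ("ARP R <ip>") within the capture. Assumes snoop's default summaries.
unanswered_arp() {
    awk '
        / ARP C Who is / {
            ip = $0
            sub(/.* ARP C Who is /, "", ip)   # keep text after "Who is "
            sub(/[ ,].*/, "", ip)             # cut at first comma or space
            asked[ip] = 1
        }
        / ARP R / {
            ip = $0
            sub(/.* ARP R /, "", ip)          # keep text after "ARP R "
            sub(/[ ,].*/, "", ip)
            answered[ip] = 1
        }
        END {
            for (ip in asked)
                if (!(ip in answered))
                    print ip
        }
    '
}
```

For example, `snoop -P -d hme1 -c 200 arp | unanswered_arp` would capture 200 ARP packets in non-promiscuous mode and list any addresses that were asked for but never answered.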
He wasn't really sure what would make ARP simply fail, other than a
patch from the patch cluster that either hadn't installed correctly or
simply didn't work for some reason.
He wanted to test the interfaces and see whether they would work with
static ARP entries added using the 'arp' command. Connor entered a static
entry for a Linux server on the same segment as elysium:
# arp -s 123.123.123.10 00:60:4B:B1:C1:8C
...and on the Linux server the arp command had the same syntax:
# arp -s 123.123.123.123 8:0:20:e1:ca:6f
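The two hardware addresses above are written in different styles: one in lowercase with leading zeros dropped (8:0:20:e1:ca:6f), the other with two uppercase hex digits per byte. When comparing addresses gathered from arp, snoop and the switch, it can help to normalize them first. A minimal sketch; normalize_mac is a hypothetical helper, not part of either system's arp command.

```shell
#!/bin/sh
# normalize_mac ADDR: print ADDR with each colon-separated byte as two
# lowercase hex digits, e.g. 8:0:20:e1:ca:6f -> 08:00:20:e1:ca:6f
# (hypothetical helper for comparing addresses across tools)
normalize_mac() {
    echo "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            b = tolower($i)
            if (length(b) == 1)
                b = "0" b          # restore the dropped leading zero
            out = out (i > 1 ? ":" : "") b
        }
        print out
    }'
}
```

Two addresses can then be compared with `[ "$(normalize_mac "$a")" = "$(normalize_mac "$b")" ]`, regardless of which tool printed them.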
Connor tried pinging elysium from the Linux host, and it worked! He was
very happy. He now thought that the public interface worked correctly
except for its ability to use the ARP protocol. He told the network
engineer, who suggested adding static ARP entries on elysium and on its
gateways, for both interfaces.
Connor really liked this idea, as it would get the host talking to every
host on the Internet again except hosts on the local LAN, which it rarely
needed to reach anyway (only for DNS resolution, which could be handled
with static mappings or temporarily by DNS caches on another network
segment).
Connor and the network guru added the entries to their ARP tables, but
with no success: Connor still couldn't ping either gateway. Neither of
them knew what to do from there. Connor thought about it for a little
while, and posted to the sun-managers mailing list to see if any Sun
gurus had ideas.
He started backing out patches that had been installed with the patch
cluster. He found an ARP patch, but it wouldn't allow itself to be
removed. The same went for a driver patch for the host's network cards.
He successfully removed the kernel patch and another network-related
patch, but neither removal (nor the subsequent reboot) changed the
situation in any way.
Connor has done a lot of networking with Linux, including using it as a
bridge device by employing proxy ARP. He has been able to firewall
networks and hosts this way without having to explicitly set up routing
through the Linux firewall hosts. He decided to connect each interface
on elysium to a Linux host with two interfaces, and use proxy ARP on the
Linux box to get elysium back onto the network.
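On a Linux box of that kind, proxy ARP is enabled per interface through the kernel's net.ipv4.conf.<interface>.proxy_arp setting, usually together with IP forwarding and a host route pointing at the protected machine. The sketch below only prints the commands for review rather than running them. It assumes the Linux box already has an address and the normal route for the public network on its outside interface; the function name proxy_arp_plan and the interface names eth0 (toward the switch) and eth1 (toward elysium) are hypothetical, and a real setup may need more (for example firewall rules).

```shell
#!/bin/sh
# proxy_arp_plan HOST_IP OUTSIDE_IF INSIDE_IF
# Print (do not run) the commands for a Linux proxy-ARP "pseudo-bridge":
# the box answers ARP on OUTSIDE_IF for HOST_IP, and on INSIDE_IF for
# addresses it routes via OUTSIDE_IF (such as the default gateway).
# Assumes OUTSIDE_IF already carries the box's address and network route.
proxy_arp_plan() {
    host_ip=$1
    out_if=$2
    in_if=$3
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "sysctl -w net.ipv4.conf.${out_if}.proxy_arp=1"
    echo "sysctl -w net.ipv4.conf.${in_if}.proxy_arp=1"
    echo "ip link set ${in_if} up"
    # The /32 host route is more specific than the network route, so
    # traffic for HOST_IP goes out INSIDE_IF toward the protected host.
    echo "ip route add ${host_ip}/32 dev ${in_if}"
}
```

Printing instead of executing keeps the plan reviewable; `proxy_arp_plan 123.123.123.123 eth0 eth1 | sh` would apply it as root once it looks right.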
Someone from the sun-managers mailing list suggested using proxy ARP on
another UNIX host to get elysium back on the network, so Connor knew he
was on the right track.