So, recap. Last weekend was an awesome weekend, because it was FOSDEM. And J, my very good friend who is also the sysadmin for the platform where the server running the IRC is hosted, decided to finally upgrade the virtual networking system (VMware NSX) that had been causing trouble for quite some time, with machines randomly losing networking. Because it was a routine upgrade, he decided to do it from the hotel between the conference and the dinner. I asked him, "How long will the disruption last?" "Less than a second," was his answer.
Sadly, he turned out to be wrong. After he upgraded NSX and rebooted the hosts, they refused to load anything related to networking, VM hosting, or storage.
Dinner on Saturday came and went, with him trying to fix the issues from his phone in the restaurant.
Sunday came, and most of his day was spent trying to find the issue. That included sitting in the passenger seat of a mutual friend's Subaru, trying to diagnose it over his phone while abroad; he burned through almost a gigabyte of data on RDP. By this time, another mutual friend, S, was also helping to track it down. Like J, S is a VMware VCDX (basically the highest certification you can get from VMware), and is also one of the top-ranking VMware NSX consultants in Europe. At the end of the day, both had the feeling they were at least closing in on the issue, but still had no idea how to fix it.
Today, J and S spent most of their day on the phone with various VMware support technicians. They finally homed in on the issue and found out how to fix it, at around 17:00 CET.
So, tl;dr: An upgrade broke shit, and it took the combined might of two of the best VMware consultants and half the VMware support staff to fix it. I was busy listening to open source hippies talk about cool projects and kinda forgot to read the forums.