The Balboa Park Online Collaborative (BPOC) is a technology and strategy non-profit that serves the cultural institutions in San Diego’s Balboa Park, meeting a suite of IT and digital needs. BPOC’s business model is based on leveraging economies of scale to offer quality products and services at an affordable price, in some instances providing services on a means-tested basis. One of our key offerings is wired and wireless internet connectivity, which enters the Park via a single entry point and feeds into the Park’s high-bandwidth fiber network, which we own and operate. The Park itself is an urban cultural park that is home to 27 cultural institutions, including twelve collecting museums as well as performing arts organizations such as the Old Globe Theater, the San Diego Youth Ballet and the San Diego Youth Symphony. The Park and its buildings are city-owned and managed by Parks and Recreation, and in return for a rent-free lease, the buildings operate as public services.
As an IT services organisation, BPOC is proud of its track record of uptime for both connectivity and the file and application hosting services it runs out of its onsite server room. Part of this track record is due to the fact that most cultural and arts organisations are not early birds: critical business operating times run from 10am to 5pm on weekdays and a little later on weekends. Power and connectivity outages happen very infrequently, and when they do, they tend to occur when road or construction crews begin their day. Consequently, BPOC normally has two to three hours to troubleshoot any issue.
Any connectivity downtime is normally due to either a small internal problem, such as a faulty network switch, or a large external problem, such as a substation power outage or a cable service interruption. So when, a few months back, our monitoring system alerted us at 3:55am that some of our hosted websites were down, it looked like a typical cable-provider problem calling for our standard mitigation procedure: submit a ticket to the cable company and get onsite before 7am to ensure a smooth network startup.
However, on arrival at the server room building (a city-owned property), it was clear that this event was a little unusual. The building was without power, even though the rest of the Park obviously had it. Our electronic security cards would not work to give us access, so we needed the City or the building operations staff to come onsite. When our cable provider reported no issue on their side, and the power company indicated that no breakers had been tripped, we knew this was no ordinary outage. What we discovered was that the building’s main power had simply been switched off. This was alarming because the switch is inside the building’s electrical closet, which is only accessible with a city-owned or power company-owned key, and there was no vandalism or forced entry into the closet. In this scenario, the power company’s procedure requires confirming that no one is working on the line before power is restored. The City was also not well prepared for this scenario, so confusion abounded and lengthy discussions ensued. All of this took time, and so, for the first time in a number of years, the magical hour of 10am arrived with no Park connectivity.
With power but no connectivity, our institutions were in somewhat uncharted territory and were stumped about how to handle admissions, performances and classes. Most of our institutions have hosted ticketing or booking systems and no procedures in place for manual cash transactions, so they chose to remain closed until connectivity was restored. In the following days of debriefing, it became clear that this malicious act was not an isolated incident: other buildings had also had their power switched off by someone with keys. The culprit is still out there, and we are unsure whether they knew their act would affect the whole Park in this way, so we are all extra vigilant.
What did we learn? One minor issue was that our monitoring system was unable to distinguish between a network outage and a building power outage; this has now been corrected. We also learned that external issues, such as a cable provider or power outage, require minimal coordination: after an initial notification their resolution is out of our control, so the mitigations are pro forma. This internal issue of malicious intent, by contrast, was unexpected and required a significant amount of coordination and discussion to resolve and understand.
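For readers wondering how a monitor might tell a network outage from a building power outage, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of BPOC's actual fix: it assumes a battery-backed mains sensor that reports over a separate link (for example, cellular), and the names `Outage`, `classify`, `sites_up` and `mains_ok` are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Outage(Enum):
    NONE = "no outage"
    BUILDING_POWER = "building power outage"
    NETWORK = "network outage"
    UNKNOWN = "outage, cause unknown"


def classify(sites_up: bool, mains_ok: Optional[bool]) -> Outage:
    """Combine two independent signals into a single diagnosis.

    sites_up: True if the hosted websites answer an external check.
    mains_ok: reading from the assumed out-of-band mains sensor;
              None means the sensor itself did not report.
    """
    if sites_up:
        return Outage.NONE
    if mains_ok is None:
        # Sites are down and we have no power reading: do not guess.
        return Outage.UNKNOWN
    if not mains_ok:
        # Sites are down and the sensor reports no mains power.
        return Outage.BUILDING_POWER
    # Sites are down but the building has power: suspect the network.
    return Outage.NETWORK


if __name__ == "__main__":
    # The 3:55am scenario: sites unreachable, mains reported off.
    print(classify(sites_up=False, mains_ok=False).value)
```

The point of the second signal is that it must not share a failure mode with the first: a sensor that depends on the building's own power or network would go silent at exactly the moment it is needed.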
But the event brought into focus a much larger issue for my organization and our responsibilities. Our successful track record of connectivity uptime means we are great partners who provide great service, but it has, in some respects, increased the vulnerability of our clients. Their faith in the technology and service we provide has meant that they have had little need to plan for, or document, temporary business interruptions. They have a false sense of security.
Increasingly, connectivity is regarded in the same light as electricity. It’s just there. But as more and more critical business operations move into the cloud, an organisation’s operating vulnerability increases while its ability to operate without connectivity decreases. It appears that our readiness for problems only accounts for larger-scale disasters and nothing in between.
Our clients have had a good reminder that a portion of their revenue is reliant on network connectivity. Vendors who recommend cloud-service versions of their products mainly sell them on the convenience of maintenance and support, with only a nod to connectivity uptime. I have been trying to implement a redundant connection in the Park, but I’ve had limited success because our connectivity has been so consistent. This event has got our clients’ attention, so hopefully they will be more receptive, although I think we’ll need a few more connectivity issues to push the cost-benefit calculation in our favour, which means a very cost-effective solution is required.
As a technologist, I have a healthy scepticism of technology. I took a module on data transmission, and it never ceases to amaze me that packet switching actually works as well as it does.
Cultural organisations are risk-averse, which often means they don’t discuss risks at all apart from the major ones, and this is a problem. Some form of risk analysis should be a key piece of any project or initiative, whether operational or programmatic, and especially of those involving critical business systems. Decisions to move to the cloud should consider the consequences of any downtime and have procedures or protocols in place to continue operating. Non-profits have lean budgets, which is all the more reason to periodically review revenue and operational risk.
So what is my cost-effective redundant solution? Mobile hotspots: 4G LTE mobile internet service beacons deployed to each admissions desk in the event of an internet interruption. Well, that was unexpected.
Nik Honeysett is CEO of BPOC, a San Diego-based, non-profit consultancy that provides technology support and development services, and business and digital strategy for the cultural sector. Previously, he was Head of Administration for the Getty Museum. He is a former board member of the American Alliance of Museums and Museum Computer Network, and currently sits on the board of Guru, a mobile experience startup supporting the cultural, attraction and sports sectors. He is a frequent speaker on issues of organizational and digital strategy, and is adjunct faculty for Johns Hopkins and the Getty Leadership Institute teaching digital strategy and technology management.