Take it Easy

Four years ago I was facing one of the biggest challenges of my career. We had made the tough decision to bow out of a longstanding technology consortium which hosted a number of services for our organization and member schools. Costs were going up, and we knew we had to take it upon ourselves to find a way out, and soon.

I started making a list of all the services we needed to migrate to new solutions. Some were easier than others. One of the most critical services that I was struggling with was DNS. The acronym DNS stands for Domain Name System. It’s the technology that translates the domain name in every web address you type into a numerical IP address. For example, when you go to wadegibson.com, DNS translates that name to the address of this web server, 192.163.202.246. Without DNS the internet would be a jumbled mess of unintelligible numbers. It’s a fundamental underpinning of nearly every technology we use today.
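To make that name-to-number translation concrete, here is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses behind a hostname. The function name resolve_ipv4 is my own for illustration, and the addresses returned depend on whatever DNS records exist when you run it.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the unique IPv4 addresses the system resolver finds for hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) pair.
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_ipv4("wadegibson.com") performs the same kind of lookup your browser triggers behind the scenes. A name that does not resolve raises socket.gaierror.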

So, what does any unknowing tech guy do when trying to find a solution to a problem? He Googles it, of course! I started searching for managed DNS providers and read every article or review I could get my hands on. Four years ago the market space was fairly limited in this area, which helped narrow things down quickly. One company name continued to bubble to the top with glowing reviews: DNS Made Easy.

I commend their team for coming up with such an ingenious name, which couldn’t be more fitting. They have really hit a home run with this service. Their interface is unbelievably simple and easy to use, they have great help articles and videos, and their price…it will shock you how inexpensive it is. We have 72 domains managed in our DNS Made Easy account, and it only costs us $166. Per year. My colleagues always ask, “You mean per month, don’t you?” No. That’s really all we pay. I can’t even pay for the electricity and cooling for one server at $166 per year, let alone buy hardware, maintain the operating systems, and everything else that goes along with on-premises hosting.

This was a no-brainer, and truly one of the best decisions I’ve made. I’m such a fan of DNS Made Easy that I practically evangelize the product every chance I get. A number of my ESU colleagues across the state have now adopted DNS Made Easy in their environments as well. I even have a personal DNS Made Easy account for all of my own domains.

Finding DNS Made Easy was a pivotal moment. It proved to me, and others, that we could find affordable, first-class solutions to meet our needs. Having solutions like DNS Made Easy makes it easier for me to focus on delivering quality services without having to worry about our infrastructure.

Ticket to Ride

Much like a network management system, a support ticketing system is a key component of any IT infrastructure. There are hundreds of applications available to meet this need, all with varying features and costs.

For the past ten years, we have been using a product called Web Help Desk. At the time it was one of our better options, offering most of the features we needed at an attractive price. However, over the years the application has dated itself, and it still looks and functions just like it did ten years ago. It had gotten to the point that for the past few years we only used it when we needed to track a ticket for billing purposes. Everything else was handled solely via direct email and phone communications.

As a two-man support team we had managed to make this model work, but we lacked any data to quantify the scope and breadth of the work we do on a daily basis. I knew we needed to make a change and begin tracking all of our support requests through a more modern ticketing system. In early December we fired up a trial of a product called Freshdesk. I was familiar with the Freshworks family of applications from a few years back when I dabbled with their accounting platform, Freshbooks. I knew they had a solid reputation in the industry and their interfaces were always clean and simple. Unfortunately, our trial expired over the holiday break before we had a chance to really dig in under the hood. Luckily a quick call got our trial extended by a few days so we could see if Freshdesk would be a good fit. Within 24 hours I had issued a PO and we were rocking and rolling with Freshdesk.

Since we were moving from an email-based reporting system, the ability to have forwarded emails converted to tickets was crucial. Freshdesk handles this easily, and even finds the original sender of the message to mark as the requestor. Additionally, Freshdesk allowed us to create Companies for each of our Districts, and anytime it sees a new user from a specific @schooldomain.org address, it automatically associates them with that Company. On top of that, users can easily log in with their Google account to create tickets and view the status of any existing tickets. This is very slick, and makes user provisioning a breeze!
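For the curious, the domain matching described above boils down to a simple lookup. The Python sketch below is only my illustration of the idea, not Freshdesk's actual code or API. The mapping, the district name, and the function name are all hypothetical.

```python
from email.utils import parseaddr

# Hypothetical mapping of email domains to districts (illustrative values only).
DOMAIN_TO_COMPANY = {
    "schooldomain.org": "Example School District",
}


def company_for_sender(from_header, domain_map=DOMAIN_TO_COMPANY):
    """Return the company matching the sender's email domain, or None."""
    # parseaddr splits 'Jane Doe <jane@schooldomain.org>' into name and address.
    _, address = parseaddr(from_header)
    if "@" not in address:
        return None
    # Domains are case-insensitive, so normalize before the lookup.
    domain = address.rsplit("@", 1)[1].lower()
    return domain_map.get(domain)
```

A sender such as "Jane Doe <jane@schooldomain.org>" would land in the example district, while an address from an unknown domain returns None and would need to be assigned by hand.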

So far things are progressing nicely. I’ve already been shocked at the sheer volume of requests that we field every day. It’s taken a little bit to get acclimated to the process of logging phone calls, walk-ins, etc., in addition to traditional email requests. Luckily the process is quick and easy, and I’m confident that the data we are gathering will be invaluable as we move forward.

Be Nice…Until It’s Time to Not Be Nice

You may remember earlier this fall when our Bluehost VPS was having issues. It ended up being a hardware issue that took them more than two weeks to track down. Fast forward to this past week. Our server started acting up with the exact same symptoms.

I quickly called their tech support and pleaded my case to have them check the memory (the problem last time). Instead, I was escalated to Tier 2 support, and was told to give them 24-72 hours to “look into it”. Meanwhile, I have to keep the masses at bay and hope they find the problem quickly.

I waited 24 hours and thought I would call in to see if they had found anything. They knew nothing and couldn’t tell if anything had been done. I was told again to wait the 24-72 hours. I was not happy, but kept quiet and took my medicine.

Finally, after lunch on Friday, as the magical 72nd hour was expiring, I picked up the phone and called in. By this point it had been three days, I had heard absolutely nothing from Bluehost, and both my customers and I had lost our patience. I felt sorry in advance for the poor, unsuspecting agent who would draw my unlucky number in the call queue. Steadfast, I maintained my professionalism. He first asked how long it had been, to which I happily answered the requisite 72 hours. I waited on hold for another fifteen minutes while he took his best crack at the case.

Defeated, he came back online and told me that he had escalated my case to Tier 3 support. “Great,” I replied. “How soon can I expect to hear something?” He replied, “You should hear something within 24 hours.” It was at that exact moment he heard me inhale deeply, as I had expended every last bit of keep quiet I had left in my soul. I explained that I had already waited 72 hours, and was not going to wait another 24 while they did nothing. Our server had already been unusable for three days, and this could not be dragged out for two weeks like it was in August.

I demanded to speak with a superior. The agent put me on hold for a good five minutes, presumably to take some of the fight out of me. A supervisor, Brett, answered the phone, and I quickly gave him the elevator summary of the situation. I asked Brett if he had ever seen the movie Road House, and in particular, the scene where Dalton tells his bouncers to “Be nice…until it’s time to not be nice.” I told Brett that I was past the phase of being nice, and it was time to get some answers. Brett chuckled briefly and assured me he would get someone to look at it right away. I told him if they didn’t have this fixed by Monday that I would be leaving Bluehost and would never return.

Apparently my convo with Brett was not for nothing. Within thirty minutes Tier 3 support looked into the issue and escalated it to their Systems Operations team, who eventually escalated it to a Sr. Systems Architect. By Saturday night our server was humming along under normal operation again.

As a person who has been on the receiving end of this situation, I always hate having to strongly assert my point, and yes, interject a curse here and there to drive things home. As Dalton would say, “Nobody ever wins a fight,” but sometimes you’ve got to light a fire under the posterior of the right people to get things moving again.

Keep an Eye on Things

[Screenshot: PRTG Dashboard]

Network management systems (NMS) are a key component of any technology infrastructure. They provide real-time monitoring and reporting of a variety of hardware and software components on your network. Over the years I’ve had experience with a number of software packages: Nagios, WhatsUp Gold, Zabbix, Zenoss, SolarWinds, and most recently, PRTG.

Some forward-thinking colleagues in other ESUs have been singing the praises of PRTG for a few years, but I wasn’t so quick to jump on the bandwagon. A little over a year ago I had the chance to get in on a group buy with them at a great price, but I still wasn’t sold. I was convinced that I could get what I needed from a free or community-supported tool. After floundering in trial implementations with a couple of other tools, I finally gave in and bought PRTG.

We’ve only been running PRTG in production for a couple of weeks, but I am wholeheartedly convinced that it is worth every penny. In that short time, we have already identified a failed power supply, a degraded RAID array, a bad ROM battery, and several opportunities for performance tweaks on a number of servers. We also have the most accurate visibility into our bandwidth utilization that we have ever had, which helps greatly when purchasing firewalls, bidding circuits, and buying internet capacity.

One of the things I love most about PRTG is the Maps functionality. It lets you build your own custom dashboards that can show you how things are performing on your network. The image above is a screenshot of the dashboard I have running on a TV in my office all day long. At a glance, I can keep tabs on anything happening in our area. Having this information is invaluable in our line of work.

Get to the Bottom of It

In late August I was doing a training for some of our staff when I went to pull up a page on our website. Our normally speedy WordPress site was unusually sluggish, which always makes for an awkward pause in the middle of a presentation. Eventually, the page loaded and I went about my business not paying much mind to it.

A couple of days later I had a report from one of our schools that their website was also loading slowly. After doing a little digging I noticed that all sites on our VPS (Virtual Private Server) were intermittently loading slowly or timing out. I began running a ping to our server and noticed that it would drop offline at random intervals. “No big deal,” I thought to myself, as I quickly remoted into the server to give it a reboot. We have had our Bluehost VPS for years, and it’s always just worked. The server rebooted and things seemed better initially, but the problem quickly reappeared.
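If you want to catch this kind of intermittent drop yourself, a running ping is the simplest tool. The Python sketch below does something similar, with one swap: it tries a TCP connection to the web port instead of sending ICMP pings, since ICMP usually needs elevated privileges from a script. The function names, port, and timings are my own assumptions, not anything from Bluehost.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def watch(host, checks=30, interval=1.0, probe=is_reachable):
    """Probe host `checks` times and return the timestamps of failed probes."""
    failures = []
    for i in range(checks):
        if not probe(host):
            stamp = datetime.now()
            failures.append(stamp)
            print(f"{stamp:%H:%M:%S} {host} did not respond")
        # Pause between probes, but not after the last one.
        if i < checks - 1:
            time.sleep(interval)
    return failures
```

Running watch("your-server.example.org", checks=600, interval=5) for a while gives you a list of timestamps to hand to tech support, which beats saying "it drops sometimes." The hostname there is a placeholder.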

Thinking the problem was surely some WordPress plugin gone awry, I began the process of disabling plugins one-by-one, then waiting a minute or two to see if the problem reappeared. Much to my chagrin, I could not find a plugin causing this issue. I then began running WordPress in debug mode to see if I could get to the bottom of things. This sent me down numerous rabbit holes, none of which resolved the issue.

Frustrated, I finally gave in and called tech support. They were happy to assist me, but quickly realized that I had already exhausted all of the usual suspects. A level two engineer joined the call and quickly pointed the finger at one particular WordPress plugin. I waited until off hours to fully disable the plugin and even contacted the developer to report the findings of the engineer. Lo and behold, this did not fix the issue either.

After I got back in touch with support, they said that the problem had to be with one of my sites, and that I should go through and begin disabling them until I found the culprit. I relayed to them how difficult that is when you have a multi-network WordPress installation. They insisted that was the problem, so I begrudgingly complied. That weekend I decided to take a more drastic approach and moved all of my websites into a quarantine folder, inaccessible by the web server altogether. Within minutes the VPS was losing its connection again, so I knew the issue was not with any of my sites. I quickly fired off an email to update the ticket with my findings and requested that they move my account to a different server.

Days went by with no reply. My frustration grew as the issue became more prominent. I sent more ticket updates as I thought of anything else to try. I got passed to different engineers, who all found something the other had missed, but the problem persisted. I was ready to jump ship from Bluehost, and was already researching other VPS providers. Finally, after more than two weeks, I got an email update from an engineer saying they had found a bad stick of RAM in the host server, and that I should be good to go. And that’s all it was. A bad stick of memory that is now forever stuck in my memory.