Host/System and Device/Router Naming Standards

May 21st, 2009

At each organization I am exposed to, it is interesting to see the various naming schemes that have been employed over time.  I most often find a hodgepodge of different naming standards that have been poorly followed.  Well thought out naming standards will make a huge difference in the ease of maintaining your environment.

So how should you come up with a device naming standard?  I won’t profess to give you a one-size-fits-all solution, but instead I will outline a number of the pitfalls to device naming that I have run into in order to help you devise your own convention.

Uses for a name

In IT, device names serve three primary roles:

  • They are a unique identifier used to define a device (note that a MAC address or serial number could be used as a unique ID, though it provides no other information about the device and is difficult for humans to work with).
  • When entered into DNS they provide an easy way to connect to a given device by typing in its name from scratch, or by selecting the name from a list in a program such as an SSH client.
  • When you see a device name in a log or in a document, it should be obvious which device is meant, and the name should convey critical information about the device.

Naming goals

  • Names should be as short as possible, easy to type and read, but with enough information to be unique and descriptive.
  • Make things as intuitive as possible.  If you have an IT contractor working in your environment it should be pretty obvious to them what various servers do based solely on the machine names.
  • Your naming system should be flexible enough to allow for growth.

Naming structure

  • Generally you should start the name with the most significant identifier, and work your way through to the least significant identifier.   This makes sorting useful.
  • Think about how long each field in the name should be.  It needs to be long enough to hold unique entries for as many items of that type as will likely be used, given the character set defined for that field (e.g. a two-character alpha field for site code allows a maximum of 676 sites, though if you want them to be intuitive you probably don’t want to use the XZ designator).  A numeric-only field has fewer options: 0-9 yields only 10 possibilities per digit.
  • Within a name you might choose to include delimiters between fields in order to separate them, or just for stylistic reasons.  This makes names longer to type (and sometimes too long to fit in documentation, etc…), but delimiters are often worthwhile from a readability standpoint.  PRF5A is a lot harder to read than PR-F5-A.  Most special characters are banned from device names, though dash “-” seems pretty well supported.
  • You can only have one variable-length field in a name, unless you are using delimiters, or adjacent fields are obviously separate because some are alpha only and others are numeric only.
  • Note that not everything needs to have names of the same length – It is ok to name one server PDXFILE1 and another PDXSAN1.
  • Not everything needs to follow exactly the same nomenclature – routers and network hardware can follow one standard, while servers may follow another.  THIS IS OK!  As long as they don’t conflict…
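
To make the field-length and variable-length points above concrete, here is a minimal Python sketch.  The layout it parses (a three-letter site code, an alphabetic role, then a numeric index) is only an assumption based on the PDXFILE1 and PDXSAN1 examples, and the function names are made up for illustration.

```python
import re
import string


def field_capacity(alphabet: str, length: int) -> int:
    """Number of distinct values a fixed-length field can hold."""
    return len(alphabet) ** length


# Assumed layout: fixed 3-letter site code, variable-length alpha role,
# variable-length numeric index.  The role and index can both vary in
# length without a delimiter because one is alpha only and the other is
# numeric only.
NAME_PATTERN = re.compile(r"(?P<site>[A-Z]{3})(?P<role>[A-Z]+)(?P<index>[0-9]+)")


def parse_name(name: str) -> dict:
    """Split an undelimited name into its fields, or raise ValueError."""
    match = NAME_PATTERN.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"{name!r} does not follow the assumed layout")
    return match.groupdict()


print(field_capacity(string.ascii_uppercase, 2))  # 676 two-letter site codes
print(field_capacity(string.digits, 2))           # 100 two-digit numbers
print(parse_name("PDXFILE1"))
print(parse_name("PDXSAN1"))
```

If two adjacent fields were both alphabetic and both variable length, a pattern like this could no longer tell where one ends and the next begins, which is where delimiters earn their keep.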

Know your organization

  • Think about how your company will grow.  Might you ever have more than one VMware server?
  • Unless there is no way your business will ever have more than one site (what if you were acquired?) I highly recommend your names start with a site code (more on this below).
  • Not everybody has the same needs!  You don’t have to force the same scheme on every organization!  A small manufacturing company has different needs from a global multinational.  You can get away with much simpler names in a small company than in a huge multinational corporation.

Who is your audience?

  • Names should be descriptive to your audience.  Who is your audience?  Users?  IT staff?
  • In an optimal world, machine names should not be seen by users.  In end-user facing situations I recommend using CNAMEs wherever possible to alias “service names” to “server names” (e.g. webmail.bitplumber.net could be CNAME’d to pdxmail1.bitplumber.net).  Note that this often falls down in Windows: Outlook, for instance, insists on showing the user the *real* server name…  The same goes for file server names.
  • Internet facing services should never have users seeing the machine names.  They are likely connecting to a firewall and/or load balancer first anyway, so this is easy to hide.

High-level recommendations

  • Don’t give things nonsensical names; this is not 1990 (yeah, I know I broke this rule when naming plunger.bitplumber.net)
  • Avoid putting unnecessary junk in server names.  I don’t really care what the model number of the server is (in most cases), or even if it is a VMware guest or a physical server (this matters less and less as time goes on).
  • Don’t put a version number of software in the name as you will likely upgrade it! (I have seen servers named Win2k that are running Windows 2003 Server)
  • If the server might end up running multiple applications don’t put the name of one piece of software in the name, call it an application server or something…  (I have seen a server named backupexec that was running netbackup…)
  • In a software development shop (or even a non-software shop), you will likely have multiple copies of similar environments for testing purposes: PRODUCTION, QA, DEVELOPMENT, STAGING, etc…  This is a good thing to include in the name, as you typically have similar server names in each and you don’t want to inadvertently make a change in Production when you intended to make it in QA.
  • Usually it makes sense to name servers with a number on the end, as you might have multiple servers performing the same function.  Even if you only have a single server in that function, you might later move to another physical server, which you designate with a different number on the end.
    Many environments put two numbers on the end of servers, but how often do you really have more than 9 servers of the same type at one site?  It may be ok for some servers to have a single digit number on the end, while others have two digits.

Site codes

In most organizations I recommend the use of site codes as even single-site companies often end up with remote sales offices, disaster recovery datacenters, etc…

The goal with site codes is to choose an identifier that people both from the site in question, and others far away, can easily identify as being related to a given location.  I have often struggled with this as there is no standard, and lots of potential for confusion and overlap.

You must decide how long you want your site codes to be.  I know Intel used to use two-character codes.  Many organizations choose three-character codes, which conveniently correspond with airport codes.

There are a couple of issues with airport codes, however:

  • For some airport codes it is not obvious which city they refer to
  • You will often have multiple sites within the serving area of a single airport

Note that not all site names have to be the same length (depending on your name structure).  At the last company I worked for I gave the large headquarters site in each region a three-character code, and then the smaller satellite sites got five-character codes that began with the three-character code of the region in which they were located.  For example, PDX was the headquarters site and PDXPC was the Pacific Center satellite site.
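
As a rough sketch of that scheme, assuming (as in my PDX example) that every site code begins with the three-letter code of its headquarters site, you can recover the region from any site code.  The function names here are made up for illustration.

```python
def region_of(site_code: str) -> str:
    """Return the three-letter region (headquarters) code for a site code."""
    code = site_code.upper()
    if len(code) < 3 or not code.isalpha():
        raise ValueError(f"{site_code!r} is not a valid site code")
    return code[:3]


def is_satellite(site_code: str) -> bool:
    """Satellite sites carry extra characters after the region code."""
    return len(site_code) > 3


for code in ("PDX", "PDXPC"):
    print(code, region_of(code), is_satellite(code))
```

Because the region is always the leading field, sorting a list of device names groups each satellite site right next to its headquarters.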

A few other notes

Two situations to consider: Naming a device after a department, but that department moves elsewhere physically, but the device stays…  Or, naming a device after a building, but the company moves to another facility along with the device, and keeps the name.  Sometimes you must make a decision as to what a device will stay sticky with, the company/department, or the physical facility.

What is the timespan that your naming scheme must be good for?  I doubt a single site company is going to become a multinational overnight…  Your average IT device lasts 3-7 years so your naming scheme can easily change at replacement time to handle growth.

You might need to consider naming of devices with multiple network interfaces, each with different IP’s.

  • Windows is dumb and by default wants to register every interface with the same thing in DNS.  This can lead to issues if all networks are not directly reachable by all hosts accessing the device.
  • Solaris is interesting in that it wants each interface named differently.  In this case I recommend making the main server name map to the “primary” interface (i.e. probably the one you set the default gateway on) and then use <hostname>-xx for additional interfaces where -xx is something like -bk for backups, etc…
  • Routers should have different forward and reverse names for each interface, plus forward and reverse names for a loopback IP.  (i.e. fa0-0.plunger.bitplumber.net and fa0-1.plunger.bitplumber.net and just plain plunger.bitplumber.net for the loopback IP)
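
Here is a small sketch of how those per-interface names could be generated.  The abbreviation table and the function names are my own assumptions for illustration; only the resulting style (fa0-0.plunger.bitplumber.net, <hostname>-bk) comes from the conventions described above.

```python
import re

# Assumed abbreviations for a few Cisco-style interface types.
ABBREVIATIONS = {
    "FastEthernet": "fa",
    "GigabitEthernet": "gi",
    "Serial": "se",
}


def router_interface_fqdn(interface: str, router_fqdn: str) -> str:
    """Build a DNS name for a router interface, e.g. fa0-0.<router>."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9/.:]+)", interface)
    if match is None:
        raise ValueError(f"unrecognized interface name {interface!r}")
    kind, position = match.groups()
    if kind == "Loopback":
        # The loopback IP gets the plain router name.
        return router_fqdn
    prefix = ABBREVIATIONS.get(kind, kind[:2]).lower()
    # Slashes, dots and colons are not usable inside a DNS label.
    label = prefix + re.sub(r"[/.:]", "-", position)
    return f"{label}.{router_fqdn}"


def server_interface_name(hostname: str, suffix: str) -> str:
    """Name a secondary server interface, e.g. <hostname>-bk for backups."""
    return f"{hostname}-{suffix}"


print(router_interface_fqdn("FastEthernet0/0", "plunger.bitplumber.net"))
print(router_interface_fqdn("FastEthernet0/1", "plunger.bitplumber.net"))
print(router_interface_fqdn("Loopback0", "plunger.bitplumber.net"))
print(server_interface_name("pdxfile1", "bk"))
```

Generating both the forward and reverse records from one function like this keeps them from drifting apart.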

In one environment I have worked in we name all of our iLO, ilom’s, DRAC’s, etc…  <hostname>-SC (sc = service controller).  This makes it easy to go login to one in an emergency.  Just don’t accidentally cross the DNS entries or else you might power cycle the wrong box!

You must be careful not to use special characters in device names.  Note that different devices and directory systems may have different “special characters”.  Think about Windows names, Unix names, router names, DNS names, WINS names, etc…  Each different type of name has different restrictions on what characters and symbols are allowed, and what the minimum and maximum lengths are.  Some names could be case sensitive, but most are not.
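
A quick way to catch trouble early is to check candidate names against the strictest rules you expect to hit.  The sketch below checks only two of them: the DNS host label rules (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen) and the 15-character NetBIOS/WINS limit.  Your routers, directories and applications may well add restrictions of their own, so treat this as a starting point; the function name is made up.

```python
import re

# One DNS host label: letters, digits, hyphens; no leading/trailing hyphen.
DNS_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
NETBIOS_MAX_LENGTH = 15


def naming_problems(name: str) -> list:
    """Return a list of reasons the name may not work everywhere."""
    problems = []
    if DNS_LABEL.fullmatch(name) is None:
        problems.append("not a valid DNS host label")
    if len(name) > NETBIOS_MAX_LENGTH:
        problems.append("longer than 15 characters (NetBIOS/WINS)")
    return problems


for candidate in ("PR-F5-A", "pdx_file1", "-pdxfile1", "PDXAPPLICATIONSERVER1"):
    print(candidate, naming_problems(candidate))
```

An empty list only means the name passed these two checks, not that every system in your environment will accept it.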

I personally find uppercase names easier to read in documentation and on screen, but that is in many cases a matter of personal preference, and in others may be enforced by the system in/on which the name is set (i.e. DNS).

IP addressing in relation to names

This is a topic worthy of another complete blog post, but I will point out just a couple of key recommendations here.

Since private ip address space is “free” and “plentiful” I generally build my subnets with plenty of IP space so that I can space machines widely and align their last number with their server number.  Most often I will use /23 subnets for servers and clients which gives me 512 IP’s (minus a few for network, broadcast, and default gateway).  As an example, you could have a server called PDXESX1 with an IP of 10.111.2.21 and another called PDXESX2 with IP 10.111.2.22, PDXESX3 as 10.111.2.23, etc…
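
The arithmetic is easy to check with Python’s standard ipaddress module.  In this sketch the 10.111.2.0/23 network is inferred from the example addresses, and the base offset of 20 for the ESX hosts is simply my assumption that lines up PDXESX1 with .21.

```python
import ipaddress

SERVER_NET = ipaddress.ip_network("10.111.2.0/23")
ESX_BASE_OFFSET = 20  # assumption: ESX host N lives at offset 20 + N


def server_ip(network, base_offset: int, server_number: int):
    """Map a server number to an IP so the last digits line up with the name."""
    address = network.network_address + base_offset + server_number
    # Stay clear of the network and broadcast addresses.
    if not (network.network_address < address < network.broadcast_address):
        raise ValueError(f"server {server_number} does not fit in {network}")
    return address


print(SERVER_NET.num_addresses)  # 512 addresses in a /23
for number in (1, 2, 3):
    print(f"PDXESX{number}", server_ip(SERVER_NET, ESX_BASE_OFFSET, number))
```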

On a somewhat unrelated note, in my opinion the default gateway should always be the lowest usable IP in the range because it is intuitive for anyone that follows after you.  Along these same lines, I am a fan of always making my DNS servers .11 and .12 in a given subnet (or .11 in one subnet and .11 in another subnet).
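
If you script your subnet documentation, these conventions are simple to derive from the network itself.  This is only a sketch of my own habit (lowest usable IP for the gateway, .11 and .12 for DNS); the function name and the example network are assumptions.

```python
import ipaddress


def well_known_addresses(cidr: str) -> dict:
    """Return the conventional gateway and DNS addresses for a subnet."""
    network = ipaddress.ip_network(cidr)
    base = network.network_address
    return {
        "gateway": str(base + 1),  # lowest usable IP in the range
        "dns1": str(base + 11),
        "dns2": str(base + 12),
    }


print(well_known_addresses("10.111.2.0/23"))
```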

Is this the right time to change?

Is change really needed?  Or is it simply change for change’s sake?

The natural tendency for each new “owner” of a network is to want to do things their way with a naming standard that makes sense to them.  Don’t keep changing your naming schemes!  Even if the existing one is not perfect, it may be better overall just to leave it as is!

You generally want to avoid changing a machine’s name after it has been set.  The name gets referenced all over the place, and unless your process to change it is perfect, it will get missed somewhere and cause confusion down the road…  Think about all of the places you might have to change the name:

  • On the machine itself (hostname, hosts files, application configurations…)
  • In your ip address spreadsheets
  • In your inventory system
  • In DNS entries (including CNAME’s that reference the host name)
  • On the labels stuck to the machine physically
  • Your labels in the network switch (and supporting documentation)
  • Labels on the cables attached to the server – network, power, etc…
  • In your monitoring software
  • On your kvm switch
  • In description fields on your remote power cycle device (PDU’s) 
  • On your network diagrams and documentation

Final thoughts

While this may be a bit overwhelming, it is crucial to consider all of these aspects ahead of time in order to avoid needing to change your standard down the road.  I hope this has given you an overview of many of the pitfalls of naming I have run into during my career such that you can avoid the same mistakes!

As always, if you have any additional comments, feel free to post them here, or shoot me an email and I may include them in a future post.

-Eric

eprosenx Uncategorized

Cisco ASA 5510 8.2 AnyConnect License Price ASA-AC-E-5510

May 13th, 2009

As a follow-up to a previous post, I am happy to report that Cisco has finally posted the bits to the ASA 8.2 code online for download.  I have been looking forward to this, as this release includes a new license model for the AnyConnect VPN client called “Cisco AnyConnect Essentials”.

While I still can’t find any written reference (on the Cisco price list or elsewhere) for how much the AnyConnect VPN client is going to cost, I have confirmed that the previous rumor of it being “next to free” is indeed true.  Cisco is only charging $150 for the AnyConnect VPN Essentials license on a 5510, which will give you up to 250 simultaneous users!  (that is about as close-to-free as Cisco gets)

This is the answer you are looking for to deal with 64 bit client support!  A coworker of mine even told me today that the AnyConnect client works in his Windows 7 Beta 2 machine (which surprised me, I suspect under-the-hood the Windows 7 networking stack is very similar to Windows Vista).

The part number you need for an ASA 5510 is ASA-AC-E-5510=.  If you need the part numbers for other models check out the release announcement.

There is some reference in the release notes to possibly needing more RAM in the ASA 5510 platforms (I am not yet sure if this will impact me; I am not doing a ton of stuff on my ASA 5510, yet I run near 80% RAM utilization on version 8.0.4).  It is worth noting that there is an annoying footnote that says the 256 -> 512 meg RAM upgrade won’t be available till June…

Also, I have been told that the Botnet detection feature will be $460 a year.  This is part number ASA5510-BOT-1YR= for the ASA 5510.

I will write up another post once I install the 8.2 code somewhere.

-Eric

UPDATE: 5/18/09

My VAR is giving me different information than I got directly from Cisco.  They say MSRP is $350 right now and it won’t be available till late this month or early June.  CDW has it posted for $232.99 without any special pricing discounts you may have with them.  Availability says to call…

UPDATE: 5/29/09

The CDW site now shows that the ASA-AC-E-5510 part is $101.99.  It still says availability is “call”…

And for those of you looking for the part numbers you need to purchase the AnyConnect Essentials for your model of ASA, here they are:

  • AnyConnect Essentials VPN License - ASA 5505 (25 Prs) - ASA-AC-E-5505=
  • AnyConnect Essentials VPN License – ASA 5510 (250 Prs) – ASA-AC-E-5510=
  • AnyConnect Essentials VPN License – ASA 5520 (750 Prs) – ASA-AC-E-5520=
  • AnyConnect Essentials VPN License – ASA 5540 (2500 Prs) – ASA-AC-E-5540=
  • AnyConnect Essentials VPN License – ASA 5550 (5000 Prs) – ASA-AC-E-5550=
  • AnyConnect Essentials VPN License – ASA 5580 (10K Prs) – ASA-AC-E-5580=

eprosenx Cisco, Network

Sun X4100 and X4200 Lower Non-critical going low

April 29th, 2009

For over a year now our team of oncall engineers has been tortured by an error generated periodically by our racks of Sun X4100 and X4200 servers.  These alerts come from the integrated ILOMs which we have set to syslog to our EM7 monitoring platform.  Usually about once a week one of our many servers will report something along the lines of the following error:

FIRST REPORTED: 2009-04-29 14:50:33
LAST REPORTED: 2009-04-29 14:50:34

SEVERITY: CRITICAL
OCCURRENCES: 2
SOURCE: Syslog
ORGANIZATION: Management
DEVICE: prsun1-sc

Full message text for most recent occurrence:

<130>logmgr: ID = 343 : Wed Apr 29 14:52:39 2009 : IPMI : Log : critical : ID =   7f : 04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts

This event has not been acknowledged

Sent by notification policy: Major/Critical Events

The EM7 has received a CRITICAL syslog notification from this server.

If you go look at the event log on the ILOM it looks more like this:

04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts
04/29/2009 : 14:52:38 : Voltage : mb.v_+12v : Lower Non-critical going low : reading 7.37 < threshold 10.96 Volts

Looking at the event log on another server with the same type of issue, the error is for a different sensor, yet it has the same behavior:

02/21/2009 : 06:25:01 : Voltage : p1.v_vddio : Lower Non-critical going high : reading 1.85 > threshold 1.60 Volts
02/21/2009 : 06:24:55 : Voltage : p1.v_vddio : Lower Non-critical going low : reading 0.97 < threshold 1.60 Volts
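
Both examples show the same pattern: a “going low” event followed within a few seconds by a “going high” event on the same sensor.  As a rough sketch, the following Python picks those pairs out of ILOM event log lines in the format shown above.  The 10 second window and the function names are my own assumptions, and this only illustrates the pattern; it is not something Sun provides.

```python
from datetime import datetime, timedelta

LOW = "Lower Non-critical going low"
HIGH = "Lower Non-critical going high"

LOG = [
    "04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts",
    "04/29/2009 : 14:52:38 : Voltage : mb.v_+12v : Lower Non-critical going low : reading 7.37 < threshold 10.96 Volts",
    "02/21/2009 : 06:25:01 : Voltage : p1.v_vddio : Lower Non-critical going high : reading 1.85 > threshold 1.60 Volts",
    "02/21/2009 : 06:24:55 : Voltage : p1.v_vddio : Lower Non-critical going low : reading 0.97 < threshold 1.60 Volts",
]


def parse_event(line: str) -> dict:
    """Split one event log line on ' : ' into its fields."""
    date, time, _kind, sensor, event, detail = (p.strip() for p in line.split(" : "))
    return {
        "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
        "sensor": sensor,
        "event": event,
        "reading": float(detail.split()[1]),  # "reading 7.37 < threshold ..."
    }


def transient_dips(lines, max_gap=timedelta(seconds=10)):
    """Return (sensor, low reading, seconds until recovery) for short dips."""
    events = sorted((parse_event(l) for l in lines if l.strip()), key=lambda e: e["when"])
    pending = {}  # sensor -> most recent unmatched "going low" event
    dips = []
    for event in events:
        if event["event"] == LOW:
            pending[event["sensor"]] = event
        elif event["event"] == HIGH and event["sensor"] in pending:
            low = pending.pop(event["sensor"])
            gap = event["when"] - low["when"]
            if gap <= max_gap:
                dips.append((event["sensor"], low["reading"], gap.total_seconds()))
    return dips


for dip in transient_dips(LOG):
    print(dip)
```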

I should note that these errors *never* seem to turn out to be anything but noise…  We all just acknowledge the alarm and go back to bed.

This week I finally got annoyed enough to go look further into this issue as I do participate in the on-call rotation which covers these systems (even though I don’t *own* these systems).

After doing some digging, I found the following obscure note in the release notes for some firmware update bundle which includes ILOM firmware:

ILOM Service Processor firmware 2.0.2.5
  * Fixed the bug of lower non-critical voltage sense issue.

So I have gone ahead and upgraded a couple of my servers thus far.  Hopefully this will resolve the issue!

I have to get in a couple of jabs at Sun here since I burned an entire day today messing with their servers:

  • When you upload the ILOM firmware (which includes a system BIOS upgrade also)  your server may get powered off during the upgrade without any warning.
  • When you upgrade to a 2.0 BIOS from a 1.x version, you have to manually clear the CMOS according to their release notes (the update utility seriously could not do this for us?)
  • And my personal favorite, their documentation makes some obscure reference to some bug you might run into and so they tell you that you must upload the new firmware *twice* in order to ensure it applied properly.  Mind you they don’t tell you what the problem you might run into is, and they give you no way to tell if the person that upgraded the firmware for you previously did the double firmware update properly.
  • After the ILOM firmware and system BIOS updates I did today, the servers somehow managed to change the device IDs (or something) on the onboard NVIDIA NICs in such a way that Windows recognized them as new NICs (5 and 6).  This caused them to lose all IP settings and I had to log in through the ILOM and reset them.  This happened on the two servers I upgraded.
  • To upgrade the RAID card firmware/BIOS you must boot the server from a CD that runs DOS.  Note that on a Dell box you drop in the Openmanage CD, it scans your system to determine what needs updating to get you to a “known good set” of drivers, and you click the go button.  It takes care of all Firmware/Drivers/Software for you.
  • The LSI software for Windows to monitor the built in RAID card is a joke.  It looks like an intern wrote it.
  • At least Sun does provide a streamlined Windows driver installer package, and this did work well.

Overall, I am not completely thrilled with Sun’s x86 hardware lines, though I suppose things may be better if you are a Solaris-on-x86 shop.

-Eric

UPDATE 5/13/09

I got another voltage error on one of my fully updated servers.  I have called Sun and opened another case on this, though so far Tier 1 and Tier 2 techs do not seem to have any ideas as to what is causing this issue.  I sent them a bunch of output from the ipmi tool that they are looking through.

ID = 1 : 05/10/2009 : 23:58:42 : Voltage : p1.v_vtt : Upper Non-critical going high : reading 1.79 > threshold 1.00 Volts

I should also note that after the firmware updates, one of the machines is now reporting ECC errors.  This makes me wonder if the previous firmware was not properly reporting them.  We have had almost zero RAM problems with our dozens of Sun x86 servers which makes me worry that they are just hiding their problems.  I must say the server handled the failure gracefully.  It was getting dual bit (uncorrectable) ECC errors and so upon boot it disabled the two (of four) offending DIMMS.  Very nice.

Also, I would like to take a moment to comment on Sun’s build quality in the x4100 and x4200 servers.  I opened a couple of them up today for the first time and I must say, I am *very* impressed with the physical build quality.  Sun has some very talented hardware engineers (almost over-built I would say).  The servers are made from some heavy gauge metal among other things.

So while I have changed my mind a bit on Sun’s build quality, they are certainly lacking some of the finer touches needed for x86 servers.  Their out of band management controllers (previously ALOMs, now ILOMs) have been quite the fiasco for us.  It is also a royal pain to bring all the different firmwares/drivers up to “known good sets”.  Dell has quite a nice tool for this.

One of the techs also mentioned that there is a firmware update for the power supplies to keep them from powering the machine off in the event of a momentary power loss (such as when a UPS kicks in).  Apparently they are programmed to power down after 20ms of lost power.  They should be able to run for over 100ms even after power is lost.

eprosenx Uncategorized

Verizon and Verizon Business don’t peer in Portland

April 28th, 2009

I discovered last night that Verizon Business (aka UUNET, MCI, alter.net, AS701) and Verizon proper (i.e. the Local Exchange Carrier here in Portland, AS19262) don’t appear to peer here.  That is a major shame since I am on Verizon FiOS and I can’t even access other businesses that use Verizon Business as their ISP here in Portland without bouncing off Seattle.

Check out this traceroute from my router on my FiOS connection to SilverStar Telecom who uses Verizon Business as one upstream:

plunger#traceroute www.silverstartelecom.com

Type escape sequence to abort.
Tracing the route to www.silverstartelecom.com (12.111.189.3)

  1 L100.PTLDOR-VFTTP-01.verizon-gni.net (72.87.39.1) 4 msec 4 msec 4 msec
  2 P2-3.PTLDOR-LCR-01.verizon-gni.net (130.81.32.164) 4 msec 4 msec 4 msec
  3 so-7-3-0-0.SEA01-BB-RTR1.verizon-gni.net (130.81.28.160) 8 msec 8 msec 8 msec
  4 0.so-7-1-0.XT1.SEA7.ALTER.NET (152.63.105.57) 8 msec 8 msec 8 msec
  5 0.so-6-2-0.XT1.POR3.ALTER.NET (152.63.105.233) 12 msec 16 msec 12 msec
  6 POS6-0-0.GW9.POR3.ALTER.NET (152.63.104.249) 12 msec 16 msec 12 msec
  7 IT-S-Star-gw.customer.alter.net (157.130.177.118) 12 msec 16 msec 12 msec
  8 sst-pit-6509-gi25-2-gsr12-gi60.silverstartelecom.com (66.206.80.21) 12 msec 16 msec 12 msec
  9 www.silverstartelecom.com (12.111.189.3) 12 msec 16 msec 16 msec
plunger#

What a bummer.  I hope they rectify this situation soon!

-Eric

eprosenx Uncategorized

What upstream ISPs is your provider peered with?

April 27th, 2009

When evaluating a hosting provider, colocation facility, or an ISP, one of the most important aspects is “How well peered are they?”  In this day and age you certainly want to go with an organization that has redundant connections.  In general, the more entities your partner is directly connected to, the less impact individual failures will have, and the lower your latencies for connectivity will be.

The best way to quickly determine who a given provider is peered with is by looking at BGP routing tables as seen by other networks in the world.  We are very fortunate that the Route Views Project is available, which is based out of the University of Oregon (I feel dirty now linking to U of O since I am a Beaver after all).

The Route Views project maintains a number of routers that are peered with routers from numerous different backbones.  These peering sessions exist not for the purpose of routing packets, but instead so that people can log in to a route-views router and see what other networks think the best route is to someplace, and also so that the folks from the Route Views project can log data for later analysis.

Let’s say you are interested in determining the upstream peers for SilverStar Telecom (an ISP located in Portland with their routing core in the Pittock building).  You must first determine an IP address that resides within their network.  For the sake of this example we will do a dns lookup on www.silverstartelecom.com which resolves to 12.111.189.3.

Once you have an IP you wish to look up, telnet to route-views.routeviews.org and log in as username “rviews”:

                    Oregon Exchange BGP Route Viewer
          route-views.oregon-ix.net / route-views.routeviews.org

 route views data is archived on http://archive.routeviews.org

 This hardware is part of a grant from Cisco Systems.
 Please contact help@routeviews.org if you have questions or
 comments about this service, its use, or if you might be able to
 contribute your view.

 This router has views of the full routing tables from several ASes.
 The list of ASes is documented under “Current Participants” on
 http://www.routeviews.org/.

                          **************

 route-views.routeviews.org is now using AAA for logins.  Login with
 username “rviews”.  See http://routeviews.org/aaa.html

 **********************************************************************
User Access Verification

Username: rviews
route-views.oregon-ix.net>

Issue the “show ip bgp 12.111.189.3” command:

route-views.oregon-ix.net>show ip bgp 12.111.189.3
BGP routing table entry for 12.111.189.0/24, version 17338865
Paths: (33 available, best #22, table Default-IP-Routing-Table)
  Not advertised to any peer
  7660 2516 3356 32869
    203.181.248.168 from 203.181.248.168 (203.181.248.168)
      Origin IGP, localpref 100, valid, external
      Community: 2516:1030
  3549 1239 32869
    208.51.134.254 from 208.51.134.254 (208.178.61.33)
      Origin IGP, metric 0, localpref 100, valid, external
  3582 3701 32869
    128.223.253.8 from 128.223.253.8 (128.223.253.8)
      Origin IGP, localpref 100, valid, external
      Community: 3582:466 3701:392
  701 32869
    157.130.10.233 from 157.130.10.233 (137.39.3.60)
      Origin IGP, localpref 100, valid, external
  3333 3356 32869
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
  7500 2497 701 32869
    202.249.2.86 from 202.249.2.86 (203.178.133.115)
      Origin IGP, localpref 100, valid, external
  3277 3267 9002 3356 32869
    194.85.4.55 from 194.85.4.55 (194.85.4.16)
      Origin IGP, localpref 100, valid, external
      Community: 3277:3267 3277:65321 3277:65323
  2828 7018 32869
    65.106.7.139 from 65.106.7.139 (66.239.189.139)
      Origin IGP, metric 3, localpref 100, valid, external
  2914 7018 32869
    129.250.0.11 from 129.250.0.11 (129.250.0.51)
      Origin IGP, metric 5, localpref 100, valid, external
      Community: 2914:420 2914:2000 2914:3000 65504:7018
  2914 7018 32869
    129.250.0.171 from 129.250.0.171 (129.250.0.79)
      Origin IGP, metric 1, localpref 100, valid, external
      Community: 2914:420 2914:2000 2914:3000 65504:7018
  852 174 7018 32869
    154.11.98.225 from 154.11.98.225 (154.11.98.225)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 852:180
  852 174 7018 32869
    154.11.11.113 from 154.11.11.113 (154.11.11.113)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 852:180
  12956 1239 32869
    213.140.32.146 from 213.140.32.146 (213.140.32.146)
      Origin IGP, localpref 100, valid, external
      Community: 1239:100 1239:123 1239:999 1239:1000 1239:1010 12956:321 12956:4003 12956:4030 12956:4300 12956:18500 12956:28430 12956:28431
  3582 3701 32869
    128.223.253.9 from 128.223.253.9 (128.223.253.9)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 3582:466 3701:392
  8075 3356 32869
    207.46.32.34 from 207.46.32.34 (207.46.32.34)
      Origin IGP, localpref 100, valid, external
  286 3549 1239 32869
    134.222.87.1 from 134.222.87.1 (134.222.86.1)
      Origin IGP, localpref 100, valid, external
      Community: 286:18 286:19 286:29 286:888 286:900 286:3001 3549:2355 3549:30840
  16150 3549 1239 32869
    217.75.96.60 from 217.75.96.60 (217.75.96.60)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 3549:2773 3549:31208 16150:63392 16150:65321 16150:65326
  2905 701 32869
    196.7.106.245 from 196.7.106.245 (196.7.106.245)
      Origin IGP, metric 0, localpref 100, valid, external
  3561 701 32869
    206.24.210.102 from 206.24.210.102 (206.24.210.102)
      Origin IGP, localpref 100, valid, external
  3257 3356 3356 3356 32869
    89.149.178.10 from 89.149.178.10 (213.200.87.91)
      Origin IGP, metric 10, localpref 100, valid, external
      Community: 3257:8091 3257:30042 3257:50001 3257:54900 3257:54901
  4826 3356 32869
    114.31.199.1 from 114.31.199.1 (114.31.199.1)
      Origin IGP, localpref 100, valid, external
  3356 32869
    4.69.184.193 from 4.69.184.193 (4.68.3.50)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Community: 3356:3 3356:22 3356:90 3356:123 3356:575 3356:2012 65002:0
  6079 3356 32869
    207.172.6.20 from 207.172.6.20 (207.172.6.20)
      Origin IGP, metric 0, localpref 100, valid, external
  6079 3356 32869
    207.172.6.1 from 207.172.6.1 (207.172.6.1)
      Origin IGP, metric 0, localpref 100, valid, external
  812 6461 701 32869
    64.71.255.61 from 64.71.255.61 (64.71.255.61)
      Origin IGP, localpref 100, valid, external
  6939 3549 1239 32869
    216.218.252.164 from 216.218.252.164 (216.218.252.164)
      Origin IGP, localpref 100, valid, external
  1668 7018 32869
    66.185.128.48 from 66.185.128.48 (66.185.128.50)
      Origin IGP, metric 511, localpref 100, valid, external
  6539 3561 1239 32869
    66.59.190.221 from 66.59.190.221 (66.59.190.221)
      Origin IGP, localpref 100, valid, external
  1221 4637 3356 3356 3356 32869
    203.62.252.186 from 203.62.252.186 (203.62.252.186)
      Origin IGP, localpref 100, valid, external
  6453 1239 32869
    195.219.96.239 from 195.219.96.239 (195.219.96.239)
      Origin IGP, localpref 100, valid, external
  7018 32869
    12.0.1.63 from 12.0.1.63 (12.0.1.63)
      Origin IGP, localpref 100, valid, external
      Community: 7018:2000
  6453 1239 32869
    207.45.223.244 from 207.45.223.244 (66.110.0.124)
      Origin IGP, localpref 100, valid, external
  2497 701 32869
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external
route-views.oregon-ix.net>

This will give you an extensive list of routes which you can use to reach SilverStar.  On the first line of the output you can see that 12.111.189.0/24 is the most specific route in the BGP table that matches 12.111.189.3.  Below that line, are a number of entries, each starting with a list of AS numbers on the least-indented line.  Let’s use the 4th item as an example.  It simply contains 701 32869.

If you look up the rightmost ASN (which is the originating ASN for this prefix), you will see that it is registered to SilverStar Telecom (as you might expect).  To look this up you can go to www.arin.net and enter AS32869 into the whois search box.

Now let’s take a look at the AS number directly to the left of 32869, which in this case is also the first entry in the list, 701.  Because the two are adjacent in the list, we know that SilverStar Telecom advertised 12.111.189.0/24 to AS 701.  Furthermore, since 701 is the leftmost entry in the list, it tells us that AS 701 peers directly with the route-views router.  If you look up AS 701 you will see it is registered to MCI (aka Verizon Business).  So Verizon Business is one of SilverStar Telecom’s upstream providers.

Let’s move on and take a look at the third entry in the list, 3582 3701 32869.  If we translate those entries to entity names by using whois, we can see it equates to University of Oregon -> NERO Net -> SilverStar Telecom.  In this case, SilverStar peers directly with NERO (presumably across NWAX).  Granted I am certain NERO does not provide “transit” for SilverStar, but it is notable in that SilverStar makes the effort to connect with others locally.

Now to speed up this process a bit, all we really care about is what AS number is just to the left of SilverStar’s ASN (32869) in each entry (that we have not already looked up and recorded).  Using this method I have generated the following list:

  • 3356 – Level 3 Communications
  • 1239 – Sprint
  • 701 – MCI (Verizon Business)
  • 7018 – ATT
  • 3701 – NERO (Network for Education and Research in Oregon)
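
The "look just to the left of the origin ASN" shortcut can be sketched in a few lines of Python.  This is only an illustration, not part of the route views tooling: the function name upstream_asns is made up for this example, the paths are the ones quoted in this post, and the name table is simply the list above.  It collapses repeated ASNs first so that prepending (such as 3356 3356 3356) is counted once.

```python
def upstream_asns(as_paths, origin_asn):
    """Return the set of ASNs that sit directly to the left of origin_asn."""
    upstreams = set()
    for path in as_paths:
        hops = [int(tok) for tok in path.split()]
        # Collapse consecutive duplicates so AS-path prepending counts once.
        collapsed = [h for i, h in enumerate(hops) if i == 0 or h != hops[i - 1]]
        # Only consider paths that actually end at the origin we care about.
        if len(collapsed) >= 2 and collapsed[-1] == origin_asn:
            upstreams.add(collapsed[-2])
    return upstreams


# AS paths quoted in this post (a subset of the full route-views output).
paths = [
    "6539 3561 1239 32869",
    "1221 4637 3356 3356 3356 32869",
    "6453 1239 32869",
    "7018 32869",
    "2497 701 32869",
    "3582 3701 32869",
    "701 32869",
]

# Names as recorded in the list above (looked up by hand via whois).
names = {
    3356: "Level 3 Communications",
    1239: "Sprint",
    701: "MCI (Verizon Business)",
    7018: "ATT",
    3701: "NERO",
}

for asn in sorted(upstream_asns(paths, 32869)):
    print(asn, names.get(asn, "unknown, look it up in whois"))
```

You would still need to do the whois lookups yourself; the sketch only saves you from eyeballing every path.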

I must say, that is pretty impressive connectivity for Portland.  Verizon Business and ATT both actually have routing cores in Portland.  Sprint and Level 3 don’t and so you have to terminate circuits on routers in Seattle (or California).

That is all there is to it.  You simply log in to the route views router and see what other routers think their best path should be to the network in question.  It is worth noting however that this is certainly not a 100% full view of the world.  It is very likely that SilverStar peers directly with other organizations (for non-transit traffic) but that we have no visibility into that since none of the downstream routers from that peering share their view of the world with the route views project.

For the most part however, the route views project has visibility into enough sites to see which major backbones a given ISP is attached to.  One other caveat to add however is that this will only give you an idea of how traffic gets *to* SilverStar Telecom, and not what outbound routes packets from SilverStar’s network will take.  It is possible that SilverStar is also hooked to another ISP (like XO Communications) but that for some reason they don’t advertise 12.111.189.0/24 out that connection, or they use some metric to make it the least preferred route.  SilverStar may still route traffic out the XO connection even though no traffic comes in that way (I know for a fact though that SilverStar is not hooked to XO).

So go ahead and check out who your ISP is peered with!  You may be pleasantly surprised (or disappointed).  This is a great way to double check what the sales droids tell you.  I have seen cases where ISPs continue to maintain even a single T-1 to a provider in order to say that they are connected to them, while in reality they don’t route any traffic with them (or more likely, they have IP address space that belongs to that provider that they don’t want to have to re-number).

-Eric

eprosenx Network, Telecom

Cisco PIX/ASA VPN Client for 64 Bit Windows

April 23rd, 2009

For quite some time now I have been annoyed at Cisco for not releasing a 64 bit edition of their IPSec VPN client.  As far as I am concerned, their plan has been to force everyone over to the AnyConnect VPN client (SSL VPN) which does support 64 bit clients (i.e. Windows Vista 64 bit).

Oh, and by-the-way the AnyConnect Premium client costs $1,250 (MSRP) for a 10 concurrent user license on a 5510, whereas the IPSec client is FREE for unlimited users.

On a recent trip to Costco I was amazed at what percentage of the systems now being sold were coming with 64 bit Windows Vista.  More and more home users can’t VPN in anymore with the IPSec client.

While I am still unhappy with Cisco for artificially forcing people over to AnyConnect client, there is some amount of relief in sight to this issue.

Cisco just announced this week at RSA some new features that will be included in the ASA 8.2 code.  One of which is a new AnyConnect license level “AnyConnect Essentials” which according to my sources will be “Almost Free” (you can decide what that means for yourself).  This license will provide basic VPN  access, but not include the clientless web portal stuff or Cisco Secure Desktop (basically the stuff that I would rather not have to support anyway).

The other cool feature I am looking forward to is a new Botnet detection capability.  Basically the ASA will periodically download signature files from Cisco that tell it what traffic to look for.  If the ASA observes internal machines connecting out to known Botnet controllers it will be able to report on them.  There will be a yearly fee to enable this, so I am curious to see what they charge.

No word yet on availability dates for 8.2 or official pricing.  You can find more details on what is in 8.2 here.

-Eric

eprosenx Cisco, Network

Submarine Undersea Cables Landing in Oregon

April 20th, 2009

Until recently I never really realized how significant a role Oregon plays in the Pacific undersea cable business.  Apparently it is easier to get permits to land cables in Oregon than it is in California or Washington and our undersea geography is conducive to such projects.

As I have dug deeper into this topic, I put together a spreadsheet of all the different cables that land in Oregon (that I am aware of).  As usual, if I am missing anything, please send me an email.

I am working to add the cable landing stations and cable termination stations to my Portland/Oregon telecom map.

I find it disappointing how little “peering” we have going on in Oregon considering the amount of bandwidth flowing through the state up to Washington, down to California, East to Boise, off the coast to Alaska, and further West to Asia.  With the addition of the new immense capacity of the TPE cable Oregon has even more data flowing through it than ever before.

And just in case you want to know more about these cables, here is a list of all the references I have come across:

-Eric

eprosenx Network, Telecom

Reasons to move into a colocation facility

April 17th, 2009

Every organization I have worked at has done battle with our long lasting enemies, “Space”, “Power”, and “Cooling” (not to mention “Bandwidth”).  These three (or four) items seem to be the bane of IT’s existence.  There is seemingly an endless demand for more capabilities for the business, which means more applications, which means more servers, which require “Space”, “Power”, “Cooling” and “Bandwidth”.

We are called upon to look into our crystal ball to determine how our needs will grow (or shrink?) over the next several years so that we can purchase the right Generator, UPS, and Cooling equipment.  And once we procure said space, power, and cooling, we spend our days, nights, and weekends making sure maintenance is performed or responding to emergencies.

I am here to tell you there may be a better way!  (I say *may* because this is not a one-size-fits all solution)  In many cases you can make an excellent business case for outsourcing these pain points that might be cash neutral, or even save money!  In the rest of this post I will outline a number of factors to consider in your decision making process:

Space

  • Is your office space very expensive per square foot?  Could you be using that high cost “Class A” space for something more appropriate (you would be surprised at the number of server rooms that have a view).  Are you otherwise space-constrained in some way that makes moving the servers elsewhere attractive?
  • Are your servers good neighbors to those working around them (i.e. are they too loud or do they create too much heat)?
  • Could someone break down the door on the weekend and steal your servers with all your client data?  i.e. is your facility adequately secure?  Are you required to meet PCI compliance, etc…?
  • Does your building have an inadequate fire system or other high risk tenants?  Is your sprinkler system a “wet pipe” system that could get bumped with a ladder and flood your servers?
  • Is the physical environment appropriate for servers?  Is there a lot of dust in the air, vibration from machinery, etc…  I have seen network equipment in mechanical rooms along with steam heat exchangers and janitorial closets with mop cleaning basins.
  • What is the seismic rating of your facility?  In a natural disaster your employees may be able to work remotely if your servers are still available across the Internet.
  • Do you want to make the investment in your current facility to provide an adequate environment for your servers?  (i.e. cooling, UPS, generator, fire suppression, etc…)  If your lease is short term or nearly up and you are not planning on moving, it may not make sense to spend any of your own capital.

Power

  • Do you have access to enough power to run your datacenter in house?  Might you need to bring in extra transformer/switchgear/riser/breaker capacity for expansion?
  • Does your facility have three phase power available? (which is required for many large UPS units and some new SAN’s, blade enclosures, etc…)
  • Are you in an industrial business park with other businesses that have large motors starting and stopping all the time?  These can cause surges/spikes/brownouts and they increase the likelihood of causing you a power outage (especially if you share a transformer).
  • Do you have a good quality double online conversion UPS, or just random small line interactive UPSs?
  • Does your facility have a generator that can support your servers AND your cooling needs? (not to say that all environments *need* a generator)
  • How reliable is the power at your office?  Do you have a history of power problems there?
  • Do you pay for power usage at your office (i.e. are you sub-metered) or do you just get billed a portion of the overall bill split amongst the tenants?

Cooling

  • If you are in a standard commercial office building, there is a decent chance that your server room is cooled by the main building cooling units.  Most building leases provide for cooling during normal business hours (8-5pm Monday through Friday). Have you been in your server room on the weekends?  Is it 90 degrees in there?  If you want to have cooling available 24×7 you may need to pay the owner a lot more money to have them run the main building hvac units at all times.  In addition to being potentially costly, this is not very good for the environment.
  • If your server room is cooled by the main building cooling units and let’s say they are even running 24×7 you must remember that they are designed for comfort heating and cooling.  Depending on the type of system in place it may actually blow *heated* air into your server room during “morning warm up” cycles rather than cooling.  It can’t provide cooling to the server room while it is trying to heat the entire building.  I have seen server rooms that cause heating and cooling issues in the office space surrounding them as the server room is *always* calling for cooling even in the winter.
  • Normal air conditioners are designed to operate eight hours a day five days a week.  They are not designed for continuous 24×7 operation and so they will break more often when used in that fashion, and they may “freeze up” due to the fact that they must continue operating all the time (and don’t get a break to let ice melt from the coils) even in the winter when there are higher humidity levels.
  • Is providing appropriate cooling for a server room prohibitively expensive in your facility? (i.e. you’re in a high rise building)  Do you have somewhere to easily reject the waste heat outside from the computer room?
  • It is worth noting that while cooling for computing facilities is generally a big power hog, some modern colocation facilities are taking steps to be more environmentally friendly with their cooling by taking advantage of low outside air temperatures to cool servers without requiring the operation of refrigerant compressors. 
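
To get a feel for how much heat a server room has to reject, here is a rough Python sketch.  It relies on two standard conversions (1 watt is about 3.412 BTU/hr, and 1 ton of cooling is 12,000 BTU/hr) and assumes essentially all of the electrical power drawn by the equipment ends up as heat in the room.  The 5 kW load is a made-up example, not a figure from any real facility, and cooling_load is just an illustrative name.

```python
BTU_PER_HOUR_PER_WATT = 3.412    # 1 watt of load becomes about 3.412 BTU/hr of heat
BTU_PER_HOUR_PER_TON = 12000.0   # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_load(it_load_watts):
    """Return (btu_per_hour, tons) of heat to reject for a given electrical load."""
    btu_per_hour = it_load_watts * BTU_PER_HOUR_PER_WATT
    return btu_per_hour, btu_per_hour / BTU_PER_HOUR_PER_TON


# Hypothetical example: a small server room drawing 5 kW.
btu, tons = cooling_load(5000)
print(f"{btu:.0f} BTU/hr, about {tons:.2f} tons of cooling")
```

This ignores people, lighting, and heat gain through walls, so treat the result as a floor rather than a sizing recommendation.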

Communications

  • Where are the majority of your users based?  If 100% of them are at your office and you are not concerned about other factors listed above, the best place for your servers may still be at your office as you don’t run the risk of being disconnected from them.  However, if you have the majority of your users at remote locations (including working from home) the argument for basing your servers in a colocation facility becomes much stronger.
  • One of the most compelling arguments for colocation is being able to save on telecommunications costs.  You might be able to make it pay for itself based solely on your savings in telecom expenses.  There is more competition in a good colocation facility and so you can shop around.  Whereas at your office, you might only have the LEC (Local Exchange Carrier) available, in a good colocation facility in Portland you would have at least 4 on-net providers (if not more).  In many cases it makes sense to use one provider for access to offices in Washington, and another to get you to Boston.  Don’t let yourself get locked into using a single vendor!
  • Costs may be further reduced by not having to pay for “local loop” access if you are in the same building as your network service providers’ routers.  Bringing in a DS-3 to a lumber mill out in a remote location is much more expensive than in downtown Portland.
  • In many cases, colocation facilities are located near telecommunication hubs and as such, your chance of being cut off from your WAN or Internet provider is much lower.  Not to mention that colocation facilities are generally on fiber optic rings with redundant paths.  If you keep your servers at your office and make them available to the Internet (and other WAN locations) via copper T-1’s it is very easy for your T-1’s to be taken down by someone installing your neighbor’s POTS (Plain Old Telephone Service) line.  Note that even if you use T-1’s for connectivity within a colo facility, they are likely brought into the building across fiber rings which makes them much “cleaner” and more reliable.
  • If you are planning on having a DR facility in another city and you need a high speed link between them it will likely be much cheaper if you can bid this out to multiple ISPs available at both datacenters.  High speed circuits between datacenters often cost less than circuits to customer premises.
  • One of the most important technological and financial factors in your decision making process needs to be:  “How will I get a high-speed connection from my office to my servers?”  This new cost must be factored in and the potential for the connection going down must be considered in your evaluation.  This is perhaps the most negative technical factor in the argument for moving your servers into a colo facility.

Benefits

  • You don’t have to staff in house for and spend time/effort/money on managing your physical infrastructure.  This is taken care of for you by the colocation provider.
  • You don’t have to spend capital on UPSs, generators, floor space, fire control, and cabinets.  Granted you do pay for this somehow on an ongoing basis to the colo provider.
  • Uptimes can be improved as you will have fewer power/cooling/communication failures and your servers will have fewer hardware problems as they will be operating in a more stable environment.  (seriously, MTBF in a good colo facility will be increased)
  • You don’t have to spend your nights and weekends worrying about the air conditioning failing!  Even if there is an issue, it is someone else’s problem.  Also, you can go out of town and even if a server fails you may be able to have someone else go push buttons for you.  You could even have a vendor’s tech dispatched to the site to work on your server without you needing to be there.
  • You can buy Internet access from the datacenter (make sure you negotiate well on this and make sure the price per megabit keeps falling over time).  If you get upstream Internet from a solid in-building provider (that has good quality upstreams), you may not even need to purchase Internet routers.  All you need is a firewall tier and switch tier.
  • You can grow incrementally in a colocation facility as your needs grow, or cut back as they shrink (assuming you are not under contract).  When you run your own fixed equipment you are unlikely to be running your UPSs near full load where they are most efficient.  In your own facility, once you run out of capacity you are artificially constrained as that next server will require you to make another major investment.

Downsides

  • If you already have a server room with a dedicated cooling unit and enough power, etc… it may simply not make financial sense to move.  You may want to pursue a split model for the things that have higher uptime requirements.
  • You need to pay for connectivity from your office to the datacenter.  This is a new cost that did not exist before.
  • Connectivity from the colo facility to your office could be interrupted, bringing work to a halt (whereas previously even if cut off from the Internet, employees could still access the servers which were in-house).
  • Touching your servers requires traveling to the datacenter.  This travel time takes away from productivity (though getting out of the office once in a while can be nice!)
  • The monthly cost of a datacenter can cause sticker shock when looking at it simply from a “new cash cost” standpoint, however, this can be offset by savings on network circuits (if negotiated at the same time).  More importantly, you must consider the “total cost of ownership” of running your servers in house both in terms of hard and soft costs.

Example designs

Depending on your specific business model, there may be a few different reference designs you could choose from:

  • I have worked with a medical practice where we literally took their entire closet of servers one weekend and moved them into a colo facility.  All that was left was three network switches.  I would call this the “full colo” model.
  • If you have significant amounts of remote users but still have some in house users that require high speed access to file servers or application servers, you may want to consider a “split model” where most of your equipment (and WAN core) is located in the colo facility, but certain high-bandwidth servers stay on-site (like your file server).
  • An organization with all their employees in house may choose to keep all their corporate IT servers at the office, but put any Internet facing servers (like hosted applications or the corporate web site) at a datacenter.  I have implemented this model several times at various software companies in the past.  You must consider what the uptime requirements of your various services are.  Generally, Internet facing services will have a larger audience  and so they need a higher level of reliability than internal IT services.
  • Another variation on this may be to just keep your WAN routing equipment at a datacenter and then have a single backhaul connection to the office where the servers are located.  If you have many remote sites that get connectivity from different providers, it may make more sense to terminate them in a cabinet at a colo facility and then use a single metro ethernet connection back to the office.

Final thoughts

So you’re convinced?

Great!  Now check out my post on how to choose a colocation facility, and if you are based in Portland, check out my list of all the available facilities, plus the Google map I put together of where they are all located!  These resources also outline your telecommunications provider options.

As always, please email me with feedback or post a comment!

-Eric

eprosenx Colocation, Network, Telecom

Sniffing SSL TLS Sessions with Wireshark

April 10th, 2009

As a Network Engineer I am frequently called upon to assist in troubleshooting connectivity and integration issues with our customers by capturing packets and providing analysis.  These clients make SOAP calls to our system across the Internet (or private networks) using SSL (TLS) encryption.  The encryption makes troubleshooting application issues and response time issues very difficult when everything inside the TCP stream is encrypted garbage.

If your system uses SSL offloading capabilities in a load balancer, you can sniff the unencrypted packets between the load balancer and the web server, however, this will not let you see Internet caused TCP flow issues (retransmissions, out of order packets, etc…), and very often (depending on your architecture) the load balancer will change the source IP of the connection to its own which makes tracking down the interesting flow very difficult.

I learned a new trick this week that allows me to give Wireshark my private SSL keys (the ones loaded on the web server or load balancer), which will allow it to decrypt the SSL encryption and show me the payload in clear-text.

This is actually written up pretty well on the Wireshark support wiki so I won’t spend much time rehashing that here, however, I do have a few notes:

  • You need to get a copy of your RSA private key.  Often times this will mean exporting the certificate from the keystore (on Windows boxes, etc…), or it will mean grabbing the .key file from wherever it is stored on the file system in Linux.
  • You will most likely need to use openssl (or some other tool) to extract just the private key from whatever format you have available (you don’t care about the public key, the signature, or the certificate chain).
  • In my case I had a .key file of just the private key, however, it was encrypted with a password and I needed to use openssl to convert it to an unencrypted form that Wireshark can use.
  • As described in the Wireshark documentation, you must specify which private key files to use for which server IPs.  This is specified in the Wireshark SSL preferences.
  • To actually decode an SSL stream, find a packet that is part of an SSL exchange, right click on it, and select “Follow SSL Stream”.  This is effectively the same as “Follow TCP Stream”, except it does the decryption.
  • If it does not seem to be working, in the Wireshark SSL preferences, turn on logging to a text file and try again.  Note that this is quite verbose so once you get your issues worked out I would not leave it on.
  • Note that you must capture the entire SSL key exchange process, otherwise it will not be able to decrypt the payload.  I have seen flows where Wireshark would only decrypt the data in one direction or another.
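
As a small aid for the key format notes above, here is a Python sketch that guesses whether a PEM private key file is still password protected by looking at its header lines.  The function name pem_key_status and the sample strings are made up for this example.  It only inspects text markers (the PKCS#8 “ENCRYPTED PRIVATE KEY” banner, or the “Proc-Type: 4,ENCRYPTED” header that OpenSSL writes in the traditional format); it never parses or decrypts the key.  If a key turns out to be encrypted, a command along the lines of openssl rsa -in encrypted.key -out plain.key is the usual way to write out an unencrypted copy (the file names here are placeholders).

```python
def pem_key_status(pem_text):
    """Classify a PEM private key as 'encrypted', 'unencrypted', or 'unknown'."""
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return "encrypted"      # PKCS#8 key protected by a password
    if "Proc-Type: 4,ENCRYPTED" in pem_text:
        return "encrypted"      # traditional OpenSSL format with a passphrase
    if ("-----BEGIN RSA PRIVATE KEY-----" in pem_text
            or "-----BEGIN PRIVATE KEY-----" in pem_text):
        return "unencrypted"    # plain key with no encryption markers
    return "unknown"            # not a PEM private key we recognize


# Placeholder text only; these are not real keys.
sample_encrypted = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "Proc-Type: 4,ENCRYPTED\n"
    "DEK-Info: DES-EDE3-CBC,0000000000000000\n"
    "\n"
    "(base64 data omitted)\n"
    "-----END RSA PRIVATE KEY-----\n"
)
sample_plain = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "(base64 data omitted)\n"
    "-----END RSA PRIVATE KEY-----\n"
)

print(pem_key_status(sample_encrypted))
print(pem_key_status(sample_plain))
```

Remember that the unencrypted copy is exactly the thing you need to protect, so delete it when you are done troubleshooting.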

So remember, it is incredibly important to protect your SSL keys!  All too often they are left lying around on numerous web servers that may be directly exposed to the Internet.  Make sure your keys are backed up, on a flash drive, in a safe, somewhere offline, and then mark all the keys on Windows servers as non-exportable.  On other platforms make sure the keys are at least encrypted with a password themselves.  Centralizing your keys on a load balancer (rather than a bunch of web servers) can decrease your risk exposure.

Update:  There is actually a good reference on this on Novell’s web site as well.  Apparently Wireshark will not decrypt sessions that use a DHE (ephemeral Diffie-Hellman) key exchange, so if the browser and server negotiate that it will not work.  I am not sure whether newer versions of Wireshark already handle DHE, or will in the future.
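
If you want a quick way to tell whether a negotiated cipher suite falls into that category, the name itself usually gives it away.  The following Python sketch is my own illustration (the function name uses_ephemeral_dh is made up) and only looks at the suite name as shown in the Server Hello or in a server’s cipher list.  It accepts both the RFC style names (TLS_DHE_RSA_WITH_…) and the OpenSSL style names (DHE-RSA-…, EDH-RSA-…).

```python
def uses_ephemeral_dh(cipher_suite):
    """True if the suite name indicates an ephemeral Diffie-Hellman key exchange."""
    # Normalize OpenSSL style names (dashes) to the same separator as RFC names.
    tokens = cipher_suite.upper().replace("-", "_").split("_")
    return any(tok in ("DHE", "ECDHE", "EDH") for tok in tokens)


for suite in (
    "TLS_RSA_WITH_AES_128_CBC_SHA",      # RSA key exchange
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",  # ephemeral DH
    "AES256-SHA",                        # OpenSSL name, RSA key exchange
    "DHE-RSA-AES256-SHA",                # OpenSSL name, ephemeral DH
):
    verdict = "private key alone will not decrypt" if uses_ephemeral_dh(suite) else "RSA key exchange"
    print(f"{suite}: {verdict}")
```

The underlying reason is that with an ephemeral key exchange the session keys are never sent encrypted under the server’s RSA key, so having that private key is not enough.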

-Eric

eprosenx Network

How to choose a colocation facility

April 7th, 2009

Choosing a colocation facility is one of the most important decisions an IT professional can make.  It will have repercussions for years down the road, as there is generally a contract term associated, and it becomes difficult/costly to move.  At the same time, unless you are a facilities professional, it is hard to tell the difference between the quality of one facility vs. that of another without knowing the right questions to ask.  I have developed this list in the hopes that it will be a reference to folks evaluating datacenter options.  This has been written using the assumption that you need a local datacenter rather than a DR facility (which can have very different needs), however, many of the same concepts will apply.

Location

  • When it comes right down to it, there are still certain things you have to do physically in person. You can’t run a network cable through SSH or RDP. Having a datacenter close by makes a huge difference, especially when you lose remote connectivity and must go push a button in an emergency (we all have done this once or twice). In general, the newer, more high-end, and redundant your equipment is, the less you should have to touch it in person. Things are getting much better with out of band remote access controllers, but sometimes being there is worth a lot. You can’t hear that fan making funny noises from your office.
  • Does the facility have good access to transportation such as freeways and airports? Are there hotels nearby if you will have out-of-town contractors visiting? How close are you to the logistics depots for your vendor-of-choice’s parts, i.e. Cisco, Dell, HP, etc…
  • Does the facility have adequate parking that is close to the building, does it cost money? Is it somewhere you want to leave your car in the middle of the night while you are inside working?
  • Do you have line-of-sight to the datacenter? If you can manage to get a wireless link to your datacenter this can be an extremely cost-effective option for high speed connectivity. There is something to be said for controlling your own destiny when it comes to your connectivity rather than being at the mercy of a telecom provider. Will the facility allow you to put a wireless antenna on the roof and how much will they charge?

Staffing

  • Do they have on-site staff 24×7 to respond to emergency situations, to secure the facility, and to provide access when you forget/lose your badge (or have to stop by on your way home from the gym)?
  • If they do not have staff on site 24×7, what is their on-call policy? How long would it take them to respond to a power failure, a UPS exploding, a transformer catching fire in the parking lot, an Internet outage, an FM-200 fire suppression system going off, an HVAC system failing, or any other major malady (yes I have had all of these things happen to me in facilities I have worked in, and I am still waiting for the day a fire sprinkler goes off or there is a real fire in a datacenter).
  • What level of professional services can they provide? Basic remote hands (please press the power button)? More advanced troubleshooting (help diagnose a failed network switch)? Or even managed services (i.e. they take care of backups).
  • How competent are their NOC engineers, facilities folks, etc… What quality of vendors do they use to do electrical work, HVAC maintenance, network cabling? This can be hard to tell, but there are lots of small clues you can pick up on.
  • Does their staff speak English fluently and without heavy accent? It is extremely difficult to communicate on the phone with someone in a loud datacenter environment about complex technical issues when both of you are having a hard time understanding each other. This dramatically slows down the troubleshooting process and increases the chance of error.

Connectivity options

  • Do they provide Internet access themselves, or do you need to contract with other providers (ala the Pittock Block)? Having a datacenter provide Internet connectivity (if they give you a reasonable rate) can be more cost effective than running your own routers, with multiple ISPs (assuming you don’t have special routing needs that require it). You do need to make sure your datacenter has good upstream providers, good quality routers, and competent staff to run them. Be careful to ensure your provider can absorb moderate sized DDoS attacks without equipment failure or running out of bandwidth. You don’t want your neighbor’s online dating site to come under attack and impact your Internet connectivity.
  • Are they “carrier neutral”? Will they allow you to bring in your own connectivity (Internet/WAN)? Or do they want a piece of the pie of everything (i.e. resell you everything)? Are they charging your chosen provider ridiculous fees to have “right of entry” into the building (which drives up your end user costs).
  • What fiber providers do they have available? – The more connectivity options you have available, the harder bargain you can drive with providers to get the best deal possible. If you need connectivity to many different sites, it is likely that some sites will be cheaper/better/faster to connect with one provider, and others will be cheaper/better/faster with another. A good example would be TWTelecom and Integra Telecom here in Portland Oregon. They each have extensive fiber optic networks around the metro area, but if you are trying to get from Infinity Internet to various locations around town, whichever has fiber closer to your destination will have a price/technical advantage to provide you service.
  • Who is the local exchange carrier? You might need a POTS (Plain Old Telephone Service) line or two for paging access, etc…
  • What do they charge for cross connect fees? If you order a $300/mo T-1 are they going to charge you $100/mo cross connect fee for the two pairs of phone wire to get it to your cage/cabinet?

Power Infrastructure

  • What type of power grid design are they on? Radial or interconnected? On a Radial system (such as you would find out in the suburbs), if a car crashes into a pole, or a backhoe takes out a single conduit, power will be lost. In an interconnected system there are multiple “primary” feeds connected to multiple transformers which energize a “secondary” bus that actually feeds power to the facility. This type of design significantly reduces single points of failure and allows entire transformers to be taken offline for maintenance without service interruption.
  • Is the power grid in the area above ground or below ground? Above ground systems are susceptible to windstorms, lightning, trees, etc… Below ground systems fall prey to backhoes, horizontal boring machines, water penetration, etc… In general, below ground is going to be more reliable.
  • If on a Radial system, do they at least have multiple transformers (preferably off of separate primary feeds) even if they are not tied together on the secondary bus? Often you will see two transformers with each feeding a separate power distribution system within the datacenter.
  • Are the transformers well protected from vehicles in the parking lot?
  • What type of electrical transfer switches does the facility have to switch between main power and generator power? Are they capable of “make before break” operation when switching to the generator during test cycles or planned outages? Can they operate as “make before break” when switching back to grid power after an outage? This is important as the most likely time for a UPS to fail is during switching. If you can minimize the number of voltage-loss events it will reduce the likelihood of UPS failure.
  • How many generators does the facility have? If multiple, is their distribution system setup in such a way that you can get separate power feeds in your cage/rack that come from completely independent PDUs, UPSs, Generators, and Transformers? Just because a facility has multiple generators/UPSs/Transformers does not mean they are redundant for each other, they could just be there to increase capacity.
  • Does the facility regularly test their generators *with* load applied (either the actual datacenter load, or a test load)?
  • Has the facility designed and more importantly, *operated* their system such that a failure of one UPS/Transformer/Generator does not cause an overload on other parts of the distribution system.
  • Does the facility participate in programs that allow the power utility to remotely start the generators and switch the facility over to Generator power to reduce grid loading? While this is good for the overall health of the power grid (and possibly the environment), it can be a liability to your equipment at the datacenter since more power transfer events will be occurring.
  • How much fuel is stored on site – how many hours does that represent? Does the facility have contracts for emergency refueling services?
  • Can the generator be re-fueled easily from the road, or is it located on the roof?
  • What type of UPS systems do they have? How old are they? How often are the batteries tested and replaced? Can they take their UPS offline for maintenance without impacting customer power?
  • Can they provide you custom power feeds for equipment such as large Storage Area Networks or high power blade enclosures? (e.g. you need a 3 phase 208 volt 30 amp circuit)

Cooling

  • Do they use many direct expansion cooling units, or do they have a water/glycol loop with a cooling tower? Or do they even use chilled water? Each of these has its pros and cons; however, the multiple direct expansion model is very simple and redundant in that you likely have many individual units (it is not as energy efficient though). The trick is controlling the HVAC units to not “fight” each other, causing short-cycling on the compressors.
  • Are the cooling units designed for datacenter usage (running 24x7x365), with the ability to control humidity within reasonable levels, or are they made for office cooling applications with expected usage of 10 hours a day?
  • If the facility uses cooling towers for evaporative cooling processes, do they have on-site water storage to provide water during utility outages (such as after an earthquake)? Are all parts of the cooling loop system redundant (including the control system)?
  • Does the facility maintain and enforce hot/cold aisle design? This is becoming critical as power densities increase and power efficiency matters more.
  • Does the facility have an outside air exchange system to provide “free” cooling during the months of the year that outside air is of appropriate temperatures? While good for the environment, you must be careful about the outside air’s humidity as well as the dust/pollen that could come in with outside air. There is a dramatic difference between servers that have been in a quality datacenter for a few years, vs. ones with poor HVAC systems for a few years. I have removed servers from facilities before that have not gotten a speck of dust on them and others that are caked in black dust (depending on the facility they were in).
  • Is the entire cooling system on a single generator, or is it spread across multiple units for redundancy?

Cages/Racks

  • Does the facility provide Cages? Cabinets?  Or both?  These days most everything will fit in standard square hole cabinets, however, in some cases if you buy large enough equipment it might come with its own racks or as a freestanding unit that cannot go in cabinets provided by the facility. If you go with a cage you must carefully plan how much space you are going to need ahead of time. Adding additional cabinets as needed can be an effective growth strategy, though you must plan for network and SAN cabling between them.
  • If you get a cage (or just custom cabinets) make sure to agree upon who will bolt down your cabinets and how much it will cost.  This can be particularly tricky on raised floors to properly secure them in the event of an earthquake.  Any work must be done carefully so that it does not throw dust into the air or cause harmful vibrations that could impact running equipment.
  • One gotcha I have run into before is that some facilities’ cabinets are not deep enough for modern servers (specifically some Dell servers). I also have been shocked to find many facilities that are still leasing ancient telco-style cabinets with solid doors on them. Modern equipment requires front-to-back airflow, not bottom-to-top as was the old telco style. Also note that most network equipment still uses side-to-side airflow and is best suited to two-post telecom racks (where possible) rather than four-post server cabinets.
  • When selecting a colo facility make sure to specify exactly what type of cabinet you are expecting in the contract if they have multiple types available.
  • Modern cabinets have built-in mounting holes/brackets for vertical-mount PDUs, which are becoming the standard.  This allows you to use very short (think 2 foot) power cables to attach servers without excess slack.  They also do not take up usable rack space.
  • Modern cabinets should also have a way to cleanly route cables vertically (think about power cables, network cables, fiber SAN cables, etc…)
  • Does the facility provide PDUs in the cabinets for you, or are you responsible for providing them yourself?  It is critical that your PDUs have power meter displays on them, as power in a datacenter is typically very expensive and so you want to load them up as much as possible for peak cost efficiency, while not risking tripping a circuit breaker (never load a circuit to more than 80% of its rated capacity, which means 16 amps on a 20 amp circuit, or 24 amps on a 30 amp circuit).  When plugging dual power supply servers into different circuits, ensure that in the event one circuit blows the other can handle the entire load without blowing.
  • What type of power plugs will they be delivering in your rack/cage?  I recommend locking plugs like an L5-20 or L5-30 to plug your PDUs into (even though a non-locking NEMA receptacle can handle the current capacity in 20 amp circuits).  Also common these days is using 208 volt 30 amp circuits with an L6-30 receptacle.  Most everything manufactured in the last 5 years is capable of accepting 208 volt power.  Using the higher voltage allows you to have more equipment in a cabinet with fewer circuits, which also means fewer PDUs.
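
The circuit loading arithmetic in the last two points can be sketched in a few lines of Python. This is only a rough planning aid, not an electrical design tool: the function names are made up for illustration, the power figure assumes a power factor of 1, and applying the 80% limit to the failover case is a deliberately conservative assumption on my part.

```python
CONTINUOUS_LOAD_FACTOR = 0.80  # the 80% rule of thumb from the list above


def usable_amps(breaker_amps):
    """Maximum planned load on a circuit under the 80% rule."""
    return breaker_amps * CONTINUOUS_LOAD_FACTOR


def usable_watts(volts, breaker_amps):
    """Approximate usable power of a single-phase circuit (power factor of 1 assumed)."""
    return volts * usable_amps(breaker_amps)


def survives_failover(load_a_amps, load_b_amps, breaker_amps):
    """True if one circuit can carry both loads after the other circuit trips.

    Assumption: the surviving circuit should also stay within the 80% limit.
    """
    return load_a_amps + load_b_amps <= usable_amps(breaker_amps)


print(usable_amps(20), usable_amps(30))       # planned limits for 20 and 30 amp circuits
print(usable_watts(120, 20))                  # a 120 volt 20 amp circuit
print(usable_watts(208, 30))                  # a 208 volt 30 amp circuit
print(survives_failover(7, 8, 20))            # 15 amps combined fits on one 20 amp circuit
print(survives_failover(10, 9, 20))           # 19 amps combined does not
```

Under these assumptions a single 208 volt 30 amp circuit delivers roughly 5 kW of usable power versus just under 2 kW for a 120 volt 20 amp circuit, which is why the higher voltage means fewer circuits per cabinet.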

Fire suppression

  • Is the structure made of metal and concrete, or of wood?
  • Does it have traditional “wet-pipe” sprinklers, “dry-pipe” sprinklers, or “pre-action” sprinklers? Or even none at all? If an electrician hits a sprinkler head with a ladder in either a “wet-pipe” or “dry-pipe” system, it will immediately release large amounts of water until the fire department shows up to turn it off. Pre-action systems require both a smoke sensing system to alarm, as well as heat setting off a sprinkler head in order to let water flow.
  • What type of fire detection system does the facility have? Standard smoke sensors, and/or VESDA sensors?
  • Does the facility have a gaseous fire suppression system such as FM-200, Inergen, or Halon? A gaseous system will deploy if two smoke sensors are triggered, and will hopefully extinguish the fire before it can set off a water based system (typically still required to meet fire code). In reality though, I have never seen modern computer equipment really catch fire. Most of it does not burn very well (as long as you don’t store cardboard in the datacenter).
  • Who are your neighbors within the building? Are any of them high risk?
  • How old is the building’s fire suppression system? You might be in a suite within the building that has the latest and greatest fire control, but if the rest of the building has a simple fire panel from 1970 and no sprinklers, it could still burn to the ground. Upgrades to fire control systems are generally not required unless the building owner does a major renovation.

Physical facility

  • What is the risk of water damage to your equipment? Are you right below a poorly maintained roof? Are there non-pre-action sprinklers above you? Is there a domestic water pipe above your cage? Bathroom drains from the tenant above? Storm drain pipes from the roof? Condensate drains from the HVAC system? Cooling loop pipes? Note that if a fire sprinkler goes off several floors up it can seep down through cracks between floors you never knew existed into your equipment.
  • Is the facility located in a flood plain? Is it below ground level? There are places in Portland that have water mains large enough to cause localized flooding if they break.
  • Does the building have a convenient loading dock for receiving equipment? What is the largest equipment that will fit into the building and up the elevator? This is a problem in many older buildings.
  • How large is the space you are in (by volume) compared to the equipment load? If cooling was lost (say because the fire alarm inadvertently went off which shuts down all HVAC), how much thermal buffer is there to keep the temperature from rising too much until the system is reset?
  • Is there a grid of ceiling tiles above you? If so, it will probably fall down and create dust in an earthquake. I would rather see all of the piping and mechanical systems on the ceiling anyway rather than let them be hidden above a ceiling grid.
  • Is the facility on a slab floor or raised floor? It is easier to effectively bolt things down to a slab floor for seismic purposes, but a raised floor can also conveniently provide space for electrical power and cables. A raised floor is becoming less feasible for cooling purposes, however, since density is increasing so much.
  • What is the seismic rating of the facility? How much will it shake your equipment in an earthquake and will the building be damaged to the point that it is unsafe to continue operation?
  • Do they have requirements about what types of equipment you can put in the datacenter? For example, in a traditional telco facility certain ratings may be required.
  • Is the facility well kept and “clean”?  This can tell you a lot about the quality of the facility.  It is hard to tell if proper maintenance is being done at scheduled intervals on their power equipment, but if a facility cannot even keep its cables managed properly it is a likely sign that they are skipping other non-visible things as well.
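
To get a feel for the thermal buffer question above (room volume versus equipment load), here is a back-of-the-envelope sketch. It assumes that all of the equipment power goes into heating the room air only, using approximate values for the density and specific heat of air. It ignores the thermal mass of walls, floors, and the equipment itself, so treat the result as a pessimistic estimate. The function name and the example numbers are hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2            # approximate, room temperature near sea level
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate, dry air


def minutes_until_limit(room_volume_m3, it_load_watts, start_c, limit_c):
    """Rough time for the room air to heat from start_c to limit_c with no cooling."""
    # Energy needed to raise the room air by one degree (joules per kelvin).
    heat_capacity_j_per_k = room_volume_m3 * AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K
    seconds = (limit_c - start_c) * heat_capacity_j_per_k / it_load_watts
    return seconds / 60.0


# Hypothetical example: a 1,000 cubic meter room with 100 kW of equipment,
# starting at 22 C with a 35 C limit.
print(round(minutes_until_limit(1000, 100_000, 22, 35), 1))
```

With those example numbers the air alone buys you only a few minutes, and doubling the room volume for the same load doubles that time.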

Creature comforts

  • Does the facility have comfortable areas for you to work while on-site (e.g. a conference room) or do you have to spend all your time on the cold/loud datacenter floor?
  • Do they provide “crash carts” (i.e. a portable keyboard, monitor, mouse) to utilize if you don’t have your own KVMs?
  • Do they have vending machines or refreshments when you need that late night pick-me-up?
  • Will they accept deliveries for you? Do they have someone at the facility during business hours? I find this to be *very* important.
  • How good is the cell phone coverage for the specific provider(s) you care about?
  • Do they have a guest wireless network you can jump on while you are working there to easily get Internet access without having to provide it yourself?

Security

  • How do they control access to the facility? Is it manned, or unmanned? If they have an access control system does it have biometric features?
  • Do they have security cameras? How long is the footage kept for?

Pricing

  • How much do they charge you per cabinet, or per square foot of space?
  • How much does power cost? Is it per provisioned circuit, or based on actual usage? What is their pricing model? Note that it is more and more common to need 208 volt circuits, or three phase circuits with modern blade enclosures and SANs. It is no longer just increments of 20 amp 110v circuits.
  • Will they provide you second power feeds at a reduced price if you are only going to be using them for failover? Note that these second feeds may cost them UPS, Generator, etc… capacity they must plan for, however, you won’t be utilizing electricity from them (which they must pay the utility company for) or loading their total feed capacity from the utility since they are just for redundancy.
  • Can you get price guarantees for future expansion (power costs, cabinet costs, etc…)?
  • Does the facility want to sell you completely managed services and as such make colocation costs untenable?
  • Do they provide some amount of basic remote hands service hours each month? How much do they charge for professional services?
  • Does the facility provide service-level-agreements (SLAs) that have teeth? Frankly, I don’t put much faith in SLAs since usually they only involve a credit for the period of time service is unavailable. This generally is nothing in comparison to the amount of money you lose when your datacenter goes down or your costs in man-hours to bring it back up.
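
To make the SLA point concrete, the sketch below compares a pro-rated credit with the cost of an outage. Every number in it is hypothetical, and the pro-rating formula is just one common way such credits are calculated; check your own contract for the real terms.

```python
def sla_credit(monthly_fee, outage_hours, hours_in_month=730.0):
    """Credit for the time service was unavailable, pro-rated against the monthly fee."""
    return monthly_fee * outage_hours / hours_in_month


def outage_cost(outage_hours, revenue_per_hour, recovery_man_hours, hourly_labor_rate):
    """Lost revenue during the outage plus the labor to bring everything back up."""
    return outage_hours * revenue_per_hour + recovery_man_hours * hourly_labor_rate


# Hypothetical example: $5,000/month colo bill, 4 hour outage,
# $2,000/hour in lost revenue, 40 man-hours of recovery at $75/hour.
credit = sla_credit(5000, 4)
cost = outage_cost(4, 2000, 40, 75)
print(f"credit: ${credit:,.2f}  cost: ${cost:,.2f}")
```

In this made-up case the credit is a few tens of dollars against a five-figure loss.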

Switching Costs

  • Once you move into a facility there can be significant (if not astronomical) switching costs. They may offer you a smoking deal to get you in the door, and then make it up by charging higher-than-market-rate for add on services down the road. Realize that you are very likely to need more power and more bandwidth down the road. Also realize that bandwidth costs fall steadily so you don’t want to get locked in for long term rates on telecommunications circuits. It is also possible for your needs to go down in the long term as virtualization gets more popular, “cloud computing” becomes a reality, and computers become more efficient.
  • Contracts are normally in place to protect the provider, but they can also protect you. If you get a smoking deal on something, locking it in for a term commitment can be a good idea. It is reasonable for a provider to require a contract term as they do have significant capital and sales costs that they need to cover. Also, realize that the average lifespan of a datacenter is not all that long these days. A datacenter built 7 years ago has nowhere near the cooling capacity required in a modern datacenter.
  • Think about your growth pattern. You don’t want to be paying ahead of time for service you don’t need/use, but you also don’t want to get hit for huge incremental costs to add cabinets/power down the road. Contracts with “right of first refusal” clauses built into them (on additional space/capacity) are common.
  • Think about how difficult it will be for you to pick up and move at a later date. Some of the most “sticky” items are storage area networks. It might be easy to move a few servers at a time, but if you are all dependent on a single Storage Array, everything connected to it must move at once.
  • Telecommunication circuits also increase your “stickiness”. They are generally under term commitments, and moving them at a specific time can be difficult to coordinate. If you have a circuit from XO and move to a facility that does not have XO fiber, you might have to switch providers, or pay someone else for the local loop.
  • If you are purchasing Internet connectivity from your datacenter you are most likely being assigned IP addresses from their address space. When you move or change providers you will need to re-number. Depending on your network design and use cases, this might be easy, or an extremely difficult task.

Final Words

While there are numerous factors to consider, the reality is that there are likely a number of providers in town that can meet your needs successfully.  The reliability level of Portland’s power grid and of datacenter equipment is getting so high that we are really “chasing nines” to get ever so slightly more uptime (for dramatically higher cost).  For most organizations, being in a datacenter with only a single generator provides plenty of uptime.  Is that extra 0.009% uptime really worth it to go from “four nines” to “five nines”?  That is an increase of about 47 minutes of uptime per year.  Is that worth doubling your costs?
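
The “nines” arithmetic above is easy to check for yourself. This small sketch assumes a 365 day year and treats availability as a simple percentage of total minutes; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year at a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


four_nines = downtime_minutes_per_year(99.99)    # about 52.6 minutes
five_nines = downtime_minutes_per_year(99.999)   # about 5.3 minutes
print(round(four_nines - five_nines, 1))         # about 47.3 minutes gained per year
```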

Perhaps one of the most important aspects to your decision is the relationship you build with the owners, management, and staff of your colocation facility.  You want to have as much of a “partnership” as possible, and not merely a buyer/seller relationship.  Finding a facility with a long history of treating their customers well will increase your chances of success.

If you have any comments/questions feel free to post below, or shoot me an email.

-Eric

eprosenx Colocation, Network, Telecom, Wireless