Archive

Archive for April, 2009

Sun X4100 and X4200 Lower Non-critical going low

April 29th, 2009 4 comments

For over a year now our team of on-call engineers has been tortured by an error generated periodically by our racks of Sun X4100 and X4200 servers.  These alerts come from the integrated ILOMs, which we have set to send syslog messages to our EM7 monitoring platform.  About once a week, one of our many servers reports something along the lines of the following error:

FIRST REPORTED: 2009-04-29 14:50:33
LAST REPORTED: 2009-04-29 14:50:34

SEVERITY: CRITICAL
OCCURRENCES: 2
SOURCE: Syslog
ORGANIZATION: Management
DEVICE: prsun1-sc

Full message text for most recent occurrence:

<130>logmgr: ID = 343 : Wed Apr 29 14:52:39 2009 : IPMI : Log : critical : ID =   7f : 04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts

This event has not been acknowledged

Sent by notification policy: Major/Critical Events

The EM7 has received a CRITICAL syslog notification from this server.

If you go look at the event log on the ILOM it looks more like this:

04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts
04/29/2009 : 14:52:38 : Voltage : mb.v_+12v : Lower Non-critical going low : reading 7.37 < threshold 10.96 Volts

Looking at the event log on another server with the same type of issue, the error is for a different sensor, yet it shows the same behavior:

02/21/2009 : 06:25:01 : Voltage : p1.v_vddio : Lower Non-critical going high : reading 1.85 > threshold 1.60 Volts
02/21/2009 : 06:24:55 : Voltage : p1.v_vddio : Lower Non-critical going low : reading 0.97 < threshold 1.60 Volts

I should note that these errors *never* seem to turn out to be anything but noise…  We all just acknowledge the alarm and go back to bed.
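The pattern in both excerpts is a "going low" event followed within seconds by a "going high" event on the same sensor.  A filter for that pattern could be sketched in Python roughly as follows.  This is not something the ILOM or EM7 provides; the function names and the 10-second window are assumptions for illustration, and the regular expression is based only on the log lines shown above.

```python
import re
from datetime import datetime, timedelta

# Matches ILOM voltage events in the format seen in the excerpts above, e.g.
# 04/29/2009 : 14:52:38 : Voltage : mb.v_+12v : Lower Non-critical going low : reading 7.37 < threshold 10.96 Volts
EVENT_RE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) : (?P<time>\d{2}:\d{2}:\d{2}) : Voltage : "
    r"(?P<sensor>\S+) : (?P<level>.+?) going (?P<direction>low|high) : "
    r"reading (?P<reading>[\d.]+) [<>] threshold (?P<threshold>[\d.]+) Volts"
)

def parse_events(lines):
    """Turn matching log lines into dicts, sorted oldest first."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(m["date"] + " " + m["time"], "%m/%d/%Y %H:%M:%S")
        events.append({
            "when": when,
            "sensor": m["sensor"],
            "level": m["level"],
            "direction": m["direction"],
            "reading": float(m["reading"]),
            "threshold": float(m["threshold"]),
        })
    return sorted(events, key=lambda e: e["when"])

def find_transients(events, window=timedelta(seconds=10)):
    """Pair each 'going low' with the next 'going high' on the same sensor.

    Only the lower-threshold pattern from the excerpts is handled; the
    10-second window is an arbitrary assumption, not a vendor value.
    """
    transients = []
    pending = {}  # (sensor, level) -> most recent unmatched 'going low' event
    for ev in events:
        key = (ev["sensor"], ev["level"])
        if ev["direction"] == "low":
            pending[key] = ev
        elif key in pending:
            low = pending.pop(key)
            if ev["when"] - low["when"] <= window:
                transients.append((low, ev))
    return transients

SAMPLE_LOG = [
    "04/29/2009 : 14:52:39 : Voltage : mb.v_+12v : Lower Non-critical going high : reading 12.16 > threshold 10.96 Volts",
    "04/29/2009 : 14:52:38 : Voltage : mb.v_+12v : Lower Non-critical going low : reading 7.37 < threshold 10.96 Volts",
    "02/21/2009 : 06:25:01 : Voltage : p1.v_vddio : Lower Non-critical going high : reading 1.85 > threshold 1.60 Volts",
    "02/21/2009 : 06:24:55 : Voltage : p1.v_vddio : Lower Non-critical going low : reading 0.97 < threshold 1.60 Volts",
]

for low, high in find_transients(parse_events(SAMPLE_LOG)):
    gap = (high["when"] - low["when"]).total_seconds()
    print(low["sensor"], "dipped to", low["reading"], "V and recovered after", gap, "seconds")
```

Run against the four lines above, both dips are paired up, with recovery gaps of 6 seconds and 1 second.  Whether such short dips are safe to suppress is a separate question that a script like this cannot answer.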

This week I finally got annoyed enough to look further into this issue, since I participate in the on-call rotation that covers these systems (even though I don’t *own* them).

After doing some digging, I found the following obscure note in the release notes for some firmware update bundle which includes ILOM firmware:

ILOM Service Processor firmware 2.0.2.5
  * Fixed the bug of lower non-critical voltage sense issue.

So I have gone ahead and upgraded a couple of my servers thus far.  Hopefully this will resolve the issue!

I have to get in a couple of jabs at Sun here since I burned an entire day today messing with their servers:

  • When you upload the ILOM firmware (which also includes a system BIOS upgrade), your server may get powered off during the upgrade without any warning.
  • When you upgrade to a 2.0 BIOS from a 1.x version, you have to manually clear the CMOS according to their release notes (the update utility seriously could not do this for us?)
  • And my personal favorite: their documentation makes some obscure reference to a bug you might run into, and so they tell you that you must upload the new firmware *twice* in order to ensure it applied properly.  Mind you, they don’t tell you what the problem is, and they give you no way to tell whether the person who upgraded the firmware for you previously did the double firmware update properly.
  • After the ILOM firmware and system BIOS updates I did today, the servers somehow managed to change the device IDs (or something) on the onboard NVIDIA NICs in such a way that Windows recognized them as new NICs (5 and 6).  This caused them to lose all IP settings and I had to log in through the ILOM and reset them.  This happened on both of the servers I upgraded.
  • To upgrade the RAID card firmware/BIOS you must boot the server from a CD that runs DOS.  Note that on a Dell box you drop in the OpenManage CD, it scans your system to determine what needs updating to get you to a “known good set” of drivers, and you click the go button.  It takes care of all Firmware/Drivers/Software for you.
  • The LSI software for Windows to monitor the built-in RAID card is a joke.  It looks like an intern wrote it.
  • At least Sun does provide a streamlined Windows driver installer package; this did work well.

Overall, I am not completely thrilled with Sun’s x86 hardware lines, though I suppose things may be better if you are a Solaris-on-x86 shop.

-Eric

UPDATE 5/13/09

I got another voltage error on one of my fully updated servers.  I have called Sun and opened another case on this, though so far Tier 1 and Tier 2 techs do not seem to have any ideas as to what is causing this issue.  I sent them a bunch of output from the ipmi tool that they are looking through.

ID = 1 : 05/10/2009 : 23:58:42 : Voltage : p1.v_vtt : Upper Non-critical going high : reading 1.79 > threshold 1.00 Volts

I should also note that after the firmware updates, one of the machines is now reporting ECC errors.  This makes me wonder if the previous firmware was not properly reporting them.  We have had almost zero RAM problems with our dozens of Sun x86 servers, which makes me worry that they are just hiding their problems.  I must say the server handled the failure gracefully.  It was getting dual-bit (uncorrectable) ECC errors, and so upon boot it disabled the two (of four) offending DIMMs.  Very nice.

Also, I would like to take a moment to comment on Sun’s build quality in the x4100 and x4200 servers.  I opened a couple of them up today for the first time and I must say, I am *very* impressed with the physical build quality.  Sun has some very talented hardware engineers (almost over-built I would say).  The servers are made from some heavy gauge metal among other things.

So while I have changed my mind a bit on Sun’s build quality, they are certainly lacking some of the finer touches needed for x86 servers.  Their out-of-band management controllers (previously ALOMs, now ILOMs) have been quite the fiasco for us.  It is also a royal pain to bring all the different firmwares/drivers up to “known good sets”.  Dell has quite a nice tool for this.

One of the techs also mentioned that there is a firmware update for the power supplies to keep them from powering the machine off in the event of a momentary power loss (such as when a UPS kicks in).  Apparently they are programmed to power down after 20ms of lost power.  They should be able to run for over 100ms even after power is lost.

Categories: Uncategorized Tags:

Verizon and Verizon Business don’t peer in Portland

April 28th, 2009 1 comment

I discovered last night that Verizon Business (aka UUNET, MCI, alter.net, AS701) and Verizon proper (i.e. the Local Exchange Carrier here in Portland, AS19262) don’t appear to peer here.  That is a major shame since I am on Verizon FiOS and I can’t even access other businesses that use Verizon Business as their ISP here in Portland without bouncing off Seattle.

Check out this traceroute from my router on my FiOS connection to SilverStar Telecom, which uses Verizon Business as one of its upstreams:

plunger#traceroute www.silverstartelecom.com

Type escape sequence to abort.
Tracing the route to www.silverstartelecom.com (12.111.189.3)

  1 L100.PTLDOR-VFTTP-01.verizon-gni.net (72.87.39.1) 4 msec 4 msec 4 msec
  2 P2-3.PTLDOR-LCR-01.verizon-gni.net (130.81.32.164) 4 msec 4 msec 4 msec
  3 so-7-3-0-0.SEA01-BB-RTR1.verizon-gni.net (130.81.28.160) 8 msec 8 msec 8 msec
  4 0.so-7-1-0.XT1.SEA7.ALTER.NET (152.63.105.57) 8 msec 8 msec 8 msec
  5 0.so-6-2-0.XT1.POR3.ALTER.NET (152.63.105.233) 12 msec 16 msec 12 msec
  6 POS6-0-0.GW9.POR3.ALTER.NET (152.63.104.249) 12 msec 16 msec 12 msec
  7 IT-S-Star-gw.customer.alter.net (157.130.177.118) 12 msec 16 msec 12 msec
  8 sst-pit-6509-gi25-2-gsr12-gi60.silverstartelecom.com (66.206.80.21) 12 msec 16 msec 12 msec
  9 www.silverstartelecom.com (12.111.189.3) 12 msec 16 msec 16 msec
plunger#

What a bummer.  I hope they rectify this situation soon!
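The detour is easy to spot by eye, but it can also be pulled out of the hop hostnames.  Here is a rough Python sketch.  It assumes the location tokens seen in this particular trace (PTLDOR and POR for Portland, SEA for Seattle), which is a guess from reading the hostnames rather than any documented naming scheme, and the `CITY_HINTS` table and function names are made up for illustration.

```python
import re

# Assumed hostname-token-to-city mapping, inferred only from the trace above.
CITY_HINTS = [("PTLDOR", "Portland"), ("POR", "Portland"), ("SEA", "Seattle")]

# A hop line looks like: "  3 host.example.net (130.81.28.160) 8 msec 8 msec 8 msec"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")

def hop_cities(traceroute_text):
    """Return (hop number, hostname, guessed city or None) for each hop line."""
    hops = []
    for line in traceroute_text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue
        host = m.group(2)
        labels = host.upper().split(".")
        city = None
        for token, name in CITY_HINTS:
            # Compare against whole DNS labels so "POS6-0-0" is not mistaken for "POR"
            if any(label.startswith(token) for label in labels):
                city = name
                break
        hops.append((int(m.group(1)), host, city))
    return hops

def city_path(hops):
    """Collapse consecutive hops in the same guessed city; skip unknown hops."""
    path = []
    for _, _, city in hops:
        if city and (not path or path[-1] != city):
            path.append(city)
    return path

TRACE = """\
  1 L100.PTLDOR-VFTTP-01.verizon-gni.net (72.87.39.1) 4 msec 4 msec 4 msec
  2 P2-3.PTLDOR-LCR-01.verizon-gni.net (130.81.32.164) 4 msec 4 msec 4 msec
  3 so-7-3-0-0.SEA01-BB-RTR1.verizon-gni.net (130.81.28.160) 8 msec 8 msec 8 msec
  4 0.so-7-1-0.XT1.SEA7.ALTER.NET (152.63.105.57) 8 msec 8 msec 8 msec
  5 0.so-6-2-0.XT1.POR3.ALTER.NET (152.63.105.233) 12 msec 16 msec 12 msec
  6 POS6-0-0.GW9.POR3.ALTER.NET (152.63.104.249) 12 msec 16 msec 12 msec
  7 IT-S-Star-gw.customer.alter.net (157.130.177.118) 12 msec 16 msec 12 msec
  8 sst-pit-6509-gi25-2-gsr12-gi60.silverstartelecom.com (66.206.80.21) 12 msec 16 msec 12 msec
  9 www.silverstartelecom.com (12.111.189.3) 12 msec 16 msec 16 msec
"""

print(" -> ".join(city_path(hop_cities(TRACE))))
```

For the trace above this yields Portland, then Seattle, then Portland again, which is the round trip described in this post.  Hops whose names carry no recognizable token (the last three here) are simply left out of the path.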

-Eric

Categories: Uncategorized Tags:

What upstream ISPs is your provider peered with?

April 27th, 2009 1 comment

When evaluating a hosting provider, colocation facility, or an ISP, one of the most important aspects is “How well peered are they?”  In this day and age you certainly want to go with an organization that has redundant connections.  In general, the more entities your partner is directly connected to, the less impact individual failures will have, and the lower your latencies for connectivity will be.

The best way to quickly determine who a given provider is peered with is by looking at BGP routing tables as seen by other networks in the world.  We are very fortunate that the Route Views Project is available, which is based out of the University of Oregon (I feel dirty now linking to U of O since I am a Beaver after all).

The Route Views project maintains a number of routers that are peered with routers from numerous different backbones.  These peering sessions exist not for the purpose of routing packets, but instead so that people can log in to a route-views router and see what other networks think the best route is to someplace, and also so that the folks from the Route Views project can log data for various analytics later down the road.

Let’s say you are interested in determining the upstream peers for SilverStar Telecom (an ISP located in Portland with their routing core in the Pittock building).  You must first determine an IP address that resides within their network.  For the sake of this example we will do a DNS lookup on www.silverstartelecom.com, which resolves to 12.111.189.3.

Once you have an IP you wish to look up, telnet to route-views.routeviews.org and log in as username “rviews”:

                    Oregon Exchange BGP Route Viewer
          route-views.oregon-ix.net / route-views.routeviews.org

 route views data is archived on http://archive.routeviews.org

 This hardware is part of a grant from Cisco Systems.
 Please contact help@routeviews.org if you have questions or
 comments about this service, its use, or if you might be able to
 contribute your view.

 This router has views of the full routing tables from several ASes.
 The list of ASes is documented under “Current Participants” on
 http://www.routeviews.org/.

                          **************

 route-views.routeviews.org is now using AAA for logins.  Login with
 username “rviews”.  See http://routeviews.org/aaa.html

 **********************************************************************
User Access Verification

Username: rviews
route-views.oregon-ix.net>

Issue the “show ip bgp 12.111.189.3” command:

route-views.oregon-ix.net>show ip bgp 12.111.189.3
BGP routing table entry for 12.111.189.0/24, version 17338865
Paths: (33 available, best #22, table Default-IP-Routing-Table)
  Not advertised to any peer
  7660 2516 3356 32869
    203.181.248.168 from 203.181.248.168 (203.181.248.168)
      Origin IGP, localpref 100, valid, external
      Community: 2516:1030
  3549 1239 32869
    208.51.134.254 from 208.51.134.254 (208.178.61.33)
      Origin IGP, metric 0, localpref 100, valid, external
  3582 3701 32869
    128.223.253.8 from 128.223.253.8 (128.223.253.8)
      Origin IGP, localpref 100, valid, external
      Community: 3582:466 3701:392
  701 32869
    157.130.10.233 from 157.130.10.233 (137.39.3.60)
      Origin IGP, localpref 100, valid, external
  3333 3356 32869
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
  7500 2497 701 32869
    202.249.2.86 from 202.249.2.86 (203.178.133.115)
      Origin IGP, localpref 100, valid, external
  3277 3267 9002 3356 32869
    194.85.4.55 from 194.85.4.55 (194.85.4.16)
      Origin IGP, localpref 100, valid, external
      Community: 3277:3267 3277:65321 3277:65323
  2828 7018 32869
    65.106.7.139 from 65.106.7.139 (66.239.189.139)
      Origin IGP, metric 3, localpref 100, valid, external
  2914 7018 32869
    129.250.0.11 from 129.250.0.11 (129.250.0.51)
      Origin IGP, metric 5, localpref 100, valid, external
      Community: 2914:420 2914:2000 2914:3000 65504:7018
  2914 7018 32869
    129.250.0.171 from 129.250.0.171 (129.250.0.79)
      Origin IGP, metric 1, localpref 100, valid, external
      Community: 2914:420 2914:2000 2914:3000 65504:7018
  852 174 7018 32869
    154.11.98.225 from 154.11.98.225 (154.11.98.225)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 852:180
  852 174 7018 32869
    154.11.11.113 from 154.11.11.113 (154.11.11.113)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 852:180
  12956 1239 32869
    213.140.32.146 from 213.140.32.146 (213.140.32.146)
      Origin IGP, localpref 100, valid, external
      Community: 1239:100 1239:123 1239:999 1239:1000 1239:1010 12956:321 12956:4003 12956:4030 12956:4300 12956:18500 12956:28430 12956:28431
  3582 3701 32869
    128.223.253.9 from 128.223.253.9 (128.223.253.9)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 3582:466 3701:392
  8075 3356 32869
    207.46.32.34 from 207.46.32.34 (207.46.32.34)
      Origin IGP, localpref 100, valid, external
  286 3549 1239 32869
    134.222.87.1 from 134.222.87.1 (134.222.86.1)
      Origin IGP, localpref 100, valid, external
      Community: 286:18 286:19 286:29 286:888 286:900 286:3001 3549:2355 3549:30840
  16150 3549 1239 32869
    217.75.96.60 from 217.75.96.60 (217.75.96.60)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 3549:2773 3549:31208 16150:63392 16150:65321 16150:65326
  2905 701 32869
    196.7.106.245 from 196.7.106.245 (196.7.106.245)
      Origin IGP, metric 0, localpref 100, valid, external
  3561 701 32869
    206.24.210.102 from 206.24.210.102 (206.24.210.102)
      Origin IGP, localpref 100, valid, external
  3257 3356 3356 3356 32869
    89.149.178.10 from 89.149.178.10 (213.200.87.91)
      Origin IGP, metric 10, localpref 100, valid, external
      Community: 3257:8091 3257:30042 3257:50001 3257:54900 3257:54901
  4826 3356 32869
    114.31.199.1 from 114.31.199.1 (114.31.199.1)
      Origin IGP, localpref 100, valid, external
  3356 32869
    4.69.184.193 from 4.69.184.193 (4.68.3.50)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Community: 3356:3 3356:22 3356:90 3356:123 3356:575 3356:2012 65002:0
  6079 3356 32869
    207.172.6.20 from 207.172.6.20 (207.172.6.20)
      Origin IGP, metric 0, localpref 100, valid, external
  6079 3356 32869
    207.172.6.1 from 207.172.6.1 (207.172.6.1)
      Origin IGP, metric 0, localpref 100, valid, external
  812 6461 701 32869
    64.71.255.61 from 64.71.255.61 (64.71.255.61)
      Origin IGP, localpref 100, valid, external
  6939 3549 1239 32869
    216.218.252.164 from 216.218.252.164 (216.218.252.164)
      Origin IGP, localpref 100, valid, external
  1668 7018 32869
    66.185.128.48 from 66.185.128.48 (66.185.128.50)
      Origin IGP, metric 511, localpref 100, valid, external
  6539 3561 1239 32869
    66.59.190.221 from 66.59.190.221 (66.59.190.221)
      Origin IGP, localpref 100, valid, external
  1221 4637 3356 3356 3356 32869
    203.62.252.186 from 203.62.252.186 (203.62.252.186)
      Origin IGP, localpref 100, valid, external
  6453 1239 32869
    195.219.96.239 from 195.219.96.239 (195.219.96.239)
      Origin IGP, localpref 100, valid, external
  7018 32869
    12.0.1.63 from 12.0.1.63 (12.0.1.63)
      Origin IGP, localpref 100, valid, external
      Community: 7018:2000
  6453 1239 32869
    207.45.223.244 from 207.45.223.244 (66.110.0.124)
      Origin IGP, localpref 100, valid, external
  2497 701 32869
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external
route-views.oregon-ix.net>

This will give you an extensive list of routes which you can use to reach SilverStar.  On the first line of the output you can see that 12.111.189.0/24 is the most specific route in the BGP table that matches 12.111.189.3.  Below that line are a number of entries, each starting with a list of AS numbers on the least-indented line.  Let’s use the 4th item as an example.  It simply contains 701 32869.

If you look up the rightmost ASN (which is the originating ASN for this prefix), you will see that it is registered to SilverStar Telecom (as you might expect).  To look this up you can go to www.arin.net and enter AS32869 into the whois search box.

Now let’s take a look at the AS number directly to the left of 32869, which in this case is also the first entry in the list, 701.  By virtue of being adjacent in the list, this means that SilverStar Telecom advertised 12.111.189.0/24 to AS 701.  Furthermore, since 701 is the leftmost entry in the list, it tells us that AS 701 peers directly with the route-views router.  If you look up AS 701 you will see it is registered to MCI (aka Verizon Business).  So Verizon Business is one of SilverStar Telecom’s upstream providers.

Let’s move on and take a look at the third entry in the list, 3582 3701 32869.  If we translate those entries to entity names by using whois, we can see it equates to University of Oregon -> NERO Net -> SilverStar Telecom.  In this case, SilverStar peers directly with NERO (presumably across NWAX).  Granted I am certain NERO does not provide “transit” for SilverStar, but it is notable in that SilverStar makes the effort to connect with others locally.

Now to speed up this process a bit, all we really care about is which AS number is just to the left of SilverStar’s ASN (32869) in each entry (that we have not already looked up and recorded).  Using this method I have generated the following list:

  • 3356 – Level 3 Communications
  • 1239 – Sprint
  • 701 – MCI (Verizon Business)
  • 7018 – AT&T
  • 3701 – NERO (Network for Education and Research in Oregon)
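The shortcut described above (take the ASN just to the left of the origin ASN in every path and de-duplicate) is mechanical enough to script.  The following is a minimal Python sketch that works on pasted “show ip bgp” output; the function name is hypothetical and the parsing assumes the layout shown in the output above, where each AS path sits on its own line of space-separated numbers.  Mapping the resulting ASNs to names still requires a whois lookup.

```python
import re

# An AS path line consists only of whitespace-separated AS numbers.
AS_PATH_RE = re.compile(r"^\s*(\d+(?:\s+\d+)*)\s*$")

def upstream_asns(show_ip_bgp_output, origin_asn):
    """Return the distinct ASNs seen directly to the left of origin_asn.

    Order of first appearance is preserved.
    """
    upstreams = []
    for line in show_ip_bgp_output.splitlines():
        m = AS_PATH_RE.match(line)
        if not m:
            continue
        path = [int(asn) for asn in m.group(1).split()]
        if path[-1] != origin_asn:
            # Not a path to our prefix's origin, e.g. a stray all-digit
            # fragment from a terminal-wrapped Community line.
            continue
        # Walk right to left, skipping the origin itself (and any prepending
        # of it), and record the first different ASN.
        for asn in reversed(path):
            if asn != origin_asn:
                if asn not in upstreams:
                    upstreams.append(asn)
                break
    return upstreams

# A few entries copied from the output above, including a prepended path
# and a wrapped fragment ("840") that must be ignored.
SAMPLE_OUTPUT = """\
BGP routing table entry for 12.111.189.0/24, version 17338865
  7660 2516 3356 32869
    203.181.248.168 from 203.181.248.168 (203.181.248.168)
      Origin IGP, localpref 100, valid, external
  3549 1239 32869
  3582 3701 32869
  701 32869
840
  3257 3356 3356 3356 32869
  7018 32869
"""

print(upstream_asns(SAMPLE_OUTPUT, 32869))
```

On the sample it returns 3356, 1239, 3701, 701 and 7018, the same five ASNs as the list above.  Like the manual method, it only sees upstreams that appear in paths visible to the route-views router.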

I must say, that is pretty impressive connectivity for Portland.  Verizon Business and AT&T both actually have routing cores in Portland.  Sprint and Level 3 don’t, so you have to terminate circuits on routers in Seattle (or California).

That is all there is to it.  You simply log in to the route views router and see what other routers think their best path should be to the network in question.  It is worth noting, however, that this is certainly not a 100% full view of the world.  It is very likely that SilverStar peers directly with other organizations (for non-transit traffic) but that we have no visibility into that, since none of the downstream routers from that peering share their view of the world with the Route Views project.

For the most part, however, the Route Views project has visibility into enough sites to see which major backbones a given ISP is attached to.  One other caveat is that this will only give you an idea of how traffic gets *to* SilverStar Telecom, and not which outbound routes packets leaving SilverStar’s network will take.  It is possible that SilverStar is also hooked to another ISP (like XO Communications) but that for some reason they don’t advertise 12.111.189.0/24 out that connection, or they use some metric to make it the least preferred route.  SilverStar may still route traffic out the XO connection even though no traffic comes in that way (I know for a fact though that SilverStar is not hooked to XO).

So go ahead and check out who your ISP is peered with!  You may be pleasantly surprised (or disappointed).  This is a great way to double check what the sales droids tell you.  I have seen cases where ISPs continue to maintain even a single T-1 to a provider in order to say that they are connected to them, while in reality they don’t route any traffic with them (or, more likely, they have IP address space that belongs to that provider that they don’t want to have to re-number).

-Eric

Categories: Network, Telecom Tags:

Cisco PIX/ASA VPN Client for 64 Bit Windows

April 23rd, 2009 3 comments

For quite some time now I have been annoyed at Cisco for not releasing a 64 bit edition of their IPSec VPN client.  As far as I am concerned, their plan has been to force everyone over to the AnyConnect VPN client (SSL VPN) which does support 64 bit clients (i.e. Windows Vista 64 bit).

Oh, and by the way, the AnyConnect Premium client costs $1,250 (MSRP) for a 10 concurrent user license on a 5510, whereas the IPSec client is FREE for unlimited users.

On a recent trip to Costco I was amazed at what percentage of the systems now being sold come with 64 bit Windows Vista.  More and more home users can no longer VPN in with the IPSec client.

While I am still unhappy with Cisco for artificially forcing people over to the AnyConnect client, there is some amount of relief in sight.

Cisco just announced this week at RSA some new features that will be included in the ASA 8.2 code.  One of these is a new AnyConnect license level, “AnyConnect Essentials”, which according to my sources will be “Almost Free” (you can decide what that means for yourself).  This license will provide basic VPN access, but will not include the clientless web portal stuff or Cisco Secure Desktop (basically the stuff that I would rather not have to support anyway).

The other cool feature I am looking forward to is a new Botnet detection capability.  Basically the ASA will periodically download signature files from Cisco that tell it what traffic to look for.  If the ASA observes internal machines connecting out to known Botnet controllers it will be able to report on them.  There will be a yearly fee to enable this, so I am curious to see what they charge.

No word yet on availability dates for 8.2 or official pricing.  You can find more details on what is in 8.2 here.

-Eric

Categories: Cisco, Network Tags:

Submarine Undersea Cables Landing in Oregon

April 20th, 2009 No comments

Until recently I never really realized how significant a role Oregon plays in the Pacific undersea cable business.  Apparently it is easier to get permits to land cables in Oregon than it is in California or Washington and our undersea geography is conducive to such projects.

As I have dug deeper into this topic, I put together a spreadsheet of all the different cables that land in Oregon (that I am aware of).  As usual, if I am missing anything, please send me an email.

I am working to add the cable landing stations and cable termination stations to my Portland/Oregon telecom map.

I find it disappointing how little “peering” we have going on in Oregon considering the amount of bandwidth flowing through the state up to Washington, down to California, East to Boise, off the coast to Alaska, and further West to Asia.  With the addition of the new immense capacity of the TPE cable Oregon has even more data flowing through it than ever before.

And just in case you want to know more about these cables, here is a list of all the references I have come across:

-Eric

Categories: Network, Telecom Tags:

Reasons to move into a colocation facility

April 17th, 2009 No comments

Every organization I have worked at has done battle with our long-lasting enemies, “Space”, “Power”, and “Cooling” (not to mention “Bandwidth”).  These three (or four) items seem to be the bane of IT’s existence.  There is seemingly an endless demand for more capabilities for the business, which means more applications, which means more servers, which require “Space”, “Power”, “Cooling” and “Bandwidth”.

We are called upon to look into our crystal ball to determine how our needs will grow (or shrink?) over the next several years so that we can purchase the right Generator, UPS, and Cooling equipment.  And once we procure said space, power, and cooling, we spend our days, nights, and weekends making sure maintenance is performed or responding to emergencies.

I am here to tell you there may be a better way!  (I say *may* because this is not a one-size-fits-all solution.)  In many cases you can make an excellent business case for outsourcing these pain points that might be cash neutral, or even save money!  In the rest of this post I will outline a number of factors to consider in your decision making process:

Space

  • Is your office space very expensive per square foot?  Could you be using that high cost “Class A” space for something more appropriate?  (You would be surprised at the number of server rooms that have a view.)  Are you otherwise space-constrained in some way that makes moving the servers elsewhere attractive?
  • Are your servers good neighbors to those working around them (i.e. are they too loud or do they create too much heat)?
  • Could someone break down the door on the weekend and steal your servers with all your client data?  i.e. is your facility adequately secure?  Are you required to meet PCI compliance, etc…?
  • Does your building have an inadequate fire system or other high risk tenants?  Is your sprinkler system a “wet pipe” system that could get bumped with a ladder and flood your servers?
  • Is the physical environment appropriate for servers?  Is there a lot of dust in the air, vibration from machinery, etc…?  I have seen network equipment in mechanical rooms alongside steam heat exchangers, and in janitorial closets with mop cleaning basins.
  • What is the seismic rating of your facility?  In a natural disaster your employees may be able to work remotely if your servers are still available across the Internet.
  • Do you want to make the investment in your current facility to provide an adequate environment for your servers?  (i.e. cooling, UPS, generator, fire suppression, etc…)  If your lease is short term or nearly up and you are not planning on moving, it may not make sense to spend any of your own capital.

Power

  • Do you have access to enough power to run your datacenter in house?  Might you need to bring in extra transformer/switchgear/riser/breaker capacity for expansion?
  • Does your facility have three phase power available? (which is required for many large UPS units and some new SANs, blade enclosures, etc…)
  • Are you in an industrial business park with other businesses that have large motors starting and stopping all the time?  These can cause surges/spikes/brownouts and they increase the likelihood of a power outage (especially if you share a transformer).
  • Do you have a good quality online double-conversion UPS, or just random small line-interactive UPSs?
  • Does your facility have a generator that can support your servers AND your cooling needs? (not to say that all environments *need* a generator)
  • How reliable is the power at your office?  Do you have a history of power problems there?
  • Do you pay for power usage at your office (i.e. are you sub-metered) or do you just get billed a portion of the overall bill split amongst the tenants?

Cooling

  • If you are in a standard commercial office building, there is a decent chance that your server room is cooled by the main building cooling units.  Most building leases provide for cooling during normal business hours (8am to 5pm, Monday through Friday).  Have you been in your server room on the weekends?  Is it 90 degrees in there?  If you want to have cooling available 24×7 you may need to pay the owner a lot more money to have them run the main building HVAC units at all times.  In addition to being potentially costly, this is not very good for the environment.
  • If your server room is cooled by the main building cooling units and let’s say they are even running 24×7 you must remember that they are designed for comfort heating and cooling.  Depending on the type of system in place it may actually blow *heated* air into your server room during “morning warm up” cycles rather than cooling.  It can’t provide cooling to the server room while it is trying to heat the entire building.  I have seen server rooms that cause heating and cooling issues in the office space surrounding them as the server room is *always* calling for cooling even in the winter.
  • Normal air conditioners are designed to operate eight hours a day, five days a week.  They are not designed for continuous 24×7 operation, so they will break more often when used in that fashion, and they may “freeze up” because they must operate all the time (and don’t get a break to let ice melt from the coils), even in the winter when there are higher humidity levels.
  • Is providing appropriate cooling for a server room prohibitively expensive in your facility? (e.g. you’re in a high-rise building)  Do you have somewhere to easily reject the waste heat from the computer room to the outside?
  • It is worth noting that while cooling for computing facilities is generally a big power hog, some modern colocation facilities are taking steps to be more environmentally friendly with their cooling by taking advantage of low outside air temperatures to cool servers without requiring the operation of refrigerant compressors. 

Communications

  • Where are the majority of your users based?  If 100% of them are at your office and you are not concerned about the other factors listed above, the best place for your servers may still be your office, as you don’t run the risk of being disconnected from them.  However, if the majority of your users are at remote locations (including working from home), the argument for basing your servers in a colocation facility becomes much stronger.
  • One of the most compelling arguments for colocation is being able to save on telecommunications costs.  You might be able to make it pay for itself based solely on your savings in telecom expenses.  There is more competition in a good colocation facility and so you can shop around.  Whereas at your office, you might only have the LEC (Local Exchange Carrier) available, in a good colocation facility in Portland you would have at least 4 on-net providers (if not more).  In many cases it makes sense to use one provider for access to offices in Washington, and another to get you to Boston.  Don’t let yourself get locked into using a single vendor!
  • Costs may be further reduced by not having to pay for “local loop” access if you are in the same building as your network service provider’s routers.  Bringing in a DS-3 to a lumber mill out in a remote location is much more expensive than in downtown Portland.
  • In many cases, colocation facilities are located near telecommunication hubs and as such, your chance of being cut off from your WAN or Internet provider is much lower.  Not to mention that colocation facilities are generally on fiber optic rings with redundant paths.  If you keep your servers at your office and make them available to the Internet (and other WAN locations) via copper T-1s, it is very easy for your T-1s to be taken down by someone installing your neighbor’s POTS (Plain Old Telephone Service) line.  Note that even if you use T-1s for connectivity within a colo facility, they are likely brought into the building across fiber rings, which makes them much “cleaner” and more reliable.
  • If you are planning on having a DR facility in another city and you need a high speed link between them, it will likely be much cheaper if you can bid this out to multiple ISPs available at both datacenters.  High speed circuits between datacenters often cost less than circuits to customer premises.
  • One of the most important technological and financial factors in your decision making process needs to be:  “How will I get a high-speed connection from my office to my servers?”  This new cost must be factored in and the potential for the connection going down must be considered in your evaluation.  This is perhaps the most negative technical factor in the argument for moving your servers into a colo facility.

Benefits

  • You don’t have to staff in house for and spend time/effort/money on managing your physical infrastructure.  This is taken care of for you by the colocation provider.
  • You don’t have to spend capital on UPSs, generators, floor space, fire control, and cabinets.  Granted you do pay for this somehow on an ongoing basis to the colo provider.
  • Uptimes can be improved as you will have fewer power/cooling/communication failures and your servers will have fewer hardware problems as they will be operating in a more stable environment.  (Seriously, MTBF in a good colo facility will be increased.)
  • You don’t have to spend your nights and weekends worrying about the air conditioning failing!  Even if there is an issue, it is someone else’s problem.  Also, you can go out of town and even if a server fails you may be able to have someone else go push buttons for you.  You could even have a vendor’s tech dispatched to the site to work on your server without you needing to be there.
  • You can buy Internet access from the datacenter (make sure you negotiate well on this and make sure the price per megabit keeps falling over time).  If you get upstream Internet from a solid in-building provider (that has good quality upstreams), you may not even need to purchase Internet routers.  All you need is a firewall tier and switch tier.
  • You can grow incrementally in a colocation facility as your needs grow, or cut back as they shrink (assuming you are not under contract).  When you run your own fixed equipment you are most likely not running your UPSs near full load, where they are most efficient.  In your own facility, once you run out of capacity you are artificially constrained as that next server will require you to make another major investment.

Downsides

  • If you already have a server room with a dedicated cooling unit and enough power, etc., it may simply not make financial sense to move.  You may want to pursue a split model for the things that have higher uptime requirements.
  • You need to pay for connectivity from your office to the datacenter.  This is a new cost that did not exist before.
  • Connectivity from the colo facility to your office could be interrupted, bringing work to a halt (whereas previously even if cut off from the Internet, employees could still access the servers which were in-house).
  • Touching your servers requires traveling to the datacenter.  This travel time takes away from productivity (though getting out of the office once in a while can be nice!)
  • The monthly cost of a datacenter can cause sticker shock when looking at it simply from a “new cash cost” standpoint, however, this can be offset by savings on network circuits (if negotiated at the same time).  More importantly, you must consider the “total cost of ownership” of running your servers in house both in terms of hard and soft costs.

Example designs

Depending on your specific business model, there may be a few different reference designs you could choose from:

  • I have worked with a medical practice where we literally took their entire closet of servers one weekend and moved it into a colo facility.  All that was left was three network switches.  I would call this the “full colo” model.
  • If you have significant amounts of remote users but still have some in house users that require high speed access to file servers or application servers, you may want to consider a “split model” where most of your equipment (and WAN core) is located in the colo facility, but certain high-bandwidth servers stay on-site (like your file server).
  • An organization with all their employees in house may choose to keep all their corporate IT servers at the office, but put any Internet facing servers (like hosted applications or the corporate web site) at a datacenter.  I have implemented this model several times at various software companies in the past.  You must consider what the uptime requirements of your various services are.  Generally, Internet facing services will have a larger audience  and so they need a higher level of reliability than internal IT services.
  • Another variation on this may be to just keep your WAN routing equipment at a datacenter and then have a single backhaul connection to the office where the servers are located.  If you have many remote sites that get connectivity from different providers, it may make more sense to terminate them in a cabinet at a colo facility and then use a single metro ethernet connection back to the office.

Final thoughts

So you’re convinced?

Great!  Now check out my post on how to choose a colocation facility, and if you are based in Portland, check out my list of all the available facilities, plus the Google map I put together of where they are all located!  These resources also outline your telecommunications provider options.

As always, please email me with feedback or post a comment!

-Eric

Categories: Colocation, Network, Telecom Tags:

Sniffing SSL TLS Sessions with Wireshark

April 10th, 2009 3 comments

As a Network Engineer I am frequently called upon to assist in troubleshooting connectivity and integration issues with our customers by capturing packets and providing analysis.  These clients make SOAP calls to our system across the Internet (or private networks) using SSL (TLS) encryption.  The encryption makes troubleshooting application issues and response time issues very difficult when everything inside the TCP stream is encrypted garbage.

If your system uses SSL offloading capabilities in a load balancer, you can sniff the unencrypted packets between the load balancer and the web server.  However, this will not let you see Internet caused TCP flow issues (retransmissions, out of order packets, etc…), and very often (depending on your architecture) the load balancer will change the source IP of the connection to its own, which makes tracking down the interesting flow very difficult.

I learned a new trick this week that allows me to give Wireshark my private SSL keys (the ones loaded on the web server or load balancer), which will allow it to decrypt the SSL encryption and show me the payload in clear-text.

This is actually written up pretty well on the Wireshark support wiki so I won’t spend much time rehashing that here, however, I do have a few notes:

  • You need to get a copy of your RSA private key.  Often times this will mean exporting the certificate from the keystore (on Windows boxes, etc…), or it will mean grabbing the .key file from wherever it is stored on the file system in Linux.
  • You will most likely need to use openssl (or some other tool) to extract just the private key from whatever format you have available (you don’t care about the public key, the signature, or the certificate chain).
  • In my case I had a .key file of just the private key, however, it was encrypted with a password and I needed to use openssl to convert it to an unencrypted form that Wireshark can use.
  • As described in the Wireshark documentation, you must specify which private key files to use for which server IPs.  This is specified in the Wireshark SSL preferences.
  • To actually decode an SSL stream, find a packet that is part of an SSL exchange, right click on it, and select “Follow SSL Stream”.  This is effectively the same as “Follow TCP Stream”, except it does the decryption.
  • If it does not seem to be working, in the Wireshark SSL preferences, turn on logging to a text file and try again.  Note that this is quite verbose so once you get your issues worked out I would not leave it on.
  • Note that you must capture the entire SSL key exchange process, otherwise it will not be able to decrypt the payload.  I have seen flows where Wireshark would only decrypt the data in one direction or another.
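For the password-protected .key case mentioned in the notes above, here is a minimal Python sketch that shells out to openssl to write an unencrypted copy of an RSA key.  The helper names and the example file names (server.key, server-decrypted.key) are hypothetical and not from the original setup; `openssl rsa -in … -out …` itself prompts for the passphrase on the terminal.  Treat it as an illustration under those assumptions rather than the exact commands used for this post.

```python
import subprocess
import sys


def build_decrypt_command(encrypted_key, decrypted_key):
    """Return the openssl argv that rewrites an RSA private key without its passphrase."""
    return ["openssl", "rsa", "-in", encrypted_key, "-out", decrypted_key]


def decrypt_key(encrypted_key, decrypted_key):
    """Run openssl; it asks for the key's passphrase interactively."""
    subprocess.run(build_decrypt_command(encrypted_key, decrypted_key), check=True)


if __name__ == "__main__":
    # Hypothetical usage: python decrypt_key.py server.key server-decrypted.key
    decrypt_key(sys.argv[1], sys.argv[2])
```

The unencrypted copy is exactly the kind of file you do not want lying around, so keep it on the analysis workstation only and delete it when you are done.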

So remember, it is incredibly important to protect your SSL keys!  All too often they are left lying around on numerous web servers that may be directly exposed to the Internet.  Make sure your keys are backed up, on a flash drive, in a safe, somewhere offline, and then mark all the keys on Windows servers as non-exportable.  On other platforms make sure the keys are at least encrypted with a password themselves.  Centralizing your keys on a load balancer (rather than a bunch of web servers) can decrease your risk exposure.

Update:  There is actually a good reference on this on Novell’s web site as well.  Apparently Wireshark will not decrypt sessions that use DHE (ephemeral Diffie-Hellman) key exchange, so if the browser and server negotiate that it will not work.  I am not sure whether newer versions of Wireshark already handle DHE or will in the future.

-Eric

Categories: Network Tags:

How to choose a colocation facility

April 7th, 2009 4 comments

Choosing a colocation facility is one of the most important decisions an IT professional can make.  It will have repercussions for years down the road, as there is generally a contract term associated, and it becomes difficult/costly to move.  At the same time, unless you are a facilities professional, it is hard to tell the difference between the quality of one facility vs. that of another without knowing the right questions to ask.  I have developed this list in the hopes that it will be a reference to folks evaluating datacenter options.  This has been written using the assumption that you need a local datacenter rather than a DR facility (which can have very different needs), however, many of the same concepts will apply.

Location

  • When it comes right down to it, there are still certain things you have to do physically in person. You can’t run a network cable through SSH or RDP. Having a datacenter close by makes a huge difference, especially when you lose remote connectivity and must go push a button in an emergency (we all have done this once or twice). In general, the newer, more high-end, and redundant your equipment is, the less you should have to touch it in person. Things are getting much better with out of band remote access controllers, but sometimes being there is worth a lot. You can’t hear that fan making funny noises from your office.
  • Does the facility have good access to transportation such as freeways and airports? Are there hotels nearby if you will have out-of-town contractors visiting? How close are you to logistics depots for your vendor-of-choice’s parts (i.e. Cisco, Dell, HP, etc…)?
  • Does the facility have adequate parking that is close to the building, does it cost money? Is it somewhere you want to leave your car in the middle of the night while you are inside working?
  • Do you have line-of-sight to the datacenter? If you can manage to get a wireless link to your datacenter this can be an extremely cost-effective option for high speed connectivity. There is something to be said for controlling your own destiny when it comes to your connectivity rather than being at the mercy of a telecom provider. Will the facility allow you to put a wireless antenna on the roof and how much will they charge?

Staffing

  • Do they have on-site staff 24×7 to respond to emergency situations, to secure the facility, and to provide access when you forget/lose your badge (or have to stop by on your way home from the gym)?
  • If they do not have staff on site 24×7, what is their on-call policy? How long would it take them to respond to a power failure, a UPS exploding, a transformer catching fire in the parking lot, an Internet outage, an FM-200 fire suppression system going off, an HVAC system failing, or any other major malady? (Yes, I have had all of these things happen to me in facilities I have worked in, and I am still waiting for the day a fire sprinkler goes off or there is a real fire in a datacenter.)
  • What level of professional services can they provide? Basic remote hands (please press the power button)? More advanced troubleshooting (help diagnose a failed network switch)? Or even managed services (i.e. they take care of backups)?
  • How competent are their NOC engineers, facilities folks, etc… What quality of vendors do they use to do electrical work, HVAC maintenance, network cabling? This can be hard to tell, but there are lots of small clues you can pick up on.
  • Does their staff speak English fluently and without heavy accent? It is extremely difficult to communicate on the phone with someone in a loud datacenter environment about complex technical issues when both of you are having a hard time understanding each other. This dramatically slows down the troubleshooting process and increases the chance of error.

Connectivity options

  • Do they provide Internet access themselves, or do you need to contract with other providers (à la the Pittock Block)? Having a datacenter provide Internet connectivity (if they give you a reasonable rate) can be more cost effective than running your own routers with multiple ISPs (assuming you don’t have special routing needs that require it). You do need to make sure your datacenter has good upstream providers, good quality routers, and competent staff to run them. Be careful to ensure your provider can absorb moderate sized DDoS attacks without equipment failure or running out of bandwidth. You don’t want your neighbor’s online dating site to come under attack and impact your Internet connectivity.
  • Are they “carrier neutral”? Will they allow you to bring in your own connectivity (Internet/WAN)? Or do they want a piece of the pie of everything (i.e. resell you everything)? Are they charging your chosen provider ridiculous fees to have “right of entry” into the building (which drives up your end user costs)?
  • What fiber providers do they have available? – The more connectivity options you have available, the harder bargain you can drive with providers to get the best deal possible. If you need connectivity to many different sites, it is likely that some sites will be cheaper/better/faster to connect with one provider, and others will be cheaper/better/faster with another. A good example would be TWTelecom and Integra Telecom here in Portland Oregon. They each have extensive fiber optic networks around the metro area, but if you are trying to get from Infinity Internet to various locations around town, whichever has fiber closer to your destination will have a price/technical advantage to provide you service.
  • Who is the local exchange carrier? You might need a POTS (Plain Old Telephone Service) line or two for paging access, etc…
  • What do they charge for cross connect fees? If you order a $300/mo T-1 are they going to charge you $100/mo cross connect fee for the two pairs of phone wire to get it to your cage/cabinet?

Power Infrastructure

  • What type of power grid design are they on? Radial or interconnected? On a Radial system (such as you would find out in the suburbs), if a car crashes into a pole, or a backhoe takes out a single conduit, power will be lost. In an interconnected system there are multiple “primary” feeds connected to multiple transformers which energize a “secondary” bus that actually feeds power to the facility. This type of design significantly reduces single points of failure and allows entire transformers to be taken offline for maintenance without service interruption.
  • Is the power grid in the area above ground or below ground? Above ground systems are susceptible to windstorms, lightning, trees, etc… Below ground systems fall prey to backhoes, horizontal boring machines, water penetration, etc… In general, below ground is going to be more reliable.
  • If on a Radial system, do they at least have multiple transformers (preferably off of separate primary feeds) even if they are not tied together on the secondary bus? Often you will see two transformers with each feeding a separate power distribution system within the datacenter.
  • Are the transformers well protected from vehicles in the parking lot?
  • What type of electrical transfer switches does the facility have to switch between main power and generator power? Are they capable of “make before break” operation when switching to the generator during test cycles or planned outages? Can they operate as “make before break” when switching back to grid power after an outage? This is important as the most likely time for a UPS to fail is during switching. If you can minimize the number of voltage-loss events it will reduce the likelihood of UPS failure.
  • How many generators does the facility have? If multiple, is their distribution system set up in such a way that you can get separate power feeds in your cage/rack that come from completely independent PDUs, UPSs, Generators, and Transformers? Just because a facility has multiple generators/UPSs/Transformers does not mean they are redundant for each other; they could just be there to increase capacity.
  • Does the facility regularly test their generators *with* load applied (either the actual datacenter load, or a test load)?
  • Has the facility designed, and more importantly *operated*, their system such that a failure of one UPS/Transformer/Generator does not cause an overload on other parts of the distribution system?
  • Does the facility participate in programs that allow the power utility to remotely start the generators and switch the facility over to Generator power to reduce grid loading? While this is good for the overall health of the power grid (and possibly the environment), it can be a liability to your equipment at the datacenter since more power transfer events will be occurring.
  • How much fuel is stored on site – how many hours does that represent? Does the facility have contracts for emergency refueling services?
  • Can the generator be re-fueled easily from the road, or is it located on the roof?
  • What type of UPS systems do they have? How old are they? How often are the batteries tested and replaced? Can they take their UPS offline for maintenance without impacting customer power?
  • Can they provide you custom power feeds for equipment such as large Storage Area Networks or high power blade enclosures? (i.e. you need a 3 phase 208 volt 30 amp circuit)

Cooling

  • Do they use many direct expansion cooling units, or do they have a water/glycol loop with a cooling tower? Or do they even use chilled water? Each of these has its pros and cons, however, the multiple direct expansion model is very simple and redundant in that you likely have many individual units (it is not as energy efficient though). The trick is controlling the HVAC units to not “fight” each other, causing short-cycling on the compressors.
  • Are the cooling units designed for datacenter usage (running 24x7x365), with the ability to control humidity within reasonable levels, or are they made for office cooling applications with expected usage of 10 hours a day?
  • If the facility uses cooling towers for evaporative cooling processes, do they have on-site water storage to provide water during utility outages (such as after an earthquake)? Are all parts of the cooling loop system redundant (including the control system)?
  • Does the facility maintain and enforce hot/cold aisle design? This is becoming critical as power densities increase and power efficiency becomes critical.
  • Does the facility have an outside air exchange system to provide “free” cooling during the months of the year that outside air is of appropriate temperatures? While good for the environment, you must be careful about the outside air’s humidity as well as the dust/pollen that could come in with outside air. There is a dramatic difference between servers that have been in a quality datacenter for a few years, vs. ones with poor HVAC systems for a few years. I have removed servers from facilities before that have not gotten a speck of dust on them and others that are caked in black dust (depending on the facility they were in).
  • Is the entire cooling system on a single generator, or is it spread across multiple units for redundancy?

Cages/Racks

  • Does the facility provide Cages? Cabinets?  Or both?  These days most everything will fit in standard square hole cabinets, however, in some cases if you buy large enough equipment it might come with its own racks or as a freestanding unit that cannot go in cabinets provided by the facility. If you go with a cage you must carefully plan how much space you are going to need ahead of time. Adding additional cabinets as needed can be an effective growth strategy, though you must plan for network and SAN cabling between them.
  • If you get a cage (or just custom cabinets) make sure to agree upon who will bolt down your cabinets and how much it will cost.  This can be particularly tricky on raised floors to properly secure them in the event of an earthquake.  Any work done must be properly done to not throw dust into the air and to mitigate any potentially harmful vibrations that could impact running equipment.
  • One gotcha I have run into before is that some facilities’ cabinets are not deep enough for modern servers (specifically some Dell servers). I also have been shocked to find many facilities that still are leasing ancient cabinets that are telco-style with solid doors on them. Modern equipment requires front-to-back airflow, not bottom-to-top as was the old telco style. Also note that most network equipment still uses side-to-side airflow and is best suited to two-post telecom racks (where possible) rather than four-post server cabinets.
  • When selecting a colo facility make sure to specify exactly what type of cabinet you are expecting in the contract if they have multiple types available.
  • Modern cabinets have built in mounting holes/brackets for vertical mount PDUs which are becoming the standard.  This allows you to use very short (think 2 foot) power cables to attach servers without excess slack.  They also do not take up usable rack space.
  • Modern cabinets should also have a way to cleanly route cables vertically (think about power cables, network cables, fiber SAN cables, etc…)
  • Does the facility provide PDUs in the cabinets for you, or are you responsible to provide them yourself?  It is critical that your PDUs have power meter displays on them as power in a datacenter is typically very expensive and so you want to load them up as much as possible for peak cost efficiency, while not risking tripping a circuit breaker (never load a circuit more than 80% of its rated capacity – which means 16 amps on a 20 amp circuit, or 24 amps on a 30 amp circuit).  When plugging dual power supply servers into different circuits, ensure that in the event one circuit blows the other can handle the entire load without blowing.
  • What type of power plugs will they be delivering in your rack/cage?  I recommend locking plugs like an L5-20 or L5-30 to plug your PDUs into (even though a standard non-locking NEMA receptacle can handle the current capacity in 20 amp circuits).  Also common these days is using 208 volt 30 amp circuits with an L6-30 receptacle.  Most everything manufactured in the last 5 years is capable of accepting 208 volt power.  Using the higher voltage allows you to have more equipment in a cabinet with fewer circuits which also means fewer PDUs.
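The 80% rule and the dual-feed check above are easy to get wrong when reading amps off a PDU display at 2am, so here is a small Python sketch of the arithmetic.  The function names are made up for illustration, the 120 volt figure is an assumed nominal voltage, and the failover check assumes you want the surviving circuit to stay within the same 80% limit (a conservative choice, not something a facility will necessarily require).

```python
def usable_amps(breaker_amps, derate=0.80):
    """Maximum continuous load for a circuit under the 80% rule."""
    return breaker_amps * derate


def survives_failover(feed_a_amps, feed_b_amps, breaker_amps, derate=0.80):
    """True if one circuit could carry both feeds' measured load after the other fails."""
    return feed_a_amps + feed_b_amps <= usable_amps(breaker_amps, derate)


def usable_volt_amps(volts, breaker_amps, derate=0.80):
    """Approximate usable capacity of a single-phase circuit in volt-amps."""
    return volts * usable_amps(breaker_amps, derate)


if __name__ == "__main__":
    # Two 20 amp circuits each showing 9 amps: fine day to day, but not after a failure.
    print(survives_failover(9, 9, 20))
    # Compare an assumed 120 V / 20 A circuit with a 208 V / 30 A circuit.
    print(usable_volt_amps(120, 20), usable_volt_amps(208, 30))
```

The comparison at the end is the reason 208 volt 30 amp circuits are attractive: one of them carries roughly two and a half times the usable load of a 20 amp low-voltage circuit.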

Fire suppression

  • Is the structure made of metal and concrete, or of wood?
  • Does it have traditional “wet-pipe” sprinklers, “dry-pipe” sprinklers, or “pre-action” sprinklers? Or even none at all? If an electrician hits a sprinkler head with a ladder in either a “wet-pipe” or “dry-pipe” system, it will immediately release large amounts of water until the fire department shows up to turn it off. Pre-action systems require both a smoke sensing system to alarm, as well as heat setting off a sprinkler head in order to let water flow.
  • What type of fire detection system does the facility have? Standard smoke sensors, and/or VESDA sensors?
  • Does the facility have an inert gas fire suppression system such as FM-200, Inergen, or Halon? An inert gas system will deploy if two smoke sensors are triggered, and hopefully extinguish the fire before it can set off a water based system (typically still required to meet fire code). In reality though, I have never seen modern computer equipment really catch fire. Most of it does not burn very well (as long as you don’t store cardboard in the datacenter).
  • Who are your neighbors within the building? Are any of them high risk?
  • How old is the building’s fire suppression system? You might be in a suite within the building that has the latest and greatest fire control, but if the rest of the building has a simple fire panel from 1970 and no sprinklers, it could still burn to the ground. Upgrades to fire control systems are generally not required unless the building owner does a major renovation.

Physical facility

  • What is the risk of water damage to your equipment? Are you right below a poorly maintained roof? Are there non-pre-action sprinklers above you? Is there a domestic water pipe above your cage? Bathroom drains from the tenant above? Storm drain pipes from the roof? Condensate drains from the HVAC system? Cooling loop pipes? Note that if a fire sprinkler goes off several floors up it can seep down through cracks between floors you never knew existed into your equipment.
  • Is the facility located in a flood plain? Is it below ground level? There are places in Portland that have water mains large enough to cause localized flooding if they break.
  • Does the building have a convenient loading dock for receiving equipment? What is the largest equipment that will fit into the building and up the elevator? This is a problem in many older buildings.
  • How large is the space you are in (by volume) compared to the equipment load? If cooling was lost (say because the fire alarm inadvertently went off which shuts down all HVAC), how much thermal buffer is there to keep the temperature from rising too much until the system is reset?
  • Is there a grid of ceiling tiles above you? If so, it will probably fall down and create dust in an earthquake. I would rather see all of the piping and mechanical systems on the ceiling anyway rather than let them be hidden above a ceiling grid.
  • Is the facility on a slab floor or raised floor? It is easier to effectively bolt things down to a slab floor for seismic purposes, but a raised floor can also conveniently provide space for electrical power and cables. It is becoming less feasible for cooling purposes however, since density is increasing so much.
  • What is the seismic rating of the facility? How much will it shake your equipment in an earthquake and will the building be damaged to the point that it is unsafe to continue operation?
  • Do they have requirements about what types of equipment you can put in the datacenter? i.e. if in a traditional telco facility certain ratings may be required.
  • Is the facility well kept and “clean”?  This can tell you a lot about the quality of the facility.  It is hard to tell if proper maintenance is being done at scheduled intervals on their power equipment, but if a facility cannot even keep cables managed properly it is a likely sign that they are skipping other non-visible things as well.
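To put a rough number on the “thermal buffer” question above (room volume versus equipment load), here is a back-of-the-envelope Python sketch.  It only accounts for the heat capacity of the air in the room, using approximate textbook values for air density and specific heat, and it ignores the thermal mass of the equipment, floor, and walls, so real rooms will warm more slowly than this.  The example room size and load are invented for illustration.

```python
AIR_DENSITY = 1.2          # kg per cubic meter, approximate for room-temperature air
AIR_SPECIFIC_HEAT = 1005   # joules per kg per degree C, approximate


def temp_rise_per_minute(load_watts, room_volume_m3):
    """Degrees C per minute the room air warms if all cooling stops (air-only estimate)."""
    air_mass_kg = AIR_DENSITY * room_volume_m3
    return load_watts / (air_mass_kg * AIR_SPECIFIC_HEAT) * 60


def minutes_until_rise(delta_c, load_watts, room_volume_m3):
    """Minutes until the air has warmed by delta_c degrees C under the same assumptions."""
    return delta_c / temp_rise_per_minute(load_watts, room_volume_m3)


if __name__ == "__main__":
    # Hypothetical room: 10 m x 10 m x 3 m (300 cubic meters) with 50 kW of equipment.
    print(round(temp_rise_per_minute(50_000, 300), 1), "C per minute")
    print(round(minutes_until_rise(10, 50_000, 300), 1), "minutes to +10 C")
```

Even as a pessimistic estimate, the point stands: a dense room with little air volume has minutes, not hours, before temperatures become a problem.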

Creature comforts

  • Does the facility have comfortable areas for you to work while on-site (i.e. a conference room) or do you have to spend all your time on the cold/loud datacenter floor?
  • Do they provide “crash carts” (i.e. a portable keyboard, monitor, mouse) to utilize if you don’t have your own KVMs?
  • Do they have vending machines or refreshments when you need that late night pick-me-up?
  • Will they accept deliveries for you? Do they have someone at the facility during business hours? I find this to be *very* important.
  • How good is the cell phone coverage for the specific provider(s) you care about?
  • Do they have a guest wireless network you can jump on while you are working there to easily get Internet access without having to provide it yourself?

Security

  • How do they control access to the facility? Is it manned, or unmanned? If they have an access control system does it have biometric features?
  • Do they have security cameras? How long is the footage kept for?

Pricing

  • How much do they charge you per cabinet, or per square foot of space?
  • How much does power cost? Is it per provisioned circuit, or based on actual usage? What is their pricing model? Note that it is more and more common to need 208 volt circuits, or three phase circuits with modern blade enclosures and SANs. It is no longer just increments of 20 amp 110v circuits.
  • Will they provide you second power feeds at a reduced price if you are only going to be using them for failover? Note that these second feeds may cost them UPS, Generator, etc… capacity they must plan for, however, you won’t be utilizing electricity from them (which they must pay the utility company for) or loading their total feed capacity from the utility since they are just for redundancy.
  • Can you get price guarantees for future expansion (power costs, cabinet costs, etc…)?
  • Does the facility want to sell you completely managed services and as such make colocation costs untenable?
  • Do they provide some amount of basic remote hands service hours each month? How much do they charge for professional services?
  • Does the facility provide service-level-agreements (SLAs) that have teeth? Frankly, I don’t put much faith in SLAs since usually they only involve a credit for the period of time service is unavailable. This generally is nothing in comparison to the amount of money you lose when your datacenter goes down or your costs in man-hours to bring it back up.

Switching Costs

  • Once you move into a facility there can be significant (if not astronomical) switching costs. They may offer you a smoking deal to get you in the door, and then make it up by charging higher-than-market-rate for add on services down the road. Realize that you are likely to need more power and more bandwidth down the road. Also realize that bandwidth costs fall steadily so you don’t want to get locked in for long term rates on telecommunications circuits. It is also possible for your needs to go down in the long term as virtualization gets more popular, “cloud computing” becomes a reality, and computers become more efficient.
  • Contracts are normally in place to protect the provider, but they can also protect you. If you get a smoking deal on something, locking it in for a term commitment can be a good idea. It is reasonable for a provider to require a contract term as they do have significant capital and sales costs that they need to cover. Also, realize that the average lifespan of a datacenter is not all that long these days. A datacenter built 7 years ago has nowhere near the cooling capacity required in a modern datacenter.
  • Think about your growth pattern. You don’t want to be paying ahead of time for service you don’t need/use, but you also don’t want to get hit for huge incremental costs to add cabinets/power down the road. Contracts with “first right of refusal” clauses built into them (on additional space/capacity) are common.
  • Think about how difficult it will be for you to pick up and move at a later date. Some of the most “sticky” items are storage area networks. It might be easy to move a few servers at a time, but if you are all dependent on a single Storage Array, everything connected to it must move at once.
  • Telecommunication circuits also increase your “stickiness”. They are generally under term commitments, and it can be difficult to coordinate moving them at a specific time. If you have a circuit from XO and move to a facility that does not have XO fiber, you might have to switch providers, or pay someone else for the local loop.
  • If you are purchasing Internet connectivity from your datacenter you are most likely being assigned IP addresses from their address space. When you move or change providers you will need to re-number. Depending on your network design and use cases, this might be easy, or an extremely difficult task.

Final Words

While there are numerous factors to consider, the reality is that there are likely a number of providers in town that can meet your needs successfully.  The reliability level of Portland’s power grid and of datacenter equipment is getting so high that we are really “chasing nines” to get ever so slightly more uptime (for dramatically higher cost).  For most organizations, being in a datacenter with only a single generator provides plenty of uptime.  Is that extra 0.009% uptime really worth it to go from “four nines” to “five nines”?  That is an increase of about 47 minutes of uptime per year.  Is that worth doubling your costs?
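For anyone who wants to check the “nines” arithmetic, here is a short Python sketch.  It assumes a 365 day year and treats availability as a simple percentage of total minutes; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year, in minutes, at a given availability percentage."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


if __name__ == "__main__":
    four_nines = downtime_minutes_per_year(99.99)    # roughly 52.6 minutes
    five_nines = downtime_minutes_per_year(99.999)   # roughly 5.3 minutes
    print(round(four_nines - five_nines, 1), "minutes per year gained")
```

The difference works out to a little over 47 minutes a year, which is the figure to weigh against the extra cost.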

Perhaps one of the most important aspects to your decision is the relationship you build with the owners, management, and staff of your colocation facility.  You want to have as much of a “partnership” as possible, and not merely a buyer/seller relationship.  Finding a facility with a long history of treating their customers well will increase your chances of success.

If you have any comments/questions feel free to post below, or shoot me an email.

-Eric

Categories: Colocation, Network, Telecom, Wireless Tags:

Map of all Portland Telecommunication and Colocation Facilities

April 6th, 2009 2 comments

Over the past couple weeks I have been working on something special to go along with my previously posted list of all the telecom and colocation facilities in town.

After *way* too many hours looking at satellite photos, I have created a Google map of every telecom and colocation facility that I know of in the Portland Metro Area.  This includes Colocation facilities, Verizon Central Offices, Qwest Central Offices, CLEC facilities, long-haul facilities, major fiber splice points, Wireless Provider CO’s, and even some notable private datacenters.

You can even download the .kml file and fly around the map in Google Earth (though Google Maps is probably a better interface for this).

I am *sure* that I am missing some facilities, so please, as always, email me with updates.

-Eric

Categories: Colocation, Network, Telecom Tags:

Qwest DSL Installation With Actiontec M1000

April 1st, 2009 13 comments

I did a DSL installation today of a Qwest DSL line in downtown Portland.  I have to say that I am pleasantly impressed with the quality of the DSL service, with the Actiontec M1000 modem, and with the Qwest tech who installed the service.  If you are in downtown Portland and need Internet access for a small business (or as in my case a guest network, etc…) it is hard to go wrong (the price is certainly right at $74.25/mo for 7 megabit).

Actiontec M1000 Front

I am surprised (more like shocked) that they do not have the 20 megabit service available to offer me in downtown Portland.  I am served out of the PTLDOR69 Qwest CO which is *the* downtown Central Office.  My building could literally fall over and land on the CO (I think I have more vertical feet of cable than I do horizontal distance in the road).

Traditionally, Qwest has deployed their DSL service with DSLAM racks that are attached to their ATM cloud.  You are then connected across that ATM cloud to an access concentrator at your ISP (in my case, Qwest is my ISP).  This is actually quite cool as there are a lot of different providers that you can go with (some with special features like content filtering, etc…), though, if you’re just looking for Internet access, it is hard to beat Qwest for speed and quality of their network.

C:\Users\eric.rosenberry>tracert 4.2.2.1

Tracing route to vnsc-pri.sys.gtei.net [4.2.2.1]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  home.domain.actdsltmp [192.168.0.1]
  2    39 ms    38 ms    49 ms  ptld-dsl-gw29-221.ptld.qwest.net [207.225.84.221]
  3    38 ms    38 ms    38 ms  ptld-agw1.inet.qwest.net [207.225.85.225]
  4    39 ms    37 ms    38 ms  por-core-01.inet.qwest.net [205.171.130.25]
  5    59 ms    59 ms    63 ms  sjp-brdr-03.inet.qwest.net [67.14.34.10]
  6    65 ms    59 ms    59 ms  63.146.27.26
  7    71 ms    65 ms    73 ms  vlan79.csw2.SanJose1.Level3.net [4.68.18.126]
  8    60 ms    60 ms    69 ms  ge-11-0.core1.SanJose1.Level3.net [4.68.123.38]
  9    60 ms    59 ms    60 ms  vnsc-pri.sys.gtei.net [4.2.2.1]

Trace complete.

C:\Users\eric.rosenberry>

Depending on who your ISP is, there are a number of different ways you might have to configure the DSL modem, though with Qwest’s internet service you historically have used “PPPoA” (Point to Point Protocol over ATM).  What I noticed about this installation that differs from previous installations is that they now by default configure your modem for “PPPoE” (Point to Point Protocol over Ethernet) when you go through the “Quick Setup”.  This makes me wonder if their new DSLAM racks that support the ADSL2+ (20 megabit) speeds no longer use ATM as their backend transport and as such require PPPoE instead of PPPoA.

This to me is a bit disappointing as when you utilize PPPoE your maximum packet size is cut down to 1492 bytes instead of your standard 1500 bytes, due to the 8 bytes of PPPoE/PPP header overhead.  There are certain circumstances in which having a reduced MTU can bite you (e.g. when PMTU discovery fails).  The good news is that they still seem to support PPPoA (at least on the DSLAM I am attached to).  I went ahead and set my modem to PPPoA and all was good.  1500 byte packets work perfectly.  It is possible that there is some specific reason they want you to use PPPoE over PPPoA, but until I have issues or learn something new, I am sticking with PPPoA.
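If you want to verify which MTU you actually have, the arithmetic is simple enough to sketch out.  This assumes plain IPv4 with no header options, and the constant and function names are mine, for illustration only.  The largest ping payload that fits without fragmenting is the MTU minus the IP and ICMP headers:

```python
ETHERNET_MTU = 1500      # standard Ethernet payload size in bytes
PPPOE_HEADER = 6         # PPPoE header
PPP_PROTOCOL_ID = 2      # PPP protocol field carried inside PPPoE
IPV4_HEADER = 20         # IPv4 header without options (assumption)
ICMP_HEADER = 8          # ICMP echo request header


def pppoe_mtu(link_mtu: int = ETHERNET_MTU) -> int:
    """MTU left for IP packets after the PPPoE and PPP headers are added."""
    return link_mtu - PPPOE_HEADER - PPP_PROTOCOL_ID


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for name, mtu in (("PPPoA", ETHERNET_MTU), ("PPPoE", pppoe_mtu())):
    payload = max_ping_payload(mtu)
    # -f sets "don't fragment" and -l sets the payload size in Windows ping
    print(f"{name}: MTU {mtu}, test with: ping -f -l {payload} <host>")
```

On a 1500 byte path a 1472 byte ping with “don’t fragment” set should get through, while on a PPPoE path the limit would be 1464 bytes.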

My ping times to my first hop gateway averaged 37ms which is excellent for a DSL line.  DSL lines introduce latency intentionally (a technique known as interleaving) so that they can ride out bursts of interference.  By spreading the datastream over time, the line is more likely to be able to recover from bit errors.  Compare that with 4ms round trip to my first hop on Verizon FiOS and 105ms round trip to my office Internet router on ClearWire (let’s say 70ms of that is to the first hop since I could not run a traceroute successfully to determine what my first hop was on Clear).

C:\Users\eric.rosenberry>ping 207.225.84.221 -t

Pinging 207.225.84.221 with 32 bytes of data:
Reply from 207.225.84.221: bytes=32 time=37ms TTL=254
Reply from 207.225.84.221: bytes=32 time=37ms TTL=254
Reply from 207.225.84.221: bytes=32 time=38ms TTL=254
Reply from 207.225.84.221: bytes=32 time=37ms TTL=254
Reply from 207.225.84.221: bytes=32 time=38ms TTL=254

Ping statistics for 207.225.84.221:
    Packets: Sent = 5, Received = 5, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 37ms, Maximum = 38ms, Average = 37ms
Control-C
^C
C:\Users\eric.rosenberry>
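The idea of spreading the datastream over time to survive bursts of interference is usually called interleaving.  Below is a toy sketch of a simple block interleaver, just to show why it helps.  This is not what a DSL chipset actually runs (real equipment pairs a more elaborate interleaver with forward error correction), and the sizes, burst position, and function names are arbitrary choices for illustration:

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols grid, read them column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): restore the original row-major order."""
    assert len(symbols) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = symbols[i]
            i += 1
    return out


ROWS, COLS = 4, 6                  # 4 codewords of 6 symbols each (arbitrary sizes)
data = list(range(ROWS * COLS))    # symbol i belongs to codeword i // COLS

sent = interleave(data, ROWS, COLS)

# Simulate a noise burst that wipes out 4 consecutive symbols on the wire.
BURST_START, BURST_LEN = 8, 4
damaged = [None if BURST_START <= i < BURST_START + BURST_LEN else s
           for i, s in enumerate(sent)]

received = deinterleave(damaged, ROWS, COLS)

# Count how many symbols each codeword lost after de-interleaving.
errors_per_codeword = [
    sum(1 for s in received[r * COLS:(r + 1) * COLS] if s is None)
    for r in range(ROWS)
]
print(errors_per_codeword)
```

The burst destroys four symbols in a row on the wire, but after de-interleaving each codeword has lost only one symbol, which is the kind of damage error correction can repair.  The price is that the receiver has to wait for a whole grid of data before it can read any of it out, and that waiting is the added latency.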

I am also happy to report that Qwest has no trouble issuing you a static IP (or a subnet of static IP’s), and even has a fully automated online system with which to do this.  You can only request static IP’s once the line is installed, but it is fast and pretty painless.  You just need to know the username and password that the modem uses to “dial up” to the Internet when it connects so that you can log in to the qwest.net control panel.  It is even possible to set custom reverse DNS entries (which is necessary if you want to run a mail server on the connection).  You do have to pay for the static IP’s (the rate varies depending on how many you need) and there is a one time setup fee as well (which is kind of lame considering the process is fully automated).

It is worth noting that if you get a single static IP, it simply sets Qwest’s access concentrator to always assign your modem the same IP address (no modem re-configuration required).  You can then map ports through to servers on the inside to make use of that static IP, or you can set your modem to bridging mode and run PPPoE on some other device (a server or another router/NAT device).

If you get a block of static IP’s, the modem still does PPPoA/PPPoE to the Qwest network, though it runs the PPP session in “un-numbered” mode and binds a real Internet IP to the LAN side of the modem.  The modem must be set into routing mode in this case.
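To picture what a routed block looks like on the LAN side, here is a small sketch using Python’s ipaddress module.  The /29 below comes from a range reserved for documentation and is purely hypothetical (it is not a real Qwest assignment), and which address the modem takes as the gateway is also just an assumption for illustration:

```python
import ipaddress

# Hypothetical block for illustration only (documentation range, not a real assignment).
block = ipaddress.ip_network("203.0.113.8/29")

hosts = list(block.hosts())     # usable addresses, excluding network and broadcast
modem_lan_ip = hosts[-1]        # assumption: the modem takes the last usable address
server_ips = hosts[:-1]         # what is left over for your own devices

print(f"block:       {block} (netmask {block.netmask})")
print(f"network:     {block.network_address}")
print(f"broadcast:   {block.broadcast_address}")
print(f"modem (gw):  {modem_lan_ip}")
print(f"for servers: {', '.join(str(ip) for ip in server_ips)}")
```

A /29 has eight addresses, but the network address, the broadcast address, and the modem’s own LAN address all come out of that, leaving five for your own equipment.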

My modem (the Actiontec M1000) came with firmware version QA02-3.60.3.0.8.2-M1000 which was not the latest version.  Since the device was brand new I wanted to start out with the latest-and-greatest so I upgraded to QA02.5-3.60.3.0.8.6-M1000 (available on www.qwest.net since it runs custom Qwest firmware).  Kudos to Actiontec and Qwest for a very smooth upgrade process (the utility is really simple, though I guess I would prefer just a web form on the admin page).  I should also mention that the admin interface of this modem is very nice looking and extremely fast for being an embedded device.

Actiontec M1000 Back

The quick setup list for deploying a Qwest DSL line using a block of static IP’s is as follows:

  1. Upgrade firmware
  2. Restore factory defaults
  3. Run the Quick Setup to set username and password for PPP dial up
  4. Switch to PPPoA from PPPoE
  5. Set your modem username and password to prevent un-authorized access
  6. Register for block of static IP’s on www.qwest.net
  7. Set modem up for static IP mode
  8. Shut off DHCP server
  9. Shut off NAT mode
  10. Reboot to ensure your settings took effect

It is also worth noting that there are some very cool things you could do with DSL lines attached to the Qwest ATM cloud.  Say you are a company that needs moderate speed WAN connectivity to a lot of remote locations (say for retail POS applications or for Citrix).  You could plug a couple T-1’s (or a fractional DS-3) into the Qwest ATM cloud in Portland, and then turn up DSL lines all over the LATA for very small per-site costs (this is what they call their MegaHost product).  A 256k DSL line is as low as $25/mo if I remember correctly (or even $15 a month in a residence)!

To test your Qwest DSL line, the best speedtest is going to be one residing on Qwest’s network (http://speedtest.qwest.net).  I find the speedtests from DSLReports and others to be frequently too busy to give an accurate reading.

The bottom line is that Qwest has done an excellent job making their DSL service shine (since I suppose they don’t have the money to do fiber like Verizon).  I find it funny that they advertise it as Fiber Optic Internet (when really it is just fiber to the node a.k.a. DSLAM rack).  They don’t even call it DSL anymore due to all the negative PR around DSL (thanks Comcast).  I am getting a full 7 megabit at my office, though it might be disappointing if you are a long way from the CO, are in an old neighborhood with poor quality cable, or if you are provisioned off a remote DSLAM rack that has slow backhaul links to the CO.

-Eric

Categories: Network, Telecom Tags: