Dark Fiber in the Metro for Enterprises

May 17th, 2016

As competition increases in the metro for fiber transport, exciting new options are becoming available for “Clueful Enterprises”™ from forward-thinking, non-traditional telcos.

Fiber to the tower is driving huge buildouts of fiber networks into areas that were previously economically infeasible. That, coupled with high-count fiber cables (288 to 864 strands) being deployed cost-effectively, means fiber is no longer such a scarce resource.

The trend I am seeing is for Enterprises to lease a pair of dark fibers from their offices back to a datacenter within the same metro area (usually on both sides of a ring for redundancy) and then to purchase all other telecom services out of the datacenter, where a competitive market exists (IP transit, MPLS, SIP trunks, etc.).

Once you have dark fiber on a metro path, you can “light it” with as much bandwidth as necessary and upgrade capacity over time when needed without having to re-contract in a disadvantaged negotiating position (due to remaining term length). Optics to push 1 gigabit down 20km of fiber can be purchased for as little as $7 each (you read that correctly!). 10 gigabit over 100km can be as low as $350/each. Stepping up modestly in cost, you can push 40x 10g waves over that same pair of fiber for 400 gigabit total (on each side of the ring).
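A quick back-of-the-envelope sketch of the numbers above (the prices are the illustrative figures from the text, not vendor quotes):

```python
# Lighting a leased dark fiber pair with DWDM, using the figures above.

waves_per_pair = 40         # 40x 10G waves over one fiber pair
gbps_per_wave = 10
total_gbps = waves_per_pair * gbps_per_wave  # capacity per side of the ring

cost_10g_100km_optic = 350  # USD each, 10G optic good for ~100 km
# A single 10G link needs one optic at each end of the path:
cost_single_10g_link = 2 * cost_10g_100km_optic

print(total_gbps)            # 400
print(cost_single_10g_link)  # 700
```

Seven hundred dollars in optics for a 10 gigabit metro link is the kind of arithmetic that makes lit-service pricing hard to justify once the fiber lease is in place.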

One thing that makes this kind of deployment so tenable is the lack of need for specialized transport equipment. You can now put long distance (1550nm) and even DWDM optics in regular old desktop switching platforms. The only equipment needed at an office building is now basic network switches – 100% of the servers can be located in datacenters or in the cloud.

When making use of dark fiber (with diverse paths) there are actually significantly fewer single points of failure than if that exact same fiber were “lit” with Ethernet Private Line services from a provider. In a classic EPL scenario the CPE device is a single point of failure, the client-facing optic on that CPE device is a single point of failure, and the aggregation router back at some hub site is a single point of failure. Additionally, you are dependent on power reliability at that hub site, plus, depending on ring architecture, potentially on dual-failure scenarios involving many other customer premises on either side of the ring (e.g. during a widespread power outage).

With dark fiber, no amount of typos from someone in the NOC, software upgrades, or automation gone wrong can take down your connectivity (your fiber has to actually be physically broken to make it stop working).

While many see dark fiber deals as a negative for telecom companies, I actually see many upsides:

  1. Contract terms are longer – I advise enterprises to co-terminate fiber leases with their office space leases – typically 5-7 years vs. a maximum of 3 years for lit services.
  2. Deal size is typically larger as Enterprises do see the value in having effectively unlimited bandwidth that allows them to access other services in the datacenter where market economies exist. This architecture can allow the elimination of traditional “server rooms” which are large portions of many Tenant Improvement budgets and take away expensive office square footage.
  3. CAPEX and OPEX for equipment is ZERO. Splicing costs may be more than for a traditional lit services deal, but that is easily offset by the longer term length and the lack of need for expensive equipment.
  4. Building right of entry agreements may be easier to come by and less costly when you have no powered equipment on-site that requires 24×7 access.
  5. Competition is lower when offering dark fiber solutions. In a traditional lit services transport deal there are often 5-6 carriers available, but due to fiber and policy constraints taking it to a dark fiber deal may reduce that to 2-3 options (or less!).



Republic Wireless Service and LG Optimus Review

November 26th, 2011

When I first heard of Republic Wireless a few weeks ago I just could not resist ordering a phone and service to try out.  I have long felt that the phone industry (both wired and wireless) has been devoid of innovation and ready for disruption.  At $99 for the phone and $19 a month for unlimited service (with no contract), it was worth it to me just as a technology experiment.

On their launch day the site was a bit busy and not able to place orders for some time; however, they handled it with well-thought-out error messages.  I eventually got through and placed an order.  They were very upfront about the fact that it might take some time to deliver the product (which was no problem for me).  One slightly annoying thing was that they did charge me for the product well before shipping it.

Once they did ship it they sent a note with tracking information which was great.  It arrived today in good order.

Initial thoughts

It came in a small box inside a padded FedEx mailer, which was just perfect, nothing more needed.  It is clear from first appearances, though, that they are still quite a young company.  The card telling me my phone number was hand-written.  😉  The phone and the box it came in are 100% Sprint-branded, except for a sticker with the Republic logo on the box.  They clearly have not had enough time to get a hardware manufacturer to spin them devices with 100% Republic branding.

The phone

Never having seen an LG Optimus before, I was very happy with what I got for $99.  Build quality seems excellent.  I like the case coating, the buttons, and just the general form factor.  It even has a camera hard button, which I miss greatly on my Droid Bionic!  It does seem tiny though compared to the Bionic.  The downsides are that the screen is not huge (which makes typing noticeably more difficult), it does not have a blinking light to tell you when a message is waiting (I am constantly checking for that light on my Droid), and it does not have a camera flash.

For a $99 phone, the screen is excellent, and the processor seems fast enough.  I was very happy to see that they appear to be using the stock Android UI.  My first Android was the Droid V1 (which was pretty stock), and now I have the Droid Bionic with MotoBlur (which I am starting to hate).  Republic Wireless even seems to have avoided installing any kind of crapware on the phone whatsoever! (Ironically, they do have a “Dev Tools” app installed, which I wonder is a mistake, as it does not seem like something intended for the average end user.)  They don’t even have any icons set up on your home screen by default (which is a bit weird, yet somehow a bit cool; it is a clean slate to work with).

For what it is worth, the camera quality seems pretty good from the two pictures I have taken so far.

The phone came pre-activated and ready to rock (if I remember correctly, the battery was even in it already).  The only thing it wanted was for me to attach it to a WiFi network first thing.

The service

Alright, so now for the real test: can this thing make phone calls on WiFi?  I fire up a call to my employer’s auto-attendant and sure enough, it works!  Some quick tests later and I have a few initial thoughts.  The voice quality via WiFi is a bit quiet and tinny compared to calls placed on the Sprint network.  This is a bit disappointing, as I have a Cisco enterprise wireless access point within 10 feet of where I was testing, and a hugely overbuilt Vyatta box as my gateway, on a 35/35 Frontier FiOS connection.  They need to take the opportunity to leverage some of the major benefits of *not* being on a cellular network (namely the fact that potentially much more bandwidth is available).  Perhaps some of this perceived quality issue is just because the codec is “different” from what I am normally used to (not necessarily worse), and maybe they can improve it by turning up the default volume or modifying equalization settings.

So next up was a test of the two extra buttons that show up on the call screen during a WiFi call.  One of them places the call on hold.  This is novel for a cell phone, and it is an exciting indicator of features to come in the future.  When you don’t have to play in the world with the limits imposed by the cell base station manufacturers, you can actually implement compelling features!  The caller I tried this with did tell me the hold music was very faint.  Perhaps customizable music could be a future feature.  😉

The other button was to transfer the call to the cell network (presumably to easily work around the WiFi connection flaking out).  This is a great feature, but its implementation was the first indication that things are not yet as well integrated as I would like.  Pressing the button hangs up the call and places a new call to the same person over the Sprint network.  I would have thought the smart solution here would have been either for the phone to dial out to the Republic Wireless servers via Sprint and have them cross-connect the existing call to that phone over Sprint instead of via WiFi (avoiding making the called party answer another call), or to have the Republic Wireless servers call the phone on its Sprint number and reconnect the in-progress call.  I am hoping they can make this more seamless in the future.

Now for the real test: what happens when I start a call on WiFi and then walk out of range of the access point?  Well, as you might expect, the call does drop out for a bit, though after a few seconds the phone automatically places a new call outbound over Sprint to the party I had been speaking with.  I should note that, anecdotally, I made it quite a ways from the house before it dropped out (I only tested one call).

The technical stuff

So being a network geek, I needed to know what this thing was going out and connecting to on the Internet.  I had noticed on calls that the latency was not that great.  I could tell that conversations were not as real time as I have come to expect on normal wireless calls.  So I tracked down the phone’s IP in the DHCP leases table, and fired up tcpdump on my Vyatta box.  It very quickly became apparent where Republic Wireless is hosted.  My phone is connecting to nodes in the AWS US-East compute zone.

Running the Republic Wireless control software in the “cloud” makes sense for a company expecting potentially massive growth; however, I was shocked to discover that not just control connections were going through Amazon’s cloud.  They apparently are running the VoIP calls through there as well.  This immediately raised eyebrows with me, as I don’t feel that Amazon’s shared infrastructure environment is appropriate yet for VoIP traffic.  Perhaps I am wrong, or they have a deal with Amazon that puts them on dedicated network infrastructure, but the thing with VoIP is that it is massively sensitive to packet loss, latency, and jitter.  These things are hard enough to get right when you have dedicated hardware that is not shared.

The Amazon cloud node my phone was communicating with tonight was 85ms away (round trip) from my home (here in Oregon), under good network conditions.  This probably explains a portion of the large delay period that calls on WiFi were experiencing for me.  I think the VoIP encoding they are using is introducing more delay than I would like to see in order to reduce call dropouts due to flaky network connections.
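To put that 85 ms in perspective, here is a rough one-way delay budget, measured against the commonly cited ITU-T G.114 guideline of roughly 150 ms one-way mouth-to-ear delay; the codec and jitter-buffer figures are hypothetical, for illustration only:

```python
# Rough one-way VoIP delay budget (all figures in milliseconds).
rtt_to_cloud_ms = 85                       # measured round trip to AWS US-East
one_way_network_ms = rtt_to_cloud_ms / 2   # 42.5 ms in each direction

codec_and_packetization_ms = 30            # hypothetical codec delay
jitter_buffer_ms = 40                      # hypothetical de-jitter buffer

one_way_total_ms = (one_way_network_ms
                    + codec_and_packetization_ms
                    + jitter_buffer_ms)
print(one_way_total_ms)  # 112.5 -- most of a ~150 ms comfort budget gone
```

Even with generous assumptions, the cross-country network leg alone consumes over a quarter of the budget before the codec does anything.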

Being a network architect, I think they need to have the phones connecting back to edge devices that are deployed on dedicated hardware in major peering cities in order to reduce latency as much as possible.  This product will live or die based on the audio quality and the seamlessness of the solution.  They should have nodes in Seattle and also in California to cover the West Coast.

One other thing that I should note is that I don’t think SMS messages are going over WiFi yet.  I suspect they are going out the normal Sprint radio as I don’t clearly see packets on the network associated with when I am sending text messages (though I may be wrong about this – I have not yet done a full protocol decode).  Perhaps that will be a future blog post.


All in all, I love what these guys are doing.  I am a cheerleader.  I think they are going about it the right way, but the product is still very very young…  Could I use this as my full time phone?  Probably not, as I require rock-solid communication at all times for my job.  Would I buy this for my kids?  Absolutely!

Am I seriously considering getting this for my parents that don’t currently have data phones?  Yes indeed!

I am looking forward to seeing how this works out…  $19 a month is almost too good to be true.  I wonder how long it is before these guys get bought out by one of the big boys wanting the technology (or to kill them off)?


P.S. If anyone from Republic reads this, feel free to reach out.  I am always willing to provide constructive feedback!

Categories: Network, Telecom, Wireless

Motorola Droid Bionic Review

September 9th, 2011

So my wait for a new phone finally came to an end today, as I was able to snag a new Droid Bionic on Verizon Wireless.  I had been increasingly frustrated with my Droid version 1, which seemed to get slower and slower.  A special thanks to my corporate data rep, who went the extra mile to make sure I got one on launch day after my morning was wasted by the wireless kiosk idiots at Costco, who claimed to have them in stock the evening before when I stopped by.

After only having it for about four hours I must say, thus far I am impressed.

Things I like:

  1. It is fast (responsiveness wise).  It lives up to my expectations so far.
  2. It is fast (network wise). Verizon Wireless LTE is absolutely amazing (I am in Beaverton, OR currently). It is also fast on my 802.11g wireless via Frontier FiOS.  Using Google Maps is no longer frustrating.
  3. The screen seems very good. (not like strikingly great, but certainly good)
  4. The touch screen seems more accurate than my Droid V1.
  5. The OS is Gingerbread of course (which I did not have on my Droid V1)
  6. The camera seems to be better quality, though it’s odd that the app does not rotate its menus.  Also, the location tagging outputs number-coded errors on the screen while taking pictures, which seems like poor spit and polish (though it’s way faster than the Droid V1, so I am happy).
  7. The form factor works well for me thus far.  I think it is lighter than my Droid V1 and certainly much, much thinner.  I tend to carry it in my front shirt pocket, and while it does stick out the top a bit, the reduced weight makes my shirt look less funny.

Things I don’t like:

  1. Any kind of crap installed by the carrier (i.e. Verizon Wireless) – vCast, their paid Navigation app, their paid Visual Voicemail app, etc…  C’mon, people…  Google does most of these way better than you, and they made them free.  Deal with it and go on with life.
  2. Just about anything written by Motorola (i.e. MotoBlur).  I think that having all the phone manufacturers write their own UIs is stupid in many ways.  I find that they don’t do it any better than Google, and it just makes software upgrade cycles slower and user training a pain due to the differences.  I understand that the manufacturers don’t want to become commodities (and this is a way to provide unique “value”), but as an educated consumer, I would 100% of the time buy a vanilla Android device over one with an aftermarket UI (if one were available).
  3. Verizon has some “backup” application for local contacts.  That’s dumb.  Google provides that feature for all my contacts and settings.  Perhaps it makes sense in the context of moving from “feature phones” over to Android based smart-phones.
  4. I have only made one or two calls so far and quality was good- Though I did get some feeling that the earpiece does not get incredibly loud, and it started clipping a bit at max volume.  This may not bode well for use in datacenters (TBD).
  5. So far the car mount kit I bought seems a bit flaky at detecting that the phone is docked (and should launch the handsfree app), likely because I bought a rubber cover for the phone.  It’s supposed to handle covers OK after removing an insert.
  6. Some of the rumors had made me think it would support GSM for international roaming.  That would have been very nice.  Also, it sounded like wireless charging was a default feature, but instead, it sounds like it requires a special backplate that is not yet available.
  7. All the notifications and ringtones come set to incredibly annoying “DROID” sounds.  It’s totally a branding thing, I get it, but it is awful.

Things I am worried about:

  1. As mentioned before, we will see how well it works audio-wise in the datacenter (though with the extra mics for noise canceling, perhaps it will do well for the caller on the remote end).
  2. Battery life.  It gets warm under heavy use, which can’t bode well for battery consumption.  Also, it would appear that Verizon has only installed LTE on a patchwork of the towers in the metro area and skipped a bunch in between.  Since LTE propagates well at 700 MHz they can somewhat get away with this (as LTE device density is pretty low right now), though I am sure this is a massive contributor to battery drain!  i.e. your phone must communicate with a tower farther away because your closest tower has no LTE panels/sectors/gear…

Overall this phone is a great win for me, though I suspect I will be unhappy with it before 24 months is up at the current innovation pace.  If for no other reason than for the fact that more efficient LTE chipsets will come out.

If you’re looking for a new phone on Verizon Wireless right now (and you’re not deeply entrenched in the Apple ecosystem), I don’t think there is really any question that the Droid Bionic is the way to go.



A Cassandra Hardware Stack – Dell C1100’s – OCZ Vertex 2 SSD’s with Sandforce – Arista 7048’s

October 24th, 2010

Over the past nine months I have delved into the world of providing hardware to support our application teams’ use of the Cassandra datastore.  This has turned out to be a somewhat unique challenge, as the platform is rapidly evolving along with our use case for it.  Cassandra is a very different beast compared to your traditional RDBMS (as you would expect).

I absolutely love the fact that Cassandra has a clear scaling path to allow massive datasets and it runs on very cheap commodity hardware with local storage.  It is built with the expectation of underlying hardware failure. This is wonderful from an operations perspective as it means I can buy extremely cheap “consumer grade” hardware without having to buy “enterprise grade” (whatever that really means) servers and storage for $$$.

Before I dive into my findings, I should point out that this is not a one-size-fits-all solution, as it greatly depends on what your dataset looks like and what your read/write patterns are.  Our dataset happens to be billions of exceedingly small records.  This means we do an incredible amount of random read i/o.  Your mileage may vary depending on what you do with it.

Finding the optimal node size

As usual, spec’ing out hardware for a given application is a matter of balancing five variables:

  • CPU capacity (taking into account the single/multi threaded aspects of the application)
  • RAM capacity (how much working space does the application need and how much cache is optimal)
  • Disk capacity (actual disk storage space)
  • Disk i/o performance (the number of read and write requests per second that can be handled)
  • Network capacity (how much bandwidth is needed)

If you run into a bottleneck on any of these five items, any additional capacity available within the other four categories is wasted.  The procedure to determine the optimal configuration is as follows:

  1. Determine which of the five variables is going to be your limiting factor through performance testing
  2. Research the most cost-effective price/performance point for the limiting variable
  3. Spec out hardware to meet the other four variables needs relative to the bottleneck

Note that this is a somewhat iterative process as (for example) it may make sense to buy a CPU significantly beyond the price/performance sweet spot (when looking at CPU pricing in a vacuum) as paying for that higher end CPU may allow you to make much better use of the other pieces of the system that would otherwise sit idle.  I am not suggesting that most Cassandra shops will be CPU bound, but this is just an example.

There is also fuzziness in this process as there can be some interdependencies between the variables (i.e. increasing system RAM can reduce disk i/o needs due to increased caching).
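The procedure above can be sketched in a few lines; all the capacity and demand numbers here are hypothetical placeholders, not measurements:

```python
# Step 1 of the sizing procedure: find the limiting variable by comparing
# per-node demand (from performance testing) against per-node capacity.

node_capacity = {
    "cpu_threads": 16,
    "ram_gb": 48,
    "disk_gb": 720,
    "disk_iops": 20000,
    "network_mbps": 2000,
}

workload_demand = {  # hypothetical per-node needs
    "cpu_threads": 4,
    "ram_gb": 40,
    "disk_gb": 300,
    "disk_iops": 19000,
    "network_mbps": 100,
}

utilization = {k: workload_demand[k] / node_capacity[k] for k in node_capacity}
bottleneck = max(utilization, key=utilization.get)
print(bottleneck)  # disk_iops
```

Steps 2 and 3 then happen around that answer: shop the price/performance curve for the bottleneck resource, and size everything else relative to it.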

Nehalem platforms

If you are at all familiar with the current server-platform market then you know that Nehalem microarchitecture (you need to read the Wikipedia article) based servers are the platform of choice today, with the Westmere processors being the current revision within that series.  In general, the most cost-effective solution when scaling out large systems on Nehalem platforms is to go with dual-processor machines, as this gives you twice the amount of processing power and system memory without doubling your costs (i.e. you still only need one motherboard, power supplies, etc…)

All of the major OEMs have structured their mainline platforms around this dual processor model.  Note that there ARE situations where dual processors don’t make sense including:

  1. Single threaded applications that can not make use of all those cores and that do NOT need the additional memory capacity.
  2. Applications that are purely Disk i/o or network bound where the additional CPU and memory would be wasted (perhaps a file server).
  3. Applications that need less than a “full” machine (i.e. your DNS/DHCP servers).

In general, I don’t think Cassandra falls into these special use cases, unless you’re just completely i/o bound or network bound and can’t solve that any way other than adding more nodes.  You may need that second processor just for the memory controllers it contains (i.e. it gives you twice as many RAM slots).  If you are i/o bound you can consider SSD’s, and if you are network bound you can leverage 10 gigabit network interfaces.

In looking at platforms to run Cassandra on, we wanted a vanilla Nehalem platform to run on, without too many bells and whistles.  If you drink the Cassandra kool-aid you will let Cassandra handle all the reliability needs and purchase hardware without node level fault tolerance (i.e. disk RAID).  This means putting disks in a RAID 0 (for optimal speed and capacity) but then letting the fact that Cassandra can store multiple copies of the data across other nodes handle fault recovery.  We are currently using linux kernel RAID, but may also test hardware RAID 0 that is available on the platform we ended up choosing.

It is shocking to me to see how many OEMs have come up with platforms that do not have equal numbers of RAM slots per memory channel.  News flash, folks: in Nehalem it is critical to install memory in equal sets of three (or six for dual processor) in order to take advantage of memory interleaving.  Every server manufactured should have a number of memory slots divisible by three, as the current crop of processors has three memory controllers per processor (this may change in the next generation of processors).
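That population rule is easy to express as a check; this is a sketch assuming three memory channels per processor, as on Nehalem/Westmere:

```python
def fully_interleaved(n_dimms, n_sockets, channels_per_cpu=3):
    """True if the DIMM count spreads evenly across every memory channel."""
    total_channels = n_sockets * channels_per_cpu
    return n_dimms > 0 and n_dimms % total_channels == 0

# Dual-socket Nehalem: populate in multiples of 6 for full interleaving.
print(fully_interleaved(12, 2))  # True  (e.g. 12 x 4 GB = 48 GB)
print(fully_interleaved(8, 2))   # False (unbalanced across channels)
```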

A note about chipsets – the Intel 5500 vs. 5520: the main difference here is just the number of PCIe lanes the chipset provides.  They should both provide equivalent performance.  The decision here is made by your OEM and is based simply on the number of PCI devices your platform supports.

Our platform choice

In looking at platform options, the lead contenders were the Dell C1100 and a comparable Supermicro offering (there are of course many other possible options, but most are too focused on the enterprise market, with features we do not need that just drive costs up).

At first we were looking at 1U machines with 4x 3.5-inch bays (and in fact bought some C1100’s in this configuration), though it turned out that Cassandra was extremely i/o bound, which made a small number of large SATA disks impractical.  Once we realized we were going to need a larger number of drives, we decided to go with 1U platforms supporting 2.5-inch bays, as we can put eight to ten 2.5-inch drives in a 1U to give us more spindles (if we go with disks), or more SSD’s (for the disk capacity rather than iops).  It’s also worth noting that 2.5-inch SATA drives draw a lot less power than 3.5-inch SATA disks of the same capacity.

We ended up going with the Dell C1100 platform (over the Supermicro offering) as we already had a purchasing relationship with Dell and they have a proven track record of supporting systems throughout a lifecycle (providing “like” replacement parts, etc…), though on this particular order they fell down in numerous ways (mostly related to their recent outsourcing of production to Mexico), which has caused us to re-evaluate future purchasing plans.  In the end, the C1100’s have worked out extremely well thus far, but the speed bumps along the way were painful.  We have not physically tested any Supermicro offerings, so perhaps they have issues as bad (or worse).

What we like:

  • Inexpensive platform
  • Well-targeted to our needs
    • Have 18 RAM slots (only populating 12 of them right now with 4 gig sticks)
    • Dual Intel NICs, not Broadcom
    • They include out of band controllers
    • Dual power supplies available (this is the only “redundancy” piece we do purchase)
  • Low power consumption
  • Quiet

What we don’t like:

  • Lead time issues
  • Rails with clips that easily break
  • Servers arriving DOA
  • Using a SAS expander to get 10 bays vs. only 8 (we would have preferred the option to use only 8 bays)
  • They don’t give us empty drive sleds to add disks later, which forces you to purchase from them at astronomical rates
  • The 2 foot IEC to IEC power cords they sent us were only rated to 125 volts (we use 208 volt exclusively)
  • Lack of MLC SSD option from factory

OCZ Technology Vertex 2 MLC SSD’s

After purchasing our first round of Dell C1100’s with four SATA disks (one for boot/commit and three in a RAID 0 for data), we rapidly discovered they were EXTREMELY i/o bound.  Cassandra does an extremely poor job of bringing pertinent data into memory and keeping it there (across a four-node cluster we had nearly 200 gigs of RAM, as each node has 48 gigs).  Things like the fact that Cassandra invalidates cache for any data it writes to disk (rather than writing the data into the cache) make it extremely painful.  Cassandra (in .6) will also do a read on all three nodes (assuming your data is replicated three places) in order to do a read-repair, even if the read factor is only set to one.  This puts extremely high load on the disks across the cluster in aggregate.  I believe in .7 you will be able to tune this down to a more reasonable level.
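That read-repair behavior amplifies disk load cluster-wide; a minimal sketch, assuming every read touches all replicas as described above (the client read rate is a hypothetical figure):

```python
replication_factor = 3        # data replicated three places
client_reads_per_sec = 5000   # hypothetical application read rate

# In 0.6, each client read triggers a read on every replica for
# read repair, even when the consistency level only requires one:
cluster_disk_reads_per_sec = client_reads_per_sec * replication_factor
print(cluster_disk_reads_per_sec)  # 15000
```

A 3x aggregate read amplification is exactly the kind of multiplier that turns "a few big SATA disks per node" into an untenable design.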

Our solution was to swap the 1TB SATA disks for 240 gig OCZ Vertex 2 MLC SSD’s, which are based on the Sandforce controller.  Now normally I would not consider using “consumer grade” MLC SSD’s for an OLTP-type application; however, Cassandra is VERY unique in that it NEVER does random write i/o operations and instead does everything with large sequential i/o.  This is a huge deal because with MLC SSD’s, random writes can rapidly kill the device, as writing into the MLC cells can only be done sequentially and editing any data requires wiping the entire cell and re-writing it.

The Sandforce controller does an excellent job of managing where data is actually placed on the SSD media (it has more space available than what is made available to the O/S so that it can shift where things actually get written).  By playing games with how data is written the Sandforce controller is supposed to dramatically improve the lifespan of MLC SSD’s.  We will see how it works out over time.  😉

It is unfortunate that Dell does not have an MLC SSD offering, so we ended up buying small SATA disks in order to get the drive sleds, and then going direct to OCZ Technology to buy a ton of their SSD’s.  I must say, I have been very happy with OCZ and I am happy to provide contact info if you shoot me an email.  I do understand the hesitation Dell has with selling MLC SSD’s, as Cassandra is a very unique use-case (only large sequential writes) and a lot of workloads would probably kill the drives rapidly.

It is also worth noting that our first batch of C1100’s with the 3.5 inch drives were using the onboard Intel ICH10 controller (which has 6 ports), but the second batch of C1100’s with the 10 2.5 inch bays are using an LSI 2008 controller (available on the Dell C1100) with a SAS expander board (since the LSI 2008 only has 8 channels).  We are seeing *much* better performance with the LSI 2008 controllers, though that may be simply due to us not having the disks tuned properly on the ICH10 (using native command queueing, DMA mode, etc…) in CentOS 5.5.  The OCZ Sandforce based drives are massively fast.  😉

If you are going to have any decent number of machines in your Cassandra cluster I highly recommend keeping spare parts on hand and then just purchasing the slow-boat maintenance contracts (next business day).  You *will* lose machines from the cluster due to disk failures, etc. (especially since we are using inexpensive parts)…  It is much easier to troubleshoot when you can go swap out parts as needed and then follow up after the fact to get the replacement parts.


The network

Since Cassandra is a distributed data store, it puts a lot more load on the network than monolithic applications like Oracle, which generally have all their data back-ended on Fibre Channel SANs.  Particular care must be taken in network design to ensure you don’t have horrible bottlenecks.  In our case, our existing network switches did not have enough available ports and their architecture is 8:1 oversubscribed on each gigabit port, which simply would not do.  After much investigation, we decided to go with Arista 7048 series switches.

The Arista 7048 switches are 1U, 48 port copper 1 gig, and 4 ports of 10 gig SFP+.  This is the same form factor of the Cisco 4948E switches.  This form factor is excellent for top-of-rack switching as it provides fully meshed 1 gig connectivity to the servers with 40 gigabit uplink capacity to the core.  While the Arista product offering is not as well baked as the Cisco offering (they are rapidly implementing features still), they do have one revolutionary feature that Cisco does not have called MLAG.

MLAG stands for “Multi-Chassis Link Aggregation”.  It allows you to physically plug your server into two separate Arista switches and run LACP between the server and the switches as if both ports were connected to the same switch.  This allows you to use *both* ports in a non-blocking mode, giving you full access to the 2 gigabits of bandwidth while still having fault tolerance in the event a switch fails (of course you would drop down to only 1 gig of capacity).  We are using this for *all* of our hosts now (using the Linux kernel bonding driver) and indeed it works very well.

MLAG also allows you to uplink your switches back to the core in such a way as to keep all interfaces in a forwarding state (i.e. no spanning-tree blocked ports).  This is another great feature, though I do need to point out a couple of downsides to MLAG:

  1. You still have to do all your capacity planning as if you are in a “failed” state.  It’s nice to have that extra capacity in case of unexpected conditions, but you can’t count on it if you want to always be fully functional even in the event of a failure.
  2. When running MLAG, one of the switches is the “master” that handles LACP negotiation and spanning-tree for the pair.  If there is a software fault in that switch, it is very possible that it would take down both paths to your servers (in theory the switches can fall back to independent operation, but we are dealing with *software* here).

It is worth noting that we did not go with 10 gig NICs and switches, as it does not seem necessary yet with our workload and 10 gig is not quite ready for prime time (switches are very expensive, the PHYs draw a lot of power, and cabling is still “weird” – either coax, fiber, short-distance twisted pair over CAT6, or CAT7/7a for the full 100 meters).  I would probably consider going with a server platform that had four 1 gig NICs before going to 10 gig.  As of yet I have not seen any Cassandra operation take over 100 megabits of network bandwidth (though my graphs are all heavily averaged down, so take that with a grain of salt).


So to recap, we came up with the following:

  • Dell C1100’s – 10x 2.5 inch chassis with dual power supplies
  • Dual 2.4 GHz E5620 processors
  • 12 sticks of 4 GB 1066 MHz memory for a total of 48 GB per node (this processor only supports 1066 MHz memory)
  • 1x 2.5 inch 500 GB SATA disk for boot / commit
  • 3x 2.5 inch OCZ Vertex 2 MLC SSDs
  • The LSI 2008 optional RAID controller (running in JBOD mode, using Linux kernel RAID)
  • Dual on-board Intel NICs (no 10 gig NICs, though they are an option)
  • Pairs of Arista 7048 switches using MLAG and LACP to the hosts
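Since the LSI 2008 runs in JBOD mode, the SSD array gets assembled in software with mdadm.  A minimal sketch (the device names, RAID level, and mount point here are illustrative assumptions, not necessarily our exact layout):

```shell
# Stripe the three JBOD-exposed SSDs into one array and put the data dir on it
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
mkfs.ext3 /dev/md0            # CentOS 5.5-era filesystem
mount /dev/md0 /var/lib/cassandra
```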


  • We did not evaluate the low-power processors; they may have made sense for Cassandra, but we did not have the time to look into them.
  • We just had our Cassandra cluster lose its first disk: the data filesystem went read-only on one node, but the Cassandra process kept running and processing requests.  I am surprised by this, as I am not sure what state the node was in (what was it doing with writes when it came time to flush the memtables?).  We manually killed the Cassandra process on that node.
  • The Dell C1100’s did not come with NUMA mode enabled by default in the BIOS.  CentOS 5.5 supports it, so we turned it on.  I am not sure how much (if any) performance impact this has on Cassandra.


This is still a rapidly evolving space, so I am sure my opinions will change in a few months, but I wanted to get some of my findings out there for others to make use of.  This solution is most certainly not the optimal solution for everyone (and in fact, it remains to be seen if it is the optimal solution for us), but hopefully it is a useful datapoint for others headed down the same path.

Please feel free as always to post questions below that you feel may be useful to others and I will attempt to answer them, or email me if you want contact information for any of the vendors mentioned above.


Categories: Cassandra, Dell, Network, Systems Tags:

Dynect migration from UltraDNS follow-up

October 2nd, 2010 No comments

Several months back I migrated a very high volume site from Neustar UltraDNS over to Dyn’s Dynect service.  I am following up with another post because I believe it is important for folks in the IT community to share information (good or bad) about the vendors and technology they use.  All too often the truth is obscured by NDAs, marketing agreements, etc…

So here it is: I have not had a single issue with Dynect since I made the transition.  There is not much to say other than that…

I have not had any site reachability issues that I can point the finger at DNS for, and I have never had to call Dynect support.  I have not even had any billing snafus.

The Dynect admin console still rocks with cool real time metrics.

It just works.  The pricing is reasonable.  And the guys/gals that work there are just cool folks.

P.S.  They even added a help comment to the admin interface to address my concern of not being able to modify the SOA minimum TTL value.  It now says you can contact their Concierge service if you need the value changed.


Categories: Network Tags:

Cogent Eastbound route out of Portland to Boise and a new POP

July 9th, 2010 3 comments

It would appear that Cogent finally has a long-awaited route Eastbound out of Portland.  I just noticed it on their web site today and a quick traceroute confirms there is now connectivity to Boise.

INET-A#traceroute ccr01.boi01.atlas.cogentco.com
Translating “ccr01.boi01.atlas.cogentco.com”…domain server (xx.xx.xx.xx) [OK]
Type escape sequence to abort.
Tracing the route to lo0.ccr01.boi01.atlas.cogentco.com (
1 gi1-xx.10xx.ccr01.pdx01.atlas.cogentco.com (38.104.104.xx) [AS 174] 0 msec 1 msec 0 msec
2 te4-2.ccr01.pdx02.atlas.cogentco.com ( [AS 174] 0 msec 1 msec 1 msec
3 te7-3.ccr01.boi01.atlas.cogentco.com ( [AS 174] 11 msec
te4-3.ccr01.boi01.atlas.cogentco.com ( [AS 174] 12 msec *

I then noticed that traffic Eastbound beyond Boise to Salt Lake still prefers going through Sacramento.

INET-A#traceroute ccr01.slc01.atlas.cogentco.com
Translating “ccr01.slc01.atlas.cogentco.com”…domain server (xx.xx.xx.xx) [OK]
Type escape sequence to abort.
Tracing the route to lo0.ccr01.slc01.atlas.cogentco.com (
1 gi1-xx.10xx.ccr01.pdx01.atlas.cogentco.com (38.104.104.xx) [AS 174] 1 msec 1 msec 0 msec
2 te4-2.ccr01.pdx02.atlas.cogentco.com ( [AS 174] 0 msec 1 msec 1 msec
3 te3-4.ccr01.smf01.atlas.cogentco.com ( [AS 174] 12 msec
te4-3.ccr01.smf01.atlas.cogentco.com ( [AS 174] 13 msec 12 msec
4 te3-3.ccr01.slc01.atlas.cogentco.com ( [AS 174] 25 msec
te4-1.ccr01.slc01.atlas.cogentco.com ( [AS 174] 25 msec *

Further investigation using Cogent’s looking glass tool from Washington DC shows me that either their network map is incorrect, or they currently have a circuit down from Boise to Salt Lake (or for some weird traffic engineering reason my traceroutes are not hitting it).  Routing from DC to Boise through PDX is not exactly what I would consider “optimal”.  😉

Looking Glass Results: Washington, DC
Query: trace
Type escape sequence to abort.
Tracing the route to lo0.ccr01.boi01.atlas.cogentco.com (
1 fa0-8.na01.b005944-0.dca01.atlas.cogentco.com ( 4 msec 4 msec 4 msec
2 vl3507.mpd03.dca01.atlas.cogentco.com ( 4 msec 4 msec 0 msec
3 te0-3-0-0.ccr21.dca01.atlas.cogentco.com ( 0 msec 0 msec 4 msec
4 te0-2-0-4.ccr21.ord01.atlas.cogentco.com ( 28 msec
te0-1-0-4.ccr21.ord01.atlas.cogentco.com ( 20 msec
te0-2-0-4.ccr21.ord01.atlas.cogentco.com ( 24 msec
5 te0-0-0-3.ccr21.mci01.atlas.cogentco.com ( 32 msec
te0-2-0-7.ccr21.ord01.atlas.cogentco.com ( 32 msec 32 msec
6 te4-4.mpd02.den01.atlas.cogentco.com ( 48 msec
te0-3-0-0.ccr21.mci01.atlas.cogentco.com ( 44 msec 44 msec
7 te4-4.mpd02.den01.atlas.cogentco.com ( 56 msec
te3-2.ccr01.slc01.atlas.cogentco.com ( 60 msec 64 msec
8 te3-2.ccr01.slc01.atlas.cogentco.com ( 64 msec
te7-3.ccr01.smf01.atlas.cogentco.com ( 72 msec
te4-2.ccr01.smf01.atlas.cogentco.com ( 76 msec
9 te7-3.ccr01.smf01.atlas.cogentco.com ( 100 msec 80 msec 80 msec
10 te3-3.ccr01.pdx02.atlas.cogentco.com ( 84 msec
te4-3.ccr01.pdx02.atlas.cogentco.com ( 84 msec 84 msec
11 te4-3.ccr01.boi01.atlas.cogentco.com ( 96 msec *
te7-3.ccr01.boi01.atlas.cogentco.com ( 92 msec

Cogent has talked about an Eastbound route for some time now, so I am jazzed to see it is finally happening!  Here’s hoping that Boise <-> Salt Lake link comes online very soon!

Whoa, hold the phone: I just noticed that my route to Boise is going through ccr01.pdx02.atlas.cogentco.com.  That’s new!  Previously they only had a single router in Portland, in the Pittock building.  A quick check of their POP list in Portland reveals that they are listing 707 SW Washington St as a POP, which is the Bank of California building.  That’s even better news, as they are now one of the few carriers (or perhaps the only one?) with multiple routes out of town and core routers in multiple facilities.  I can’t say that I know for sure of any other carrier in town with core routers in more than one facility.


Categories: Uncategorized Tags:

Upgrading Qwest DSL to 12 megabit ADSL2+

July 6th, 2010 No comments

Last week I upgraded a Qwest Business DSL (err, High Speed Internet) line in downtown Portland from 7 meg to 12 meg as they are finally offering speeds above 7 meg (though 12 was the max).  It was a nominal additional monthly cost, and the upgrade was free (they even gave a month of free service).

Some interesting notes:

I had previously set my modem to do PPPoA (PPP over ATM) so that it could support full 1500 byte MTUs (rather than the PPPoE that they have been recommending for quite some time in anticipation of the transition away from ATM).  When you do PPP over Ethernet, there is 8 bytes of PPPoE/PPP header overhead that cuts your max payload down to 1492.  In order to take advantage of the new service, however, I was forced to reconfigure to PPPoE (the 1492 byte max MTU is not a big deal and is pretty common in residential/small-biz internet connections these days).  This, in combination with the fact that they told me they had to make a wiring change in a “cross box” somewhere, tells me that I got moved to a new DSLAM that is not fed by ATM anymore (thank goodness!).
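The MTU arithmetic is simple enough to check, and on a Linux host you can confirm the path MTU with a don’t-fragment ping (the target host is just an example):

```shell
# PPPoE payload: standard 1500 byte Ethernet MTU minus 8 bytes of PPPoE/PPP headers
echo $((1500 - 8))        # 1492
# Largest ICMP payload that fits: MTU - 20 (IP header) - 8 (ICMP header)
echo $((1492 - 20 - 8))   # 1464
# Verify: this should succeed, while -s 1465 should fail with "message too long"
#   ping -M do -s 1464 example.com
```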

I am particularly happy about this because I am guessing a lot of the ATM-based DSLAMs out there are fed by NxT-1 backhaul setups (i.e. a bunch of bonded T-1s), which seriously limits the aggregate bandwidth available to all the users.  If you’re providing 100 7 meg DSL lines and you only have 8 T-1s for backhaul, that’s some serious oversubscription!  I would recommend that anyone out there with Qwest DSL do what you can (i.e. upgrade service tiers) to get hooked to one of the new DSLAMs, even if you later switch back to a lower speed service offering, as the newer DSLAMs are likely loaded nowhere near as heavily (i.e. they likely have 1 gig Ethernet fiber backhaul connections).

Here is a speedtest from Qwest’s speedtest site:

Qwest DSL Speed Test After Upgrade to 12 Megabit

Anecdotally, it would seem that ping times are faster on the new DSL, though I can’t say I actually plugged into the network (wired rather than wireless) and ran the same test before making the change:

erosenbe-mac:~ erosenbe$ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=56 time=38.520 ms
64 bytes from icmp_seq=1 ttl=56 time=38.820 ms
64 bytes from icmp_seq=2 ttl=56 time=39.110 ms
64 bytes from icmp_seq=3 ttl=56 time=39.335 ms
64 bytes from icmp_seq=4 ttl=56 time=39.174 ms
64 bytes from icmp_seq=5 ttl=56 time=39.575 ms
64 bytes from icmp_seq=6 ttl=56 time=38.693 ms
64 bytes from icmp_seq=7 ttl=56 time=38.723 ms
64 bytes from icmp_seq=8 ttl=56 time=39.066 ms
64 bytes from icmp_seq=9 ttl=56 time=39.227 ms
64 bytes from icmp_seq=10 ttl=56 time=39.550 ms
— ping statistics —
11 packets transmitted, 11 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 38.520/39.072/39.575/0.333 ms
erosenbe-mac:~ erosenbe$

It is worth noting that the service is indeed ADSL2+ (even though I think technically ADSL2 can reach 12 megabit under the right conditions).  The upload speed is still extremely pitiful.  I expect more in this day and age.  My FiOS can do 25 megabit down and at least 10 megabit up (I think some plans include up to 25 megabit upload).

In this case, I am only a couple blocks from the downtown Portland Central office (PTLD69) so I am able to sync at the full line rate of 12 megabit:

Qwest DSL Modem Status Linked at 12 Megabit

So overall, it is cool that Qwest is finally offering service above 7 megabit, but I am disappointed that 12 megabit is the top end they are offering downtown.  I have heard they are offering 20 megabit elsewhere (perhaps out in the suburbs where they are competing with cable modems).  I have also heard that they are offering VDSL rather than ADSL2+ in some areas.  I cannot think of any reason not to offer speeds in excess of 12 megabit downtown, other than to keep from competing with their metro Ethernet over copper/fiber products and other high-margin services.

Integra Telecom offers pretty high speed DSL offerings these days, and they are even now offering MPLS over DSL (or at least that is what I have heard).  Qwest needs to catch up.

It still is disappointing that Qwest can’t muster the cash to deploy a real broadband network (i.e. fiber to the home/business).  They are getting their butts kicked by Comcast in residential, and by all the CLEC’s in commercial.  Hopefully when they get taken over by CenturyLink things will change, but at the moment I am not holding my breath.  I am glad to be out in Verizon (err, Frontier) FiOS land.  We shall see how that transition goes as well…


Categories: Uncategorized Tags:

Review of moving from NeuStar UltraDNS to Dynect Managed DNS Service

May 21st, 2010 7 comments

For many years I have used the UltraDNS service from NeuStar on behalf of several companies I have worked for, as it has been incredibly reliable and easy to use.  I cannot, however, say it has been exactly inexpensive, and in recent years innovation has seemingly slowed to a crawl.  Each time in the past that I have evaluated the field of other options, there have not been any “worthy” contenders in the space, until now that is.

After recently completing an evaluation and trial run of dyn.com’s Dynect service, we went ahead and switched over to their service for some very high volume domains that generate millions of queries a day.

A few notes on the transition process

The NeuStar zone export tool had issues and was truncating zone file output on some of our zones (and losing records in the process!).  This is a serious bug (though one they may not be too heavily incentivized to fix).  I have reported this bug to NeuStar and they informed me they were already aware of the issue.

So next up, I tried allowing the Dynect servers’ IP addresses to do a zone transfer from UltraDNS, but it turned out Dynect had a bug where they could not do zone transfers directly via AXFR from UltraDNS (they are actively working to fix this, they tell me).

I ended up doing an AXFR out of UltraDNS from my desktop PC using dig (after allowing my IP to do the transfer in the NeuStar control panel) and then pasting the result into Dynect’s import tool.  This process was slightly annoying, but in the grand scheme of things not a big deal (it took more time to validate that all the data got moved over properly than anything else).

Notes on the Dynect platform

The real time reporting of queries per second is awesome functionality that I now consider to be critical.  This is available from Dynect on a per zone, per record type, or per individual record basis.  I did not know what I was missing before.  It has allowed me to find a couple “issues” with my zone records that I would have otherwise been unaware of.  With UltraDNS I had no idea how many queries I had used until the end of the month came around and I got a bill that included almost no detail.

One of these issues was the lack of AAAA (IPv6) records on one particular host entry that gets millions of queries per day.  Newer Windows Vista and Windows 7 machines will attempt an IPv6 lookup in addition to (or before?) the IPv4 lookup, as IPv6 is enabled by default.  Since this site is not yet IPv6 enabled, we do not serve out an AAAA record, and so the remote DNS server uses the SOA (Start of Authority) “minimum” value as the TTL (Time To Live) on the negative cache entry it adds to its cache.  The net result is that IPv4 answers get cached for the 6 hour TTL we have set, but IPv6 queries, which result in a “nonexistent” answer, only get cached for 60 seconds (the SOA minimum value Dynect uses).  This results in huge query volumes for IPv6 records on top of the IPv4 records, and the issue will only get worse as more end clients become IPv6 enabled while the site in question remains IPv4 only.
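The amplification from that TTL mismatch is easy to quantify:

```shell
# Positive answers cache for 6 hours; negative (missing AAAA) answers for only 60 s
echo $((6 * 3600 / 60))   # 360 -- each resolver re-asks the IPv6 question up to 360x as often
```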

Dynect does not allow end users to muck with the SOA values (other than the default TTL), which is highly unfortunate in my mind.  NeuStar UltraDNS did allow these changes to be made by the end user on any zone.  The good news is that Dynect was able to manually change my SOA minimum values to a longer interval for me (somewhat begrudgingly).  They claim the lack of user control is by design (to keep people from messing something up that then gets cached for a long interval), though there needs to be an advanced user mode for those ready and willing to run that risk.

The other issue Dynect’s real time reporting shed light on for me was a reverse DNS entry that I was missing on a very high volume site, which was again causing high query volume to that IP as the negative cache interval was 60 seconds.  I rectified this by adding an appropriate PTR record.

I do have to point out that I am not so thrilled with either the simple editor or the expert editor that Dynect provides.  The tree control with leaves for every record seems clunky to me, and the advanced editor is not the end-all be-all either (certain functionality does not exist there, and it leaves you to edit records like SRV, which have multiple data values, in a single text box).  But these issues don’t really get in the way of my being very happy with the service.

Perhaps of more concern to me is Dynect’s lack of a 24×7 NOC.  Granted, they have an on-call engineer 24×7, but for something as critical as DNS I would encourage them to staff a NOC as soon as their business can support it.  This is a service offering UltraDNS has that I have utilized and been happy with in the past.

Another thing Dynect seems to do well is letting you see what changes have been made to your zones (auditing).  I have not dug into it too much with either Dynect or UltraDNS, but it seems to exist as a core feature in a more useful fashion than I have seen on UltraDNS.  One thing that I never could figure out on UltraDNS was how to go back and look at audit history for deleted records (not to mention confirmation of record modification or deletion).

I should note at this point one major difference between the pricing mechanisms of UltraDNS and Dynect.  My experience with Ultra has been that they bill per bucket of 1000 queries.  Dynect, on the other hand, bills on a 95th percentile basis of queries per second (QPS) sampled at 5 minute intervals, similar to how ISPs bill for bandwidth.  Depending on your usage patterns, either one of these billing models could be more advantageous to you.
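For anyone unfamiliar with 95th percentile billing: sort the 5 minute samples for the period, throw away the top 5%, and bill the highest remaining value.  A toy illustration with made-up QPS samples:

```shell
# Ten 5-minute QPS samples; after a numeric sort, the 95th percentile index is
# int(10 * 0.95) = 9, so the 900 QPS spike is discarded and 43 QPS is billed.
printf '%s\n' 40 42 38 41 900 39 43 40 41 42 \
  | sort -n \
  | awk '{v[NR]=$1} END {i = int(NR * 0.95); if (i < 1) i = 1; print v[i]}'
```

This is why a short burst (a news mention, a cache stampede) does not blow up the bill, but a sustained increase does.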

Also, I am not going to dive into too much detail here, but UltraDNS and Dynect both offer global server load balancing solutions that differ in one very key way: UltraDNS has a new solution that uses a geolocation database to direct queries to a desired server based on source IP address, whereas Dynect’s offering only provides the ability to do this based on their Anycast node locations.  There are pros and cons to each; perhaps that will become a future blog post.

Wrapping it up

UltraDNS is a great service that has proven itself reliable in the long run.  I would recommend their service to others in the future.  They do need to keep up with the changing technology however (new releases to the admin console indicate they are starting to head in this direction).

Dynect has assembled a fully competitive (and in some ways better) offering that I would now classify as a viable option for most UltraDNS customers.  My migration to their solution was very smooth and so far there have been no issues.  I welcome Dynect to the managed external DNS service space and the healthy competition they provide.

I should also note that their sales and support team has treated us/me well.  They genuinely seem to care about this stuff and I don’t come away with the slimy feeling after talking to them.


Categories: Network Tags:

Cisco ASA 8.0.5 TFTP Unspecified Error using PumpKIN

December 10th, 2009 2 comments

I have run into a problem on two separate ASAs now when downloading code to them using the PumpKIN TFTP server.  It gets partway through the download and dies (at a different place each time, so it is an intermittent error).

I was running a 7.0.8 release on these devices and then upgraded to 8.0.5 (copying the file off a PumpKIN TFTP server with no issue), but when I later reloaded 8.0.5 onto the devices (while already booted off 8.0.5) they could not access the exact same file properly.

ciscoasa# copy tftp flash
Address or name of remote host []?
Source filename []? asa805-k8.bin
Destination filename [asa805-k8.bin]?
Accessing tftp://...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: TFTP download incomplete!
!
%Error reading tftp:// (Unspecified Error)
ciscoasa#

I ended up using the SolarWinds TFTP server instead and it worked like a champ.  I am not sure what the issue is here, but it looks like some kind of bug in PumpKIN or in the ASA code (or some combination thereof).


Categories: Cisco, Network Tags:

Google Recursive DNS Speed Test

December 8th, 2009 1 comment

So a few days ago Google announced their new public DNS service that will answer recursive queries for any host.  There has been a lot of coverage of this elsewhere, so I did not feel compelled to post anything about it until I saw this post discussing how fast Google’s DNS servers were compared to other ISPs’ servers.  I felt that I am in a somewhat unique position to provide some test data, as I have direct access to an Internet connection from an ISP peered at NWAX, which has Google as a member.  The end result is that my round-trip times to many Google services are 3-4ms.

So I downloaded the same test tool as Jon Radoff and ran the test from my connection.  In the results below you can clearly see that Google is the fastest (or right there with the fastest).

A test of DNS server performance from an Internet connection close to Google


I would conclude that Google’s DNS servers are just as fast as any others out there; the real issue is latency.  Your ISP’s servers have an advantage over Google (in most cases) since they sit on the service provider’s own network.  That is not to say Comcast or Verizon could not have their DNS servers on the other side of the country from you (but that would just be dumb).

All in all, I am very happy that Google now provides this service as it may be really useful from time to time.  Most corporate environments don’t care though since they have internal DNS servers to handle their recursive requests.


Categories: Google, Network, Telecom Tags: