Archive for July, 2009

Are blade servers right for my environment?

July 15th, 2009

IT, like most industries, has its fads, whether it be virtualization, SANs, or blade servers.  Granted, these three technologies play really nicely together, but once in a while you need to get off the bandwagon for a moment and think about what they really do for us.  While they are very cool and can make an extremely powerful team, as with anything, there is a right place, time, and environment for their use.  Blades are clearly the “wave of the future” in many respects, but you must be cautious about the implications of implementing them today.

Please do not read this article and come away thinking I am “anti-blade,” as that is certainly not the case.  I just feel blades are all too often pushed into service in situations where they are not the correct solution, and I would like to point out some potential pitfalls.

Lifecycle co-termination

When you buy a blade center, one of the main selling points is that the network, SAN, and KVM infrastructure is built in.  This is great in terms of ease of deployment and management; however, on the financial side you must realize that the life spans of these items are not normally the same.  When buying servers I typically expect them to be in service for 4 years.  KVMs (while becoming less utilized) can last much longer under most circumstances (barring changes in technology from PS/2 to USB, etc.), network switches I expect to use in some capacity or another for seven years, and SAN switches will probably have a life-cycle similar to the storage arrays they are attached to, which I generally target at 5 year life spans.

So what does this mean?  Well, if your servers are showing their age at 4 years, you are likely to end up replacing the entire blade enclosure at that point, which includes the SAN and network switches.  It is possible the vendor will still sell blades that fit that enclosure; however, you will probably want a SAN or network upgrade before that second set of servers reaches the end of its life-cycle, which will likely result in whole new platforms being purchased anyway.
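
To make the co-termination issue concrete, here is a quick sketch (in Python, purely illustrative, using the rough lifespan targets I mentioned above) of how the natural replacement dates diverge:

# Rough sketch of the lifecycle mismatch: replacement years if each piece
# were free to live out its normal lifespan (numbers are my rough targets).
purchase_year = 2009
lifespans = {"blade servers": 4, "SAN switches": 5, "network switches": 7}
for component, years in lifespans.items():
    print(f"{component:18s} due for replacement around {purchase_year + years}")
# Because the enclosure bundles all three, everything tends to get replaced
# on the servers' 4 year schedule, wasting remaining life on the switches.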

Vendor lock

You have just created vendor lock-in: with all the investment in enclosures, you can’t go buy someone else’s servers (this really sucks when your vendor fails to innovate on a particular technology).  All the manufacturers realize this situation exists and will surely use it to their advantage down the road.  It is hard to threaten not to buy Dell blades for your existing enclosures when switching would mean throwing away your investment in SAN and network switches.

SAN design

Think about your SAN design.  Most shops hook servers to a SAN switch that is directly attached to the storage array their data lives on.  Blade enclosures encourage the use of many more, smaller SAN switches, which often requires hooking the blade enclosure switches to aggregation SAN switches that are then hooked to the storage processor.  This increases complexity, adds failure points, decreases MTBF, and increases vendor lock-in.  Trunking SAN switches from different vendors can be problematic and may require putting them in a compatibility mode that turns off useful features.

Vendor compatibility

Vendor compatibility becomes a huge issue.  Say that you buy a blade enclosure today with 4 gig Brocade SAN switches in it for use with your existing 2 gig Brocade switches attached to an EMC CLARiiON CX500, but then next year you want to replace that with a Hitachi array attached to new Cisco SAN switches.  There are still many interop issues between SAN switch vendors that make trunking switches problematic.  If you had bought standalone physical servers, you might simply have re-cabled them over to the new Cisco switches directly.

Loss of flexibility

Another pitfall I have seen folks fall into with blade servers is losing the flexibility that comes with a standalone physical server.  You can’t hook an external drive array full of cheap disks directly to the server, or connect that network heartbeat crossover cable for your cluster, or add an extra NIC or two to a machine that needs to be directly attached to some other network (one that is not available as a VLAN within your switch).

Inter-tying dependencies

You are creating dependencies on the common enclosure infrastructure, so for full redundancy you need servers in multiple blade enclosures.  The argument that blade enclosures themselves are extremely redundant does not completely hold water with me.  I have had to power cycle entire blade enclosures before to recover from certain blade management module failures.

Provisioning for highest common denominator

You must provision the blade enclosure for the maximum amount of SAN connectivity, network connectivity, and redundancy required by any one server within the enclosure.  Say, for instance, you have an authentication server that is super critical but not resource intensive.  This requires your blade center to have fully redundant power supplies, network switches, etc.  Then say you have a different server that needs four 1 gig network interfaces, and yet another DB server that needs only two network interfaces but four HBA connections to the SAN.  You now need an enclosure with four network switches and four SAN switches in it just to satisfy the needs of three “special case” servers.  In the case of the Dell M1000e blade enclosures, this configuration would be impossible, since they can only hold six SAN/network I/O modules total.
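
A quick back-of-the-envelope sketch (Python; the per-server port counts are just the hypothetical numbers from the example above) shows how the single hungriest server drives the module count, assuming each port on a blade has to land on its own enclosure I/O module:

# Each fabric must be provisioned for the worst-case server, not the average.
servers = {
    "auth server": {"network": 2, "san": 0},  # critical but tiny
    "app server":  {"network": 4, "san": 0},  # needs four 1 gig NICs
    "db server":   {"network": 2, "san": 4},  # needs four HBA connections
}
modules_needed = {
    fabric: max(ports[fabric] for ports in servers.values())
    for fabric in ("network", "san")
}
print(modules_needed, "->", sum(modules_needed.values()), "I/O modules")
# {'network': 4, 'san': 4} -> 8 I/O modules, versus the six module bays
# available in the enclosure discussed above.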

Buying un-used infrastructure

If you purchase a blade center that is not completely full of blades, then you are wasting infrastructure in the form of unused network ports, SAN ports, power supply capacity, and cooling capacity.  Making the ROI argument for blade centers is much easier if you need to purchase full enclosures.

Failing to use existing infrastructure

Most environments have some extra capacity on their existing network and SAN switches, since those were purchased with future growth in mind (though probably not with blade enclosures in mind).  Spending money to re-purchase SAN and network hardware inside a blade enclosure just to allow the use of blades can kill the cost advantage of going with a blade solution.

Moving from “cheap” disks to expensive SAN disks

You typically cannot put many local disks into blades.  This is in many cases a huge loss, as not everything needs to be on the SAN (and in fact, certain things such as swap files would be very stupid to put on the SAN).  I find that these days many people overlook the wonders of locally attached disk.  It is the *cheapest* form of disk you can buy and it can also be extremely fast!  If your application does not require any of the advanced features a SAN can provide, then DON’T PUT IT ON THE SAN!

Over-buying power

In facilities where you are charged for power by the circuit, the key is to manage your utilization so that your unused (but paid for) power capacity is kept to a minimum.  With a blade enclosure, on day 1 you must provision (in this example) two 30 amp circuits for the enclosure, even though you are only putting in 4 out of a possible 16 servers.  You are going to be paying for those circuits even though you are nowhere near fully utilizing them.  The Dell blade enclosures, as an example, require two three-phase 30 amp circuits for full power (though depending on the server configurations you put in them, you can get away with dual 30 amp 208V circuits).
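
As a rough, hypothetical illustration (Python; the per-blade wattage, chassis overhead, and 80% continuous-load derating are my assumptions, not vendor specs), here is paid-for versus used power for a quarter-full enclosure on dual 30 amp 208V circuits:

# Back-of-envelope: how much of the day-1 power commitment a 4-blade
# enclosure actually uses.  All wattages are illustrative assumptions.
circuit_volts, circuit_amps, derate = 208, 30, 0.8       # 80% continuous-load derating
paid_for_w = 2 * circuit_volts * circuit_amps * derate   # dual circuits
blades_installed = 4
watts_per_blade = 350        # assumed average draw per blade
chassis_overhead_w = 500     # assumed fans, management modules, etc.
used_w = blades_installed * watts_per_blade + chassis_overhead_w
print(f"paying for ~{paid_for_w/1000:.1f} kW, drawing ~{used_w/1000:.1f} kW "
      f"({used_w/paid_for_w:.0%} utilization)")
# roughly: paying for ~10.0 kW, drawing ~1.9 kW (19% utilization)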

Think about the end of the life-cycle

You can’t turn off the power to a blade enclosure until the last server in that enclosure is decommissioned.  You also need to maintain support and maintenance contracts on the SAN switches, network switches, and enclosure until the last server is no longer mission critical.

When are blades the right tools for the job?

  • When the operational costs of your operations and maintenance personnel far outweigh the cost inefficiencies of blades.
  • When you are buying enough servers that you can purchase *full* blade enclosures whose blades have similar connectivity and redundancy requirements (i.e., each needs two 1 gig network ports and two 4 gig SAN connections).
  • When you absolutely need the highest server density available (note that most datacenters in operation today can’t handle the power density and heat output that fully loaded blade enclosures produce).

An example of a good use of blades would be a huge Citrix farm, a VMware farm, or in some cases a web server farm (though I would argue that very large web farms that can scale out easily should run on some of the cheapest hardware you can buy, which typically does not include blades).

Another good example would be compute farms (say even Lucene cache engines), as long as you have enough nodes to fill enclosures with machines that have the same connectivity and redundancy requirements.

Conclusion

While blades can be great solutions, they need to be implemented in the right environments for the right reasons.  It may indeed be the case that the savings in the operational cost of the staff who set up, manage, and maintain your servers far outweigh all of the points raised above, but it is important to factor all of them into your purchase decision.

As always, if you have any feedback or comments, please post below or feel free to shoot me an email.

-Eric

Categories: Cisco, Dell, HP, IBM, Network, Sun, Systems

Cisco Netflow to tell who is using Internet bandwidth

July 4th, 2009

When working with telecom circuits that are slow and “expensive” (relative to LAN circuits), the question frequently comes up: “What is using up all of our bandwidth?”  Many times this is asked because an over-subscribed WAN or Internet circuit is inducing latency and packet drops in mission-critical applications such as Citrix or VoIP.  In other cases a company may be paying for a “burstable” Internet connection, whereby they pay for a floor of 10 megabits but can utilize up to 30 megabits and just be billed for the overage (generally at the 95th percentile).
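
For reference, here is a minimal sketch (Python, with made-up sample values) of how 95th percentile billing is typically computed: the provider takes utilization samples (usually 5-minute averages), discards the top 5%, and bills on the highest remaining sample.

# Simple 95th percentile calculation over made-up 5-minute samples (in Mbps).
samples_mbps = [8, 9, 12, 25, 30, 11, 10, 9, 28, 27,
                9, 8, 10, 26, 12, 9, 8, 10, 11, 9]
samples_sorted = sorted(samples_mbps)
index = int(len(samples_sorted) * 0.95) - 1   # drop the top 5% of samples
billable_mbps = samples_sorted[index]
print(f"95th percentile: {billable_mbps} Mbps")
# -> 28 Mbps here: only the single 30 Mbps spike is discarded, so sustained
# bursts above the 10 megabit floor still show up on the bill.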

So how do you tell which user/server/application is chewing up your Internet or WAN circuits?  Well, Cisco has implemented a technology called NetFlow that allows your router to keep statistics on each TCP or UDP “flow” and then periodically shove that data into an export packet and ship it off to an external collector.  On that collector you can run one of a variety of software packages to analyze the data and understand what is using up your network bandwidth.
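
To give a sense of what those export packets actually contain, here is a bare-bones collector sketch in Python that listens on the same UDP port used later in this post and decodes NetFlow v5 flow records (field layout per Cisco’s published v5 format; a real collector obviously adds aggregation, storage, and graphing on top of this):

import socket
import struct

# Minimal NetFlow v5 listener: print one line per exported flow record.
HEADER = struct.Struct("!HHIIIIBBH")                 # 24-byte v5 header
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48-byte v5 flow record

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9996))
while True:
    data, addr = sock.recvfrom(8192)
    version, count = struct.unpack_from("!HH", data)
    if version != 5:
        continue
    for i in range(count):
        rec = RECORD.unpack_from(data, HEADER.size + i * RECORD.size)
        src, dst = socket.inet_ntoa(rec[0]), socket.inet_ntoa(rec[1])
        octets, sport, dport, proto = rec[6], rec[9], rec[10], rec[13]
        print(f"{src}:{sport} -> {dst}:{dport} proto {proto} bytes {octets}")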

The question is, which software package should you use?  I have not evaluated all of the available options, but I do have experience with a couple of them.  I used Scrutinizer from Plixer in the past and was not very impressed.  Part of that may have been that the machine it was running on was not very fast, but I just did not care for the interface or capabilities.

More recently I have downloaded and run NetFlow Analyzer from ManageEngine, and I have been very impressed!  It is free for up to two interfaces, and they have an easy-to-download-and-install demo that will monitor unlimited interfaces for 30 days.  It runs on Linux or Windows (I tried the Linux version) and is dirt simple to install and configure.  There really is nothing of note to configure on the server itself; you just point your router at the server’s IP and it automatically starts generating graphs for you.

I should also mention that Paessler has some kind of NetFlow capability (in PRTG), but I have not checked it out.  I note it here since I use their SNMP monitoring software extensively and have been happy with it.

To get your router to send NetFlow data to a collector, you need to set a couple of basic settings (including which version of NetFlow to use and where to send the packets), and then enable sending flows for traffic on all interfaces.  Note that it used to be that you could only collect NetFlow data on ingress to an interface, so in order to capture bi-directional traffic you needed to enable it on every single router interface to see the traffic in the opposite direction.  This was done with the “ip route-cache flow” command on each interface.

Now “ip route-cache flow” has been replaced with “ip flow ingress”, and you can also issue the “ip flow egress” command if you do not want to monitor all router interfaces.  I have just stuck with issuing “ip flow ingress” on all my interfaces since I wanted to see all traffic anyway (and I am not quite sure what would happen if you issued both commands on two interfaces and then had traffic flow between them; it might double count those flows).

Here are the exact commands I used on plunger to ship data to NetFlow Analyzer 7:

plunger#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
plunger(config)#ip flow-cache timeout active 1
plunger(config)#ip flow-export version 5
plunger(config)#ip flow-export destination x.x.x.x 9996
plunger(config)#int fastEthernet 0/0
plunger(config-if)#ip flow ingress
plunger(config-if)#int fastEthernet 0/1
plunger(config-if)#ip flow ingress
plunger(config-if)#end
plunger#write mem
Building configuration…
[OK]
plunger#exit

Happy NetFlowing!

-Eric

Categories: Cisco, Network