Archive

Archive for the ‘Sun’ Category

Advanced PING usage on Cisco, Juniper, Windows, Linux, and Solaris

September 15th, 2009 1 comment

As a network engineer, one of the most common utilities I use is the ping command.  While in its simplest form it is a very valuable tool, there is much more knowledge that can be gleaned from it by specifying the right parameters.

Ping on Cisco routers

On modern Cisco IOS versions the ping tool has quite a few options, however, this was not always the case.

Compare the options to ping in IOS 12.1(14)

EXRTR1#ping ip 4.2.2.1 ?
  <cr>

EXRTR1#ping ip 4.2.2.1

To that in IOS 12.4(24)T

plunger#ping ip 4.2.2.1 ?
  data      specify data pattern
  df-bit    enable do not fragment bit in IP header
  repeat    specify repeat count
  size      specify datagram size
  source    specify source address or name
  timeout   specify timeout interval
  validate  validate reply data
  <cr>

plunger#ping ip 4.2.2.1
plunger#ping ip 4.2.2.1 ?
data      specify data pattern
df-bit    enable do not fragment bit in IP header
repeat    specify repeat count
size      specify datagram size
source    specify source address or name
timeout   specify timeout interval
validate  validate reply data
<cr>
plunger#ping ip 4.2.2.1

When I am running a basic connectivity test between two points on a network I will generally not specify any options to ping (i.e. “ping 4.2.2.1”), however, once I have verified connectivity I will most often then want to verify what MTU size the path will support without fragmentation, and then also run an extended ping process with a thousand or more pings to test the reliability/bandwidth/latency characteristics of the link.

Here is an example of the most basic form.  Note that by default it is sending 100 byte frames:

plunger#ping 4.2.2.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/26/28 ms
plunger#

If I am working on an Ethernet network (or PPP link), it is most common that my target goal is for 1500 byte frames to make it through.  I will use the “size” parameter to force ping to generate 1500 byte frames (note that in Cisco land this means 1472 byte ICMP payloads plus 8 bytes ICMP header and 20 bytes IP header).  I also use the df-bit flag to set the DO NOT FRAGMENT bit on the generated packets.  This will allow me to ensure that the originating router (or some other router in the path), is not fragmenting the packets for me.

plunger#ping 4.2.2.1 size 1500 df-bit 

Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/26/28 ms
plunger#

If the first ping command worked, but the command above did not, then try backing down the size until you find a value that works.  Note that one common example of a smaller MTU is 1492 which is caused by the 8 bytes of overhead in PPPoE connections.

The next command to try is to send a large number of pings of the maximum MTU your link can support.  This will help you identify packet loss issues and is just a good way to generate traffic on the link to see how much bandwidth you can push (if your monitoring the link with another tool).  I have frequently identified bad WAN circuits using this method.  Note that looking at the Layer 1/2 error statistics before doing this (and perhaps clearing them), and then looking at them again afterwards (on each link in the path!) is often a good idea.

plunger#ping 192.168.0.10 size 1500 df-bit repeat 1000

Type escape sequence to abort.
Sending 1000, 1500-byte ICMP Echos to 192.168.0.10, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 1/2/4 ms
plunger#

Now the final ping parameter I often end up using is the “source” option.  A good example is when you have a router with a WAN connection on one side that has a /30 routing subnet on it, plus then an Ethernet connection with a larger subnet for your users devices.  Say that users on the Ethernet are reporting that they can not ping certain locations on the WAN, though you can ping it just fine from the router.  This is often because the return path from the device you are pinging back to your users subnet on the Ethernet is not being routed properly, but the IP your router has on the /30 WAN subnet is being routed correctly.  The key here is that by default a Cisco router will originate packets from the IP of the Interface it is going to be sending the traffic out (based on it’s routing tables).

To test from the interface your router has in the users subnet, use the source command like this:

plunger#ping 4.2.2.1 source fastEthernet 0/0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
Packet sent with a source address of 173.50.158.74
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/25/28 ms
plunger#

Note that you can also use these command together depending on what you are trying to do:

plunger#ping 192.168.0.10 size 1500 df-bit source FastEthernet 0/0 repeat 100

Type escape sequence to abort.
Sending 100, 1500-byte ICMP Echos to 192.168.0.10, timeout is 2 seconds:
Packet sent with a source address of 173.50.158.74
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/2/4 ms
plunger#

Ping on Juniper routers

The options to ping on Juniper (version 9.3R4.4 in this case) are quite extensive:

root@INCSW1> ping ?
Possible completions:
  <host>               Hostname or IP address of remote host
  bypass-routing       Bypass routing table, use specified interface
  count                Number of ping requests to send (1..2000000000 packets)
  detail               Display incoming interface of received packet
  do-not-fragment      Don't fragment echo request packets (IPv4)
  inet                 Force ping to IPv4 destination
  inet6                Force ping to IPv6 destination
  interface            Source interface (multicast, all-ones, unrouted packets)
  interval             Delay between ping requests (seconds)
  logical-system       Name of logical system
+ loose-source         Intermediate loose source route entry (IPv4)
  no-resolve           Don't attempt to print addresses symbolically
  pattern              Hexadecimal fill pattern
  rapid                Send requests rapidly (default count of 5)
  record-route         Record and report packet's path (IPv4)
  routing-instance     Routing instance for ping attempt
  size                 Size of request packets (0..65468 bytes)
  source               Source address of echo request
  strict               Use strict source route option (IPv4)
+ strict-source        Intermediate strict source route entry (IPv4)
  tos                  IP type-of-service value (0..255)
  ttl                  IP time-to-live value (IPv6 hop-limit value) (hops)
  verbose              Display detailed output
  vpls                 Ping VPLS MAC address
  wait                 Delay after sending last packet (seconds)
root@INCSW1> ping

While there are a lot more options here, I am generally trying to test the same types of things.  A very important note however is that in the Juniper world, the size parameter is the payload size and does not include the 8 byte ICMP header and 20 byte IP header.   The command below is the same as specifying 1500 bytes in Cisco land.

root@INCSW1> ping 4.2.2.1 size 1472 do-not-fragment
PING 4.2.2.1 (4.2.2.1): 1472 data bytes
1480 bytes from 4.2.2.1: icmp_seq=0 ttl=53 time=25.025 ms
1480 bytes from 4.2.2.1: icmp_seq=1 ttl=53 time=24.773 ms
1480 bytes from 4.2.2.1: icmp_seq=2 ttl=53 time=24.757 ms
1480 bytes from 4.2.2.1: icmp_seq=3 ttl=53 time=25.045 ms
1480 bytes from 4.2.2.1: icmp_seq=4 ttl=53 time=24.911 ms
1480 bytes from 4.2.2.1: icmp_seq=5 ttl=53 time=25.152 ms
^C
--- 4.2.2.1 ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max/stddev = 24.757/24.944/25.152/0.145 ms

root@INCSW1>

Here is an example of sending lots of pings quickly on the Juniper to test link reliability:

root@INCSW1> ping 10.0.0.1 size 1472 do-not-fragment rapid count 100
PING 10.0.0.1 (10.0.0.1): 1472 data bytes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--- 10.0.0.1 ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.230/2.429/5.479/1.228 ms

root@INCSW1>

Ping on Windows XP/Vista/2003/2008

While not as full featured, the Windows ping tool can at least set the packet size and the do-not-fragment bit:

Microsoft Windows [Version 6.0.6001]
Copyright (c) 2006 Microsoft Corporation.  All rights reserved.

C:\Users\eric.rosenberry>ping ?
^C
C:\Users\eric.rosenberry>ping

Usage: ping [-t] [-a] [-n count] [-l size] [-f] [-i TTL] [-v TOS]
            [-r count] [-s count] [[-j host-list] | [-k host-list]]
            [-w timeout] [-R] [-S srcaddr] [-4] [-6] target_name

Options:
    -t             Ping the specified host until stopped.
                   To see statistics and continue - type Control-Break;
                   To stop - type Control-C.
    -a             Resolve addresses to hostnames.
    -n count       Number of echo requests to send.
    -l size        Send buffer size.
    -f             Set Don't Fragment flag in packet (IPv4-only).
    -i TTL         Time To Live.
    -v TOS         Type Of Service (IPv4-only).
    -r count       Record route for count hops (IPv4-only).
    -s count       Timestamp for count hops (IPv4-only).
    -j host-list   Loose source route along host-list (IPv4-only).
    -k host-list   Strict source route along host-list (IPv4-only).
    -w timeout     Timeout in milliseconds to wait for each reply.
    -R             Use routing header to test reverse route also (IPv6-only).
    -S srcaddr     Source address to use.
    -4             Force using IPv4.
    -6             Force using IPv6.

C:\Users\eric.rosenberry>

So your basic ping in Windows claims to send 32 bytes of data (I have not verified this), but I suspect that is 32 bytes of payload, plus 8 bytes ICMP, and 20 bytes of IP for a total of 60 bytes.

C:\Users\eric.rosenberry>ping 4.2.2.1

Pinging 4.2.2.1 with 32 bytes of data:
Reply from 4.2.2.1: bytes=32 time=27ms TTL=53
Reply from 4.2.2.1: bytes=32 time=25ms TTL=53
Reply from 4.2.2.1: bytes=32 time=28ms TTL=53
Reply from 4.2.2.1: bytes=32 time=27ms TTL=53

Ping statistics for 4.2.2.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 25ms, Maximum = 28ms, Average = 26ms

C:\Users\eric.rosenberry>

So a common set of flags I use will be to create full 1500 byte frames (note that it takes 1472 as the parameter for this) and then tell it not to fragment (-f) and to repeat until stopped (-t).

C:\Users\eric.rosenberry>ping 4.2.2.1 -l 1472 -f -t

Pinging 4.2.2.1 with 1472 bytes of data:
Reply from 4.2.2.1: bytes=1472 time=29ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=30ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=31ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=29ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=30ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=28ms TTL=53
Reply from 4.2.2.1: bytes=1472 time=29ms TTL=53

Ping statistics for 4.2.2.1:
    Packets: Sent = 7, Received = 7, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 28ms, Maximum = 31ms, Average = 29ms
Control-C
^C
C:\Users\eric.rosenberry>

Ping on Linux

Hey look, it is a ping command that is not ambiguous about what size frames it is generating!!!  It clearly shows that the payload is 56 bytes, but that the full frame is 84.  Note that this is from an Ubuntu 2.6.27-7-generic kernel box.

ericr@eric-linux:~$ ping 4.2.2.1
PING 4.2.2.1 (4.2.2.1) 56(84) bytes of data.
64 bytes from 4.2.2.1: icmp_seq=1 ttl=52 time=21.8 ms
64 bytes from 4.2.2.1: icmp_seq=2 ttl=52 time=21.4 ms
64 bytes from 4.2.2.1: icmp_seq=3 ttl=52 time=21.4 ms
64 bytes from 4.2.2.1: icmp_seq=4 ttl=52 time=21.6 ms
64 bytes from 4.2.2.1: icmp_seq=5 ttl=52 time=21.8 ms
64 bytes from 4.2.2.1: icmp_seq=6 ttl=52 time=21.8 ms
^C
--- 4.2.2.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5019ms
rtt min/avg/max/mdev = 21.435/21.700/21.897/0.190 ms
ericr@eric-linux:~$

To set the packet size use the -s flag (it is asking for payload size, so 1472 will create a 1500 byte frame).  Now if you want to turn off fragmentation by setting the do-not-fragment bit (DF), the parameter is a bit more obscure “-M on”.  Here is an example using both:

ericr@eric-linux:~$ ping 4.2.2.1 -s 1472 -M do
PING 4.2.2.1 (4.2.2.1) 1472(1500) bytes of data.
1480 bytes from 4.2.2.1: icmp_seq=1 ttl=52 time=23.8 ms
1480 bytes from 4.2.2.1: icmp_seq=2 ttl=52 time=24.1 ms
1480 bytes from 4.2.2.1: icmp_seq=3 ttl=52 time=31.4 ms
1480 bytes from 4.2.2.1: icmp_seq=4 ttl=52 time=23.7 ms
1480 bytes from 4.2.2.1: icmp_seq=5 ttl=52 time=23.5 ms
^C
--- 4.2.2.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4017ms
rtt min/avg/max/mdev = 23.589/25.369/31.469/3.057 ms
ericr@eric-linux:~$

And there is another highly useful parameter that we have not seen yet in any of our previous ping utilities.  The linux ping has a “flood” option that will send pings as fast as the machine can generate them.  This is great for testing network links capacity, but can make for unhappy network engineers if you use it inappropriately.  Note that you must be root to use the -f flag.  Output is only shown when packets are dropped:

ericr@eric-linux:~$ sudo ping 10.0.0.1 -s 1472 -M do -f
PING 10.0.0.1 (10.0.0.1) 1472(1500) bytes of data.
.^C
--- 10.0.0.1 ping statistics ---
2763 packets transmitted, 2762 received, 0% packet loss, time 6695ms
rtt min/avg/max/mdev = 2.342/2.374/3.204/0.065 ms, ipg/ewma 2.424/2.384 ms
ericr@eric-linux:~$

Ping on Solaris

Here is the ping options from a Solaris 10 box (I forget what update this super-secret kernel number decodes too):

SunOS dbrd02 5.10 Generic_125100-07 sun4v sparc SUNW,Sun-Fire-T200

I find the basic ping command in Solaris to be annoying:

[erosenbe: dbrd02]/export/home/erosenbe> ping 4.2.2.1
4.2.2.1 is alive
[erosenbe: dbrd02]/export/home/erosenbe>

I want Ping to tell me something more useful than that a host is alive.  Come on Sun, like round trip time at least?  Maybe send a few additional pings than just one?  The -s command makes this operate more like the ping command in other OS’s:

[erosenbe: dbrd02]/export/home/erosenbe> ping -s 4.2.2.1
PING 4.2.2.1: 56 data bytes
64 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=0. time=21.6 ms
64 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=1. time=21.5 ms
64 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=2. time=21.6 ms
64 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=3. time=21.4 ms
64 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=4. time=21.1 ms
^C
----4.2.2.1 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 21.1/21.5/21.6/0.22
[erosenbe: dbrd02]/export/home/erosenbe>

With the Solaris built in Ping tool you can specify the packet size, but it is very annoying that you can’t set the do-not-fragment bit.  Come on SUN, didn’t you like invent networking???  So in this example I had it send multiple pings, and I told it to use a size of 1500 but I could not set the DF bit so the OS must have fragmented the packets before sending.  I got responses back that claim to be 1508 bytes which I am assuming means that the 1500 bytes specified was the payload amount and the returned number of bytes includes the 8 byte ICMP header, but not the 20 byte IP header…  Go SUN.

[erosenbe: dbrd02]/export/home/erosenbe> ping -s 4.2.2.1 1500
PING 4.2.2.1: 1500 data bytes
1508 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=0. time=36.7 ms
1508 bytes from vnsc-pri.sys.gtei.net (4.2.2.1): icmp_seq=1. time=23.2 ms
^C
----4.2.2.1 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 23.2/30.0/36.7/9.6
[erosenbe: dbrd02]/export/home/erosenbe>

Conclusion

Well, I hope this rundown on a number of different ping tools is useful to folks out there and as always, if you have any comments/questions/corrections please leave a comment!

-Eric

Categories: Cisco, Linux, Network, Sun Tags:

Are blade servers right for my environment?

July 15th, 2009 No comments

IT like most industries has it’s “fad”s.  Whether it be virtualization, or SAN’s, or blade servers.  Granted these three technologies play really nicely together, but once in a while you need to get off the bandwagon for a moment and think about what these technologies really do for us. While they are very cool overall and can make an extremely powerful team, as with anything, there is a right place, time, and situation/environment for their use.  Blades are clearly the “wave of the future” in many respects, but you must be cautious about the implications of implementing them today.

Please do not read this article and come away thinking I am “anti-blade” as that is certainly not the case.  I just feel they are all too often pushed into service in situations they are not the correct solution for and would like to point out some potential pitfalls.

Lifecycle co-termination

When you buy a blade center, one of the main selling points is that the network, SAN, and KVM infrastructure is built in.  This is great in terms of ease of deployment and management, however, on the financial side of things you must realize that the life span of these items is not normally the same.  When buying servers I typically expect them to be in service for 4 years, KVM’s (while becoming less utilized actually), can last much longer under most circumstances (barring changes in technology from PS/2 to USB, etc…), network switches I expect to use in some capacity or another for seven years, and SAN switches will probably have a similar life-cycle to the Storage Arrays they are attached to which I generally target at 5 year life spans.

So what does this mean?  Well, if your servers are showing their age in 4 years you are likely to end up replacing the entire blade enclosure at that point which includes the SAN and network switches.  It is possible the vendor will still sell blades that will fit in that enclosure, however, you are likely to be wanting a SAN or network upgrade before the end of those second set of servers life-cycles which will likely result in whole new platforms being purchased anyway.

Vendor lock

You have just created vendor lock such that with all the investment in enclosures you can’t go buy someone elses servers (this really sucks when your vendor fails to innovate on a particular technology).  All the manufacturers realize this situation exists and will surely use it to their advantage down the road.  It is hard to threaten not to buy Dell blades to put in your existing enclosures when that would mean throwing away your investment in SAN and network switches.

San design

Think about your SAN design – Most shops hook servers to a SAN switch which is directly attached to the storage array their data lives on.  Blade enclosures encourage the use of many more smaller SAN switches which often requires hooking the blade enclosure switches to other aggregation SAN switches which are then hooked to the Storage Processor.  This increases the complexity, increases failure points, decreases MTBF, and increases vendor lock.  Trunking SAN switches together from different vendors can be problematic and may require putting them in a compatibility mode which turns off useful features.

Vendor compatibility

Vendor compatibility becomes a huge issue- Say that you buy a blade enclosure today with 4 gig Brocade SAN switches in it for use with your existing 2 gig Brocade switches attached to an EMC Clarion CX500, but then next year you want to replace that with a Hitachi array attached to new Cisco SAN switches.  There are still many interop issues between SAN switch vendors that make trunking switches problematic.  If you had bought physical servers you may have just chosen to re-cable the servers over to the new Cisco switches directly.

Loss of flexibility

Another pitfall that I have seen folks fall into with blade servers is the loss of flexibility that comes with having a stand alone physical server.  You can’t hook up that external hard drive array full of cheap disks directly to the server, or hook up that network heartbeat crossover cable for your cluster, or add an extra NIC or two to a given machine that needs to be directly attached to some other network (that is not available as a VLAN within your switch)….

Inter-tying dependencies

You are creating dependencies on the common enclosure infrastructure so for full redundancy you need servers in multiple blade enclosures.  The argument that the blade enclosures are extremely redundant does not completely hold water to me.  I have needed to completely power cycle entire blade enclosures before to recover from certain blade management module failures.

Provisioning for highest common denominator

You must provision the blade enclosure for the maximum amount of SAN connectivity, network connectivity, and redundancy that is required on any one server within the enclosure.  Say for instance you have a authentication server that is super critical, but not resource intensive.  This requires your blade center to have fully redundant power supplies, network switches, etc…  Then say you have a different server that needs four 1 gig network interfaces, and yet another DB server that needs only two network interfaces, but it needs four HBA connections to the SAN.  You now need an enclosure that has four network switches and four SAN switches in it just to satisfy the needs of three “special case” servers.  In the case of the Dell M1000 blade enclosures, this configuration would be impossible since they can only have six SAN/Network modules total.

Buying un-used infrastructure

If you purchase a blade center that is not completely full of blades then you are wasting infrastructure resources in the form of unused network ports, SAN ports, power supply, and cooling capacity.  Making the ROI argument for blade centers is much easier if you have need to purchase full enclosures.

Failing to use existing infrastructure

Most environments have some amount of extra capacity on their existing network and SAN switches, as when they were purchased, they planned for the future (probably not with blade enclosures in mind).  Spending money to re-purchase SAN and network hardware within a blade enclosure to allow the use of blades can kill the cost advantages of going with a blade solution.

Moving from “cheap” disks to expensive SAN disks

You typically can not put many local disks into blades.  This is in many cases a huge loss as not everything needs to be on the SAN (and in fact, certain things would be very stupid to put on the SAN such as SWAP files).  I find that these days many people overlook the wonders of locally attached disk.  It is the *cheapest* form of disk you can buy and also can be extremely fast!  If your application does not require any of the advanced features a SAN can provide then DONT PUT IT ON THE SAN!

Over-buying power

In facilities where you are charged for power by the circuit the key is to manage your utilization such that your un-used (but paid for) power capacity is kept to a minimum.  With a blade enclosure, on day 1 you must provide (in this example) two 30 amp circuits for your blade enclosure, even though you are only putting in 4 out of a possible 16 severs.  You are going to be paying for those circuits even though you are nowhere near fully utilizing them.  The Dell blade enclosures as an example require two three phase 30 amp circuits for full power (though depending on the server configurations you put in them you can get away with dual 30 amp 208v circuits).

Think about the end of the life-cycle

You can’t turn off the power to a blade enclosure until the last server in that enclosure is decommissioned.  You also need to maintain support and maintenance contracts on the SAN switches, network switches, and enclosure until the last server is no longer mission critical.

When are blades the right tools for the job?

  • When your operational costs of operations and maintenance personnel far outweigh the cost inefficiencies of blades.
  • When you are buying enough servers that you can purchase *full* blade enclosures that have similar connectivity and redundancy requirements (i.e. each needs two 1 gig network ports and two 4 gig SAN connections).
  • When you absolutely need the highest density of servers offered (note that most datacenters in operation today can’t handle the density of power required and heat that blades can put out).

An example of a good use of blades would be a huge Citrix farm, or VMWare farms, or in some cases webserver farms (though I would argue very large web farms that can scale out easily should be on some of the cheapest hardware you can buy which typically does not include blades).

Another good example would be compute farms (say even lucene cache engines) – as long as you have enough nodes to be able to fill enclosures with machines that have the same connectivity and redundancy requirements.

Conclusion

While blades can be great solutions, they need to be implemented in the right environments for the right reasons.  It may indeed be the case that the savings in operational costs of employees to setup, manage, and maintain your servers far outweighs all of the points raised above, but it is important to factor all of these into your purchase decision.

As always, if you have any feedback or comments, please post below or feel free to shoot me an email.

-Eric

Categories: Cisco, Dell, HP, IBM, Network, Sun, Systems Tags:

Sun SPARC Ultra 25 Boot Fails at Probing I/O buses

May 27th, 2009 3 comments

So I have burned *way* too many hours on and off over the last couple weeks trying to get an Ultra 25 Sparc box I inherited working.  This box came to me with the video card not in the machine and some comment about it not working.

After putting the video card back in the box, when I booted the box it would not give any video output to the monitor.  I hooked into the serial console (9600-8-N-1 of course) and it appeared to be hanging with the last output being: Probing I/O buses

reset reason: 0000.0000.0000.0004
@(#)OBP 4.25.9 2007/08/23 14:17 Sun Ultra 25 Workstation
Clearing TLBs
Power-On Reset
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0004.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB (small-footprint mode) Done
MMUs ON
Init Fire JBUS Control Register… 
Find dropin, Copying Done, Size 0000.0000.0000.7260
PC = 0000.07ff.f000.6178
PC = 0000.0000.0000.6228
Find dropin, Copying Done, Size 0000.0000.0001.1440
Diagnostic console initialized
Configuring system memory & CPU(s)

CPU 0 Memory Configuration: Valid
CPU 0 Bank 0 1024 MB Bank 1 <empty> Bank 2 1024 MB Bank 3 <empty>

reset reason: 0000.0000.0000.0005
@(#)OBP 4.25.9 2007/08/23 14:17 Sun Ultra 25 Workstation
Clearing TLBs
Loading Configuration

Membase: 0000.0002.0000.0000
MemSize: 0000.0000.4000.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Init Fire JBUS Control Register… 
Block Scrubbing Done
Find dropin, Copying Done, Size 0000.0000.0000.7260
PC = 0000.07ff.f000.6178
PC = 0000.0000.0000.6228
Find dropin, (copied), Decompressing Done, Size 0000.0000.0006.4530
Diagnostic console initialized
System Reset: CPU Reset (SPOR)
Probing system devices
jbus at 0,0 SUNW,UltraSPARC-IIIi (1336 MHz @ 8:1, 1 MB) memory-controller
jbus at 1,0 Nothing there
jbus at 1c,0 Nothing there
jbus at 1d,0 Nothing there
jbus at 1e,0 pci
jbus at 1f,0 pci
Loading Support Packages: kbd-translator obp-tftp SUNW,i2c-ram-device SUNW,fru-device SUNW,asr
Loading onboard drivers: ebus i2c i2c i2c ppm
/ebus@1f,464000: flashprom rtc serial serial env-monitor i2c power
/ebus@1f,464000/i2c@3,80: gpio temperature temperature temperature front-io-fru-prom sas-backplane-fru-prom dimm-spd psu-fru-prom hardware-monitor
/i2c@1f,520000: dimm-spd dimm-spd dimm-spd dimm-spd
/i2c@1f,530000: motherboard-fru-prom gpio clock-generator
/i2c@1f,462020: nvram idprom
Probing memory
CPU 0 Bank 0 base          0 size 1024 MB
CPU 0 Bank 2 base  200000000 size 1024 MB
Probing I/O buses

Based on the fact that it stopped working at “Probing I/O buses” and there was the possibility of an issue with the video card, I tried removing the card and booting headless.  In this configuration the system came up fine, with access from the serial port.

I eventually discovered that the issue was an impacted pin in the external dongle that splits the high density dual-dvi port in to two separate DVI ports.  The important note here for anybody searching for this issue is that when you have a video card in the machine, the last thing you will see on the serial console is Probing I/O buses since once it finds the video card, all future output is redirected to the video console.  So if you don’t get any output on the screen make sure to double check your video dongle, cables, and monitor!

Also, another unexpected behavior I ran into while troubleshooting- If I left the USB keyboard hooked to the machine while booting, it will assign that as the input device, and it won’t accept input on the serial console, even though that is where all output is going!  It is very odd typing on a keyboard and having your output go to a serial console…

-Eric

Categories: Sun Tags: