Migrating CSM Serverfarms to Other Server VLANs

A coworker brought an interesting problem to me the other day.  He wanted to move a serverfarm from one server VLAN to another without taking an outage.  Since I didn’t want to have to come into the office late at night to do work, I decided to see what we could do.

It turned out to be pretty easy.  We tend to think of CSM VLANs as pairs – you have the client VLAN for the web servers where the vserver sits and the server VLAN where the serverfarm sits.  The CSM doesn’t know about these relationships; all it cares about is whether the servers are in a server VLAN, and we can use that to our advantage here.

CSCtd31622 - CSM, Cookies, and the year 2010

It seems that we have another piece of evidence that Cisco doesn’t like the CSM.  From what I’m able to creatively interpret, the software developers didn’t think anyone would be running the CSM for very long, so they set a variable that expires CSM-inserted cookies at 01:01:50 GMT on 1 January 2010.  If you’re using cookies to make connections sticky, that means you may see some unexpected results; this shouldn’t affect the web servers’ cookies.

Using SSH to Run Commands on a Router or Switch

SSH is more than just a shell.  You can copy files from and to a server or piece of network gear with it.  You can use it to tunnel traffic.  Possibly my favorite, though, is to use SSH to run a command on a remote box without interacting with a shell.

One of my biggest pet peeves with IOS (or pretty much any Cisco OS) is the lack of complex filtering.  Let’s say I want to look at all the downed ports and interfaces on modules 3 and 6 of my 6509.  I can’t easily do that with commands in IOS, but, on my Linux box, I can chain multiple grep commands to get exactly what I want.  Let’s work through the example, shall we?
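The same kind of filtering can be sketched in Python instead of grep.  This is only an illustration: the sample text imitates `show interfaces status` output and is made up, and the function name `down_ports` and the list of "down" status keywords are my own assumptions rather than anything IOS defines.

```python
import re

# Hypothetical sample imitating "show interfaces status" output.
SAMPLE = """\
Port      Name       Status       Vlan  Duplex  Speed Type
Gi3/1     web01      connected    10    full    1000  1000BaseT
Gi3/2     web02      notconnect   10    auto    auto  1000BaseT
Gi4/1     db01       notconnect   20    auto    auto  1000BaseT
Gi6/5     app03      disabled     30    auto    auto  1000BaseT
Gi6/6     app04      connected    30    full    1000  1000BaseT
"""

# Status values treated as "down" (an assumption for this sketch).
DOWN_STATES = {"notconnect", "disabled", "err-disabled"}


def down_ports(text, modules):
    """Return the lines for down ports whose module number is in `modules`."""
    # Matches interface names such as Gi3/2 and captures the module number.
    port_re = re.compile(r"^[A-Za-z]+(\d+)/\d+\s")
    matches = []
    for line in text.splitlines():
        m = port_re.match(line)
        if not m or int(m.group(1)) not in modules:
            continue
        # Keep the line only if one of its fields is a "down" status.
        if DOWN_STATES & set(line.split()):
            matches.append(line)
    return matches


if __name__ == "__main__":
    for line in down_ports(SAMPLE, {3, 6}):
        print(line)
```

In practice the text would be whatever the switch prints when the command is run over SSH; the filtering step stays the same.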

An Interesting Problem with Multiple DCs on a Stick

We talked about running multiple data centers on a stick back in August, which is where you have multiple logical pairs of client and server VLANs on a single CSM for different tiers or functions.  The big point of the article was that you had to do some fancy forwarding to get a server-initiated connection from one server VLAN to appear out the appropriate client VLAN.  Well, we ran into an interesting issue with the given solution.

CSM Probe Status of ???

I must be bored since I’m posting again.

A colleague asked me to change the failed value of a TCP probe today.  It was no big deal, but, when I looked to see the status of the change, I noticed interesting stati of the RIPs.

switch#sh mod csm 7 probe name TCP80-PROBE detail
probe           type    port  interval retries failed  open   receive
---------------------------------------------------------------------
TCP80-PROBE     tcp     80    20       3       120     10
Description: Quick fail recovery
recover = 3
real                  vserver         serverfarm      policy          status
------------------------------------------------------------------------------
192.168.1.45:80       VS01            FARM01          (default)       ???
192.168.1.44:80       VS01            FARM01          (default)       ???
192.168.1.43:80       VS01            FARM01          (default)       ???
192.168.1.42:80       VS01            FARM01          (default)       ???

It seems that when a change is made to a probe, the CSM discards the state of the probe and starts over.  If you catch it before the first probe is finished, you’ll get a status of “???”.  I’m just picturing the CSM saying “Uhh…I…don’t…know”.

Fail Actions on CSM Serverfarms

I’ve talked about probes and stuff on the CSM, but I never mentioned what happens to the connections to a server that fails.  That is, if I’m connected to server A in a cluster and that server suddenly commits ritual seppuku, what happens to my connection through the CSM?

Remember how the CSM works?  You connect to the VIP, some state tables are updated, your packet’s destination IP is changed to a RIP, and the packet is forwarded.  The point I want to emphasize this time is the state table.  If you were to send another packet to the same VIP on the same port, the CSM would look in its state table and see that you’re already connected to a server and just forward you on over after a NAT.  What if that server has suddenly died?
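To make the state table idea concrete, here is a toy model in Python.  It is a sketch under assumptions, not how the CSM is actually implemented: the class name `ToyBalancer`, the round-robin choice of RIP, and the shape of the table key are all mine.

```python
class ToyBalancer:
    """Toy model of a load balancer's connection state table."""

    def __init__(self, vip, rips):
        self.vip = vip
        self.rips = list(rips)
        self.next_index = 0
        # (client_ip, client_port, vip, vip_port) -> RIP chosen for that flow
        self.connections = {}

    def forward(self, client_ip, client_port, vip_port):
        """Return the RIP that a packet for this flow is NATed to."""
        key = (client_ip, client_port, self.vip, vip_port)
        rip = self.connections.get(key)
        if rip is None:
            # New flow: pick a server (round robin is an assumption here)
            # and remember the choice in the state table.
            rip = self.rips[self.next_index % len(self.rips)]
            self.next_index += 1
            self.connections[key] = rip
        # Known flow: reuse the stored RIP without choosing again.
        return rip


if __name__ == "__main__":
    lb = ToyBalancer("10.0.0.100", ["192.168.1.42", "192.168.1.43"])
    print(lb.forward("172.16.5.9", 40000, 80))
    print(lb.forward("172.16.5.9", 40000, 80))
```

Notice that nothing in `forward` checks whether the stored RIP is still alive, which is exactly the gap the question above is pointing at.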

Configuring Dedicated Trunks for the CSM

Did you catch the article on setting up fault tolerance on the CSM?  In that article, I mentioned that Cisco recommends a dedicated trunk for the FT VLAN if you have two HA CSMs in two chassis.  Discuss amongst yourselves while I drone on.

Why should you set up a dedicated trunk for this stuff?  The most obvious reason is to be sure that normal traffic doesn’t step on the syncing traffic.  Since we’re syncing state information as well as configuration, the frames need to arrive in a timely manner.  Any errors could potentially disrupt the FT process, which is bad.  You surely don’t want the primary to fail only to find out that the standby doesn’t have the complete or current config.

Using Probes on the CSM

There are three different ways that a CSM checks for the health of the servers – active probes, inband health checking, and inband HTTP monitoring.  Let’s talk about active probes.

Active probes (or just probes) typically send traffic to one of the RIPs of a serverfarm, do some stuff, and give a pass or fail grade.  If the probe fails a certain number of times in a row, that server is considered sick and taken out of the pool for use.  The CSM keeps checking the unhealthy server until it passes a number of times in a row, at which point it is placed back in the pool for use.  Almost everything is configurable, of course, so let’s look at some of those settings.
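The fail-in-a-row and pass-in-a-row logic can be modeled in a few lines of Python.  This is a simplified sketch, not the CSM's code: the class name `ProbeTracker` is hypothetical, and I am assuming the two thresholds behave like simple consecutive counters.

```python
class ProbeTracker:
    """Tracks one server's health from a stream of probe results."""

    def __init__(self, retries=3, recover=3):
        self.retries = retries    # consecutive failures before marking sick
        self.recover = recover    # consecutive passes before marking healthy
        self.healthy = True
        self.fail_streak = 0
        self.pass_streak = 0

    def record(self, passed):
        """Record one probe result and return the current health state."""
        if passed:
            self.pass_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.pass_streak >= self.recover:
                self.healthy = True   # back in the pool
        else:
            self.fail_streak += 1
            self.pass_streak = 0
            if self.healthy and self.fail_streak >= self.retries:
                self.healthy = False  # out of the pool
        return self.healthy


if __name__ == "__main__":
    tracker = ProbeTracker(retries=3, recover=3)
    for result in [False, False, False, True, True, True]:
        print(result, tracker.record(result))
```

A single pass in the middle of a run of failures resets the failure streak, so the server only leaves the pool after an unbroken run of bad probes.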

Configuring Fault Tolerance on the CSM

Like (nearly) everything in the Cisco world, you can set up your CSM to fail over to another module when the primary dies a horrible death.  You can have two in the same chassis or even have them in separate chassis – the process is the same no matter how you have it set up.  Either way, you have a primary and a secondary module in fault tolerance (FT) mode.

Running Multiple Data Centers on a Stick with the CSM

That’s an awesome title, eh?  I’ve mentioned a router-on-a-stick before but not a data-center-on-a-stick (DCOAS).  This is one of those Cisco terms I ran across a while ago, and it refers to a group of servers sort of sticking out on their own behind a load balancer and/or firewall.  Connections to and from the server group go through a single spoke – kinda like stubby routing.  Here’s a pretty picture.

Backup Servers on the CSM

On the CSM, you can configure a vserver with a main serverfarm and a backup serverfarm that takes over if the main one is toast.  If all the RIPs in the main farm are out-of-service, the CSM will start to treat the backup farm just as if it’s configured to be the main one.  Once one or more of the main farm RIPs have recovered, the CSM reverts and uses those again.  “Give me an example of when I’d use it!” you say?  Since the CSM is made for HTTP connections, we’ll assume that you are using it for such.
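The main-versus-backup decision boils down to one check, sketched here in Python.  The function name `pick_serverfarm` and the dictionaries of RIP states are made up for illustration; the CSM makes this decision internally based on its own health checks.

```python
def pick_serverfarm(main, backup):
    """Choose which farm serves traffic.

    `main` and `backup` map each RIP to True (in service) or False.
    Returns the farm label and the list of usable RIPs in that farm.
    """
    # The backup farm is used only when every RIP in the main farm is down.
    if any(main.values()):
        return "main", [rip for rip, up in main.items() if up]
    return "backup", [rip for rip, up in backup.items() if up]


if __name__ == "__main__":
    main_farm = {"192.168.1.42": False, "192.168.1.43": False}
    backup_farm = {"192.168.2.10": True}
    print(pick_serverfarm(main_farm, backup_farm))

    # One main RIP recovers, so traffic goes back to the main farm.
    main_farm["192.168.1.43"] = True
    print(pick_serverfarm(main_farm, backup_farm))
```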

Intro to Policies on the CSM

The CSM is a pretty bad little box.  It not only watches layer 4 items like TCP connections, but also talks HTTP, which you can use to do some custom, or policy-based, load balancing.

Policies are the objects that make custom balancing work.  Like everything else (it seems) on the CSM, a policy is an object made up of other objects – maps and serverfarms.  A map matches patterns based on a number of things including the URL and HTTP header values, while the serverfarm directive tells where to send traffic that matches the map.  If, for example, you want to send all requests with “/admin” in the URL to a management server instead of the regular web servers, you can do it with a policy.
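Here is the “/admin” example as a small Python model.  It only mimics the idea of a map plus a serverfarm; the names `Policy`, `select_serverfarm`, `MGMT-FARM`, and `WEB-FARM` are hypothetical, and Python regular expressions stand in for whatever pattern syntax the CSM's maps really use.

```python
import re
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    url_pattern: str   # stands in for a URL map; a Python regex here
    serverfarm: str    # where matching requests are sent


def select_serverfarm(url, policies, default_farm):
    """Return the farm of the first policy whose pattern matches the URL."""
    for policy in policies:
        if re.search(policy.url_pattern, url):
            return policy.serverfarm
    # No policy matched, so fall through to the regular farm.
    return default_farm


if __name__ == "__main__":
    policies = [Policy("ADMIN-POLICY", r"/admin", "MGMT-FARM")]
    for url in ["/admin/login", "/index.html"]:
        print(url, "->", select_serverfarm(url, policies, "WEB-FARM"))
```

Checking policies in order and stopping at the first match is an assumption of this sketch; it keeps the behavior predictable when more than one pattern could match.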

Getting Something Out of the CSM

My buddy told me that my site is the only place on the web with documentation on the Cisco Content Switching Module (CSM). I also noticed a few months ago that every TAC case I’ve opened on the CSM has been handled by the same guy. I seriously think that the only people in the world that really know about these things are me and him. Cool. I better get some more content up.

Monitoring the CSM with SNMP

I had an article a few weeks ago about the Cisco CSM, which is a load-balancer module for the 6500 series switches. This thing is a pretty good device, but monitoring the connections to each VIP and RIP is not very straightforward. If you have an SNMP monitoring system like Cacti or MRTG, you need to know the OID to monitor, but it doesn’t work like anything else in the world.

Getting Started with the Cisco CSM

Cisco’s Content Switching Module (CSM) is an application accelerator. Or is it an application networking service module? I hate those fancy buzzwords – it’s a load balancer. It’s a module for the 6500 series switches that lets you load balance services in any VLAN and can also be set up for high-availability. I could go on for a while about the features, but let’s keep it simple for now. A short tutorial, if you will.