In responding to incidents, one thing stands out that I felt deserved a post: network taps and visibility. While large companies often have the necessary resources (e.g. money, time, engineers, other tools which require visibility into network traffic, etc.) to install and maintain network traffic taps or link aggregators, the number of companies I run into without ANY sort of tap or aggregator infrastructure surprises me. While it depends on the type of incident you’re dealing with, it is quite often the case that you’re going to want, or better yet need, a very good view of your network traffic down to the packet level.
If you’re not convinced, imagine this scenario: during a routine review of some logs you see traffic leaving your US organization for an IP address located somewhere in Asia. It appears to be TCP/80 traffic originating from a host on your network, so you assume it is standard HTTP traffic. But then you remember that you have a web proxy installed and all users should be configured to send HTTP requests through the proxy…so what gives? At this point your only hope is to view the firewall logs (hopefully you have these enabled at the right level), or you can go out and image the host to see what sites it was hitting and why. But if you had packet-level inspection available, a simple query for the destination and source address would confirm whether this is simply a misconfigured end-user system, a set of leftover egress rules on the firewall that allow users to circumvent the proxy, or C2 traffic to/from an infected host on your network.
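To make the "simple query" a little more concrete, here is a minimal Python sketch of the kind of check that packet-level data makes possible. It assumes flow records have already been summarized from a capture. The `Flow` record, the `direct_web_flows` helper, the addresses, and the proxy port are all hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """One observed connection (hypothetical record, e.g. summarized from a pcap)."""
    src: str
    dst: str
    dst_port: int


def direct_web_flows(flows, internal_net, proxy_ips, web_ports=(80,)):
    """Return web flows that leave the internal network without using the proxy."""
    internal = ip_network(internal_net)
    proxies = {ip_address(p) for p in proxy_ips}
    hits = []
    for f in flows:
        src, dst = ip_address(f.src), ip_address(f.dst)
        leaves_network = src in internal and dst not in internal
        if leaves_network and f.dst_port in web_ports and src not in proxies:
            hits.append(f)
    return hits


# Assumed example data: 10.0.0.0/8 is internal, 10.1.1.5 is the web proxy.
flows = [
    Flow("10.1.1.10", "10.1.1.5", 3128),    # client -> proxy (expected)
    Flow("10.1.1.5", "198.51.100.7", 80),   # proxy -> internet (expected)
    Flow("10.1.1.23", "203.0.113.50", 80),  # client -> internet, bypassing the proxy
]

suspects = direct_web_flows(flows, "10.0.0.0/8", ["10.1.1.5"])
for f in suspects:
    print(f"{f.src} -> {f.dst}:{f.dst_port} did not go through the proxy")
```

A hit like the last flow does not tell you which of the three explanations applies. It only tells you which host and destination to look at next, and the packets themselves settle the rest.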
Having taps, SPAN/mirror ports, or link aggregators in place PRIOR to an incident is the key to gaining visibility into your network traffic, even if you do not possess the monitoring tools today. It allows response organizations to “forklift” a crate of tools into your environment and gain access to the network traffic they need to begin the investigation. The main benefit of tapping your infrastructure prior to an incident is that you don’t need to go through an emergency change control at the start of the incident just to get these taps, SPANs, or aggregators installed. This is also technology that your network team may not be familiar with configuring, installing, or troubleshooting, so setting up your tapping infrastructure up front and testing it under non-stressful conditions is preferred. That being said, it is also important to remember that there are pros and cons to how you pre-deploy your solution, both in terms of technology and tap location.
A couple of questions should be answered up front when considering how to approach this topic:
Taps vs. Link Aggregators vs. SPAN/mirror
The simplest way to gain access to network traffic is to configure a switch, most likely one near your egress point, to SPAN or mirror traffic from one or more switch ports/VLANs to a monitor port/VLAN which can be connected to the network traffic monitoring tool(s). The downside of SPAN ports is that you can overwhelm the port, the switch CPU, or both, depending on your hardware. If you send three 1G ports to a 1G SPAN port, and the three 1G links are all 50% saturated at peak, the mirrored load is roughly 1.5G and you will drop packets on the SPAN port as soon as you surpass the bandwidth of the 1G port (oversubscription). The safest way to use a SPAN in this case is to mirror a single 1G port to a 1G mirror port. Keep in mind that even this can drop packets on a busy link if you mirror both directions, since a full-duplex 1G link can carry up to 1G each way. Also consider how many SPAN or mirror ports are supported by your switching hardware. Some lower-end switches will only support a single mirror port due to hardware limitations (switch fabric, CPU, etc.), while more expensive models will support many more SPAN ports. I’m not going to get into local SPAN vs. RSPAN over L2 vs. ERSPAN over GRE…that is for the network engineers to figure out.
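The oversubscription arithmetic is easy to sanity-check before you commit to a SPAN design. The following is a rough Python sketch, not a capacity-planning tool. The function names are made up for illustration, and it assumes every source link runs at the same peak utilization, ignoring bursts and switch buffering.

```python
def mirrored_load_gbps(link_speeds_gbps, utilization, both_directions=False):
    """Estimate the peak traffic offered to a SPAN port, in Gbps.

    utilization is the assumed peak fraction (0.0 to 1.0) of each source link.
    With both_directions=True, each full-duplex link is counted once per direction.
    """
    factor = 2 if both_directions else 1
    return sum(speed * utilization * factor for speed in link_speeds_gbps)


def is_oversubscribed(offered_gbps, span_port_gbps):
    """True if the mirrored load exceeds what the SPAN port can carry."""
    return offered_gbps > span_port_gbps


# The example from the text: three 1G ports at 50% into one 1G SPAN port.
offered = mirrored_load_gbps([1, 1, 1], 0.5)
print(offered, is_oversubscribed(offered, 1))  # 1.5 True

# A single 1G port at 50%, mirrored to a 1G SPAN port.
single = mirrored_load_gbps([1], 0.5)
print(single, is_oversubscribed(single, 1))  # 0.5 False
```

Running the same numbers with `both_directions=True` is a useful worst-case check when the source links are full duplex and both transmit and receive traffic are mirrored.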
Passive and active taps can alleviate some of the issues with dropped packets on a switch SPAN, as they sit in-line with the connection being tapped and operate at line speed. The drawback is that they may present a single point of failure, as you now have an in-line device bridging what is most likely your organization’s connection to the rest of the world. Also, keep in mind that passive taps have two outputs, one for traffic in each direction, so you’ll need to ensure the monitoring tools you have or plan to purchase can accept this dual input/half duplex arrangement. Active taps, on the other hand, are powered, so you’ll want to ensure you have redundancy on the power supply.
The last type of tap isn’t really a tap at all, but a link aggregator, which accepts inputs from active/passive taps or switch SPAN ports, aggregates them, and sends the result to the monitoring tool(s). The benefit of an aggregator is that it can accept multiple inputs and supply multiple monitoring tools. Some of the more expensive models also have software or hardware filtering, so you can send specific types of traffic to specific monitoring tools if that is required.
Last but not least are the connection types you’ll be dealing with. Most monitoring tools mentioned in this post accept 1G copper up to 10G fiber inputs, depending on the tool and model. You also need to make sure your taps and/or aggregators have the correct types of inputs and outputs for the links you need to monitor. If you’re tapping the egress point, chances are you’re dealing with a 1G copper connection, as most of us rarely have a need for more than 1G of internet bandwidth. If you’re tapping somewhere inside your network you may be dealing with 1G or 10G, copper or fiber, or a combination (e.g. 10/100/1000Base-T over RJ-45, or 1000Base-SX/LX/ZX over multimode or singlemode fiber), so keep this in mind as you specify your tapping equipment.
Location – Outside, Inside, DMZ, Pre or Post Proxy? What About VM Switches?
Next is the issue of where to place the network tap, and the answer really depends on what level of visibility you require. At a minimum I’d want to tap the ingress/egress points for my network, that is, any connection between my organization and the rest of the world. But that doesn’t quite answer the question, as I still have options such as outside the firewall, directly inside the firewall (my internal segment), just after my web proxy or IPS (assuming it is in-line), or inside the proxy.
There are some benefits and drawbacks to each of these options; however, I’m most often interested in traffic going between my systems and the outside world. The answer mainly depends on your network setup and the tools you have (or will have) at your disposal. If you tap outside the firewall then you can see all traffic, both traffic which is allowed and inbound traffic which may be filtered by the firewall. The drawbacks are noise and the fact that everything appears to originate from our public IP address space, as I’m assuming NAT, overload NAT, PAT, etc. is in use in 99% of configurations. The next point to consider is just inside the firewall; however, that depends on where you consider the inside to be. If we call it the inside interface (the one our end users connect through) then I gain visibility into traffic pre-NAT, which shows me the end user’s IP address. This assumes an in-line (explicit) proxy is not being used, as that would make all web or other traffic routed through the proxy appear to originate from the proxy itself. Not forgetting the DMZ, we may also tap our traffic as it leaves the DMZ segment through a tap or SPAN, as that will allow for monitoring of egress/ingress traffic but not inter-DMZ traffic.
Pre or post-proxy taps need to be considered based on a few factors as well. If it is relatively simple to track a session that is identified post-proxy back to the actual user or their system, and it is cheaper for me to tap post-proxy, then go for it. If we really need to see who originated the traffic, and what that traffic may look like prior to being filtered by a proxy, then we should consider tapping inside the proxy. In most situations I’d settle for a tap inside the proxy, just inside the firewall prior to NAT/PAT, and just prior to leaving the DMZ segment. To achieve this you may be looking at deploying multiple SPANs/taps and using a link aggregator to aggregate the monitored traffic per egress point.
Finally, what about all the virtual networking? Well, there are point solutions for this as well. Gigamon’s GigaVue-VM is an example of new software technology that integrates with a virtual switch infrastructure. While this remains important if we need to monitor inter-VM traffic, all of these connections out of a VM server (e.g. ESXi) need to turn physical at some point and are subject to the older physical technologies mentioned above.
Limitations
This is the standard section on encryption and how it may blind the monitoring tools. Some tools can deal with the fact that they “see” encrypted traffic on non-standard ports and report that as suspicious. Some don’t really care, as they are looking at a set of C2 destinations and monitoring traffic flows and volumes. If you’re worried about encryption during a response you probably should be…and if you’re really concerned, consider looking into encryption breaking solutions (e.g. Netronome). Outside of the encryption limitation, after you deploy your tapping infrastructure your network diagrams should be updated (don’t care who does this, just get it done) to identify the location, ports, and type of each component of your solution, along with any limitations on traffic visibility. Knowing what you can’t see is, in some cases, almost as important as knowing what you can see.
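If a diagram alone feels too easy to let go stale, the same information can be kept as a small machine-readable inventory. This is only a sketch of one possible layout in Python. The `TapPoint` record, its field names, and the sample entries are invented for illustration and do not come from any product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TapPoint:
    """One monitored point in the network (hypothetical record layout)."""
    name: str
    location: str                 # e.g. "inside firewall, pre-NAT"
    kind: str                     # "SPAN", "passive tap", "active tap", "aggregator"
    ports: list[str]
    blind_spots: list[str] = field(default_factory=list)


def visibility_gaps(points):
    """Map each tap point that has known limitations to its list of blind spots."""
    return {p.name: p.blind_spots for p in points if p.blind_spots}


# Assumed example entries; the names and port labels are placeholders.
inventory = [
    TapPoint("egress-1", "inside firewall, pre-NAT", "passive tap",
             ["tap-a", "tap-b"], ["encrypted sessions are opaque"]),
    TapPoint("dmz-1", "DMZ uplink", "SPAN", ["mon-1"],
             ["inter-DMZ traffic is not mirrored"]),
    TapPoint("agg-1", "egress rack", "aggregator", ["in-1", "in-2", "out-1"]),
]

for name, gaps in visibility_gaps(inventory).items():
    print(f"{name}: cannot see {', '.join(gaps)}")
```

Handing responders a list like this at the start of an engagement tells them both where they can plug in and what they should not expect to see.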
Final Thoughts
Find your egress points, understand the network architecture and traffic flow, decide where and how to tap, and deploy the tapping infrastructure prior to having a need to use it…even if you don’t plan on implementing the monitoring tools yourself. This is immensely beneficial to the incident responders in terms of gaining network visibility as quickly as possible. As time is of the essence in most responses, please don’t make them sit and wait for your network team to get an approval to implement a tap just to find out they put it in the wrong place or it needs to be reconfigured.
If this needs to be sold as an “operational” activity for the network team, tapping and monitoring the network has uncovered many misconfigured or sub-optimal network traffic flows, everything from firewall rules which are too permissive to clear text traffic which was thought to be sent or received over encrypted channels. Something to keep in mind…who knows, if you ever get around to installing network-based DLP you’re already on your way, as you’ll have tapped the network ahead of deployment.