17 February 2012

Meraki’s resilient out-of-band cloud management (or how to combat the FUD Aerohive push out...)

We recently had a great discussion with the networking gurus from Wireless Field Day about our cloud-managed architecture and how it works under the covers. There was a lot of interest in our out-of-band cloud management: which parts of the network require connectivity to Meraki’s cloud, how customer networks are affected during a WAN failure, and what engineering advancements went into our design. We thought we’d recap the conversation for all of our customers.
At the 10,000-foot level, communication between your network and Meraki’s cloud carries only management and configuration data, so if your connection to the cloud is interrupted, your network continues to function and end users won’t notice a difference. All of the features that affect data flow continue uninterrupted. For example:
  • Users stay authenticated
  • New users can authenticate
  • Firewall policies continue to be enforced
  • Data encryption/decryption is maintained
  • Layer 7 traffic shaping rules continue to be enforced
  • Wireless mesh routing operates with full functionality
  • Users can roam between wireless APs
  • VPN tunnels (site to site, teleworker, and client VPN) continue to operate
  • RF features like Dynamic Frequency Selection (DFS) continue
  • Performance remains at 100%
How does Meraki’s out-of-band cloud management work? This functionality would not have been possible 10 years ago, but thanks to Moore’s Law and clever engineers at Meraki, we’ve packed enough computing power and memory into every wireless access point, Ethernet switch, and security appliance to do all of the required packet processing internally, without any back-and-forth communication with the cloud. The packet processing software is also very tight, optimized to run efficiently on Meraki devices (similar to how engineers at Apple and Google write advanced applications for iOS and Android devices). For some features, such as wireless mesh routing, the Meraki devices even communicate with one another on your local network (bypassing the cloud) in order to configure and optimize themselves.
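To make the split between management traffic and user traffic concrete, here is a minimal sketch in Python. It is purely illustrative and is not Meraki’s actual software: the `Device` class, its fields, and the port numbers are all hypothetical. It only shows the idea that policy lives on the device, so forwarding decisions never wait on the cloud.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of a cloud-managed device (illustration only)."""
    # Firewall rules last pushed from the cloud, stored on the device itself.
    blocked_ports: set = field(default_factory=set)
    cloud_reachable: bool = True

    def apply_config(self, blocked_ports):
        """Management plane: a change is delivered only while the cloud is reachable."""
        if not self.cloud_reachable:
            return False  # the change cannot reach the device during a WAN outage
        self.blocked_ports = set(blocked_ports)
        return True

    def forward(self, dst_port):
        """Data plane: decided locally from stored policy, never asking the cloud."""
        return dst_port not in self.blocked_ports


ap = Device()
ap.apply_config({23})                        # block telnet while the cloud is reachable
ap.cloud_reachable = False                   # simulate a WAN outage
allowed_web = ap.forward(443)                # traffic still flows
allowed_telnet = ap.forward(23)              # the firewall policy is still enforced
config_changed = ap.apply_config({23, 80})   # config changes wait for the link
```

In this toy model, losing the cloud only affects `apply_config`; `forward` behaves exactly as it did before the outage.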
The traffic separation looks something like this:
Meraki runs multiple datacenters around the world, and every customer network is served by at least three independent datacenters. So if a natural disaster were to take out a datacenter that served your network, we’d simply fail over to another datacenter in a different part of the world. All of the configuration data, historical logs, etc. are mirrored in near-real time (at most 60-second lag, typically much less), so in these unlikely events, everything is the way you left it.
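As a rough sketch of what failing over between mirrored datacenters could look like, the Python below picks the healthy replica with the freshest copy of the data. Only the 60-second lag bound comes from this post; the datacenter names, the `Datacenter` type, and the "smallest lag wins" rule are assumptions made for illustration, not a description of Meraki’s real failover logic.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LAG_SECONDS = 60  # upper bound on mirroring lag stated in the post


@dataclass
class Datacenter:
    name: str            # hypothetical names, for illustration only
    healthy: bool
    lag_seconds: float   # how far this replica trails the most recent write


def pick_datacenter(replicas: list) -> Optional[Datacenter]:
    """Return the healthy replica with the smallest lag, or None if none qualify."""
    candidates = [
        d for d in replicas
        if d.healthy and d.lag_seconds <= MAX_LAG_SECONDS
    ]
    return min(candidates, key=lambda d: d.lag_seconds, default=None)


replicas = [
    Datacenter("dc-a", healthy=False, lag_seconds=0.0),  # lost to an outage
    Datacenter("dc-b", healthy=True, lag_seconds=4.0),
    Datacenter("dc-c", healthy=True, lag_seconds=12.0),
]
chosen = pick_datacenter(replicas)  # selects "dc-b" in this example
```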
Of course, if you lose connectivity to Meraki’s cloud (say because your ISP has an outage), you will temporarily be unable to access reports or make config changes. But if your network is anything like ours, if your WAN link goes down, you’re in fire-fighting mode, not tweaking your wireless config.


As an aside, if you’re looking for a cost-effective way to improve your WAN availability, check out our MX security appliances – they’ve got built-in WAN link balancing and failover, so you can run 2 WAN connections into your network (e.g. cable + DSL, and even 3G) and the MX will balance traffic between them. If one goes down it’ll simply move all traffic to the healthy connection. Turns out this approach can save cost too…
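For the curious, the balance-then-fail-over behavior can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it spreads flows round-robin across whichever links are up, which is not necessarily how the MX distributes traffic, and the `distribute` function, link names, and flow names are hypothetical.

```python
def distribute(flows, links):
    """Spread flows round-robin across the WAN links that are currently up.

    `links` maps a link name to True (up) or False (down).
    Returns a dict of flow -> link name; empty if every link is down.
    """
    healthy = sorted(name for name, up in links.items() if up)
    if not healthy:
        return {}
    return {flow: healthy[i % len(healthy)] for i, flow in enumerate(flows)}


flows = ["voip", "web", "backup", "email"]

# Both links up: traffic is shared between cable and DSL.
both_up = distribute(flows, {"cable": True, "dsl": True})

# Cable fails: every flow moves to the remaining healthy link.
dsl_only = distribute(flows, {"cable": False, "dsl": True})
```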

If you do suffer a WAN outage, there is a small handful of end-user-facing features on our wireless products that are affected if your connection to the cloud is lost. These are all convenience features, most of which you don’t get with a traditional wireless LAN. If you like the convenience and can tolerate limited functionality in the rare event of a WAN outage, enjoy them! If you’d prefer that there is zero end-user impact if your WAN connection is interrupted, don’t enable them (and use the alternatives listed below instead). Features that are impacted by WAN failures include:
  • Native Active Directory/LDAP integration (without RADIUS)
    This is a handy feature that allows users to authenticate against your AD/LDAP server without running RADIUS. This is super-easy to configure, and is a feature that isn’t available with traditional solutions like Cisco.

    This feature does require connectivity to the cloud, so if you want to integrate with AD or LDAP but not require cloud connectivity, simply use a traditional RADIUS configuration:
    Fault Tolerant AD/LDAP Authentication using RADIUS
  • Meraki-hosted splash pages and captive portal
    Meraki hosts snazzy, mobile-friendly, and customizable splash pages that wireless users can click through (or sign on from) before accessing your network. Since these are hosted on Meraki’s servers, they are super-easy to deploy, without any additional infrastructure in your environment. Since they’re hosted by Meraki, they require WAN connectivity to function, but you can control how new user authentication will be handled in the event that you lose WAN connectivity:

    Controlling Splash Page Behavior in Disconnected Environment
  • Built-in anti-virus scan (aka NAC)
    While Meraki’s LAN-isolation firewall always ensures that untrusted clients cannot spread viruses or compromise your LAN, Meraki offers an extra layer of protection by optionally scanning clients for antivirus software before allowing them onto the network. If a client isn’t protected, it is placed in quarantine, from which it can download AV software but can’t access any other parts of the network. This feature is unique to Meraki – no other wireless system, cloud-managed or otherwise, offers it. We find that for many customers, a full-blown, dedicated NAC system is overkill (lots of configuration complexity, 5-6 figure price tag) but Meraki’s built-in solution offers 1-click peace of mind.

    If you run Meraki’s NAC and lose WAN connectivity, you can choose how the network will behave: allow clients on without a scan, or block clients until WAN connectivity is restored. Clients already on the network will be unaffected, and other access control features remain in place (firewall rules, identity-based group policies, etc.). Most of our customers didn’t run NAC at all before they deployed Meraki, so rare interruptions aren’t a major issue. But if antivirus scans during WAN outages are mission-critical, we recommend a dedicated NAC appliance. Also be sure to host a downloadable antivirus package behind the firewall, since users won’t be able to go out onto the network if they fail the scan.
  • Meraki-hosted RADIUS server
    Most enterprise (and even SMB) environments already have a RADIUS server – Microsoft Active Directory, LDAP, FreeRADIUS, etc. The vast majority of our customers who use RADIUS authentication (i.e. 802.1X) authenticate against their own server, so that they have one central user database for email, calendaring, wireless LAN authentication, etc. However, Meraki also offers a cloud-hosted RADIUS server for lightweight use. This requires connectivity to Meraki, so if access during a WAN outage is mission-critical, those user accounts should reside on your internal directory server.
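The choice described above for NAC during an outage (allow clients on without a scan, or block them until WAN connectivity is restored) is the classic fail-open versus fail-closed decision. Here is a minimal Python sketch of that decision. The `OutagePolicy` enum, the `admit_client` function, and the return values are hypothetical names chosen for illustration; they are not Meraki settings or APIs.

```python
from enum import Enum


class OutagePolicy(Enum):
    ALLOW_WITHOUT_SCAN = "allow"     # fail open during a WAN outage
    BLOCK_UNTIL_RESTORED = "block"   # fail closed during a WAN outage


def admit_client(already_connected, wan_up, scan_passed, policy):
    """Return 'allow', 'quarantine', or 'block' for a client.

    `scan_passed` only matters when the WAN is up, because the scan
    depends on connectivity to the cloud.
    """
    if already_connected:
        return "allow"  # clients already on the network are unaffected
    if wan_up:
        return "allow" if scan_passed else "quarantine"
    if policy is OutagePolicy.ALLOW_WITHOUT_SCAN:
        return "allow"
    return "block"


# A new client arriving during an outage, under each policy:
open_result = admit_client(False, False, False, OutagePolicy.ALLOW_WITHOUT_SCAN)
closed_result = admit_client(False, False, False, OutagePolicy.BLOCK_UNTIL_RESTORED)
```

The same fail-open or fail-closed framing applies to how new users are handled by hosted splash pages when the WAN is down.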
There’s a lot of detail about what is affected by loss of connectivity, but in the scheme of Meraki’s features, this is a short list. Our customers find in practice that Meraki’s out-of-band management significantly improves the reliability and resilience of their networks, combining the centralized management of controller-based systems with the fault tolerance of a distributed architecture. If you’re already a customer, how has Meraki’s out-of-band architecture benefited your network? What else would you like to know about how Meraki works under the covers? Let us know!
