ControlPlane

Router Control Plane
Control Planetroll

-->

ControlPlane is an important app for a large number of people, including myself. I still use ControlPlane on a daily basis but my usage of it is quite limited compared to what others have done with it. The bugs and Read more By Dustin Rue, 3 years. This work was originally developed as part of the DICE Control Plane Working Group. DICE is an acronym for Dante, Internet2, CANARIE, and ESnet. However, there are now a larger set of organizations involved in this work.

APPLIES TO: SQL API Cassandra API Gremlin API Table API Azure Cosmos DB API for MongoDB

Control Plane in Azure Cosmos DB is a RESTful service that enables you to perform a diverse set of operations on the Azure Cosmos account. It exposes a public resource model (for example: database, account) and various operations to the end users to perform actions on the resource model. The control plane operations include changes to the Azure Cosmos account or container. For example, operations such as create an Azure Cosmos account, add a region, update throughput, region failover, add a VNet etc. are some of the control plane operations. This article explains how to audit the control plane operations in Azure Cosmos DB. You can run the control plane operations on Azure Cosmos accounts by using Azure CLI, PowerShell or Azure portal, whereas for containers, use Azure CLI or PowerShell.

The following are some example scenarios where auditing control plane operations is helpful:

You want to get an alert when the firewall rules for your Azure Cosmos account are modified. The alert is required to find unauthorized modifications to rules that govern the network security of your Azure Cosmos account and take quick action.
You want to get an alert if a new region is added or removed from your Azure Cosmos account. Adding or removing regions has implications on billing and data sovereignty requirements. This alert will help you detect an accidental addition or removal of region on your account.
You want to get more details from the diagnostic logs on what has changed. For example, a VNet was changed.

Disable key based metadata write access

Before you audit the control plane operations in Azure Cosmos DB, disable the key-based metadata write access on your account. When key based metadata write access is disabled, clients connecting to the Azure Cosmos account through account keys are prevented from accessing the account. You can disable write access by setting the disableKeyBasedMetadataWriteAccess property to true. After you set this property, changes to any resource can happen from a user with the proper Azure role and credentials. To learn more on how to set this property, see the Preventing changes from SDKs article.

After the disableKeyBasedMetadataWriteAccess is turned on, if the SDK based clients run create or update operations, an error 'Operation 'POST' on resource 'ContainerNameorDatabaseName' is not allowed through Azure Cosmos DB endpoint is returned. You have to turn on access to such operations for your account, or perform the create/update operations through Azure Resource Manager, Azure CLI or Azure PowerShell. To switch back, set the disableKeyBasedMetadataWriteAccess to false by using Azure CLI as described in the Preventing changes from Cosmos SDK article. Make sure to change the value of disableKeyBasedMetadataWriteAccess to false instead of true.

Consider the following points when turning off the metadata write access:

Evaluate and ensure that your applications do not make metadata calls that change the above resources (For example, create collection, update throughput, …) by using the SDK or account keys.
Currently, the Azure portal uses account keys for metadata operations and hence these operations will be blocked. Alternatively, use the Azure CLI, SDKs, or Resource Manager template deployments to perform such operations.

Enable diagnostic logs for control plane operations

You can enable diagnostic logs for control plane operations by using the Azure portal. After enabling, the diagnostic logs will record the operation as a pair of start and complete events with relevant details. For example, the RegionFailoverStart and RegionFailoverComplete will complete the region failover event.

Use the following steps to enable logging on control plane operations:

Sign into Azure portal and navigate to your Azure Cosmos account.
Open the Diagnostic settings pane, provide a Name for the logs to create.
Select ControlPlaneRequests for log type and select the Send to Log Analytics option.

You can also store the logs in a storage account or stream to an event hub. This article shows how to send logs to log analytics and then query them. After you enable, it takes a few minutes for the diagnostic logs to take effect. All the control plane operations performed after that point can be tracked. The following screenshot shows how to enable control plane logs:

View the control plane operations

After you turn on logging, use the following steps to track down operations for a specific account:

Sign into Azure portal.
Open the Monitor tab from the left-hand navigation and then select the Logs pane. It opens a UI where you can easily run queries with that specific account in scope. Run the following query to view control plane logs:

The following screenshots capture logs when a consistency level is changed for an Azure Cosmos account:

The following screenshots capture logs when the keyspace or a table of a Cassandra account are created and when the throughput is updated. The control plane logs for create and update operations on the database and the container are logged separately as shown in the following screenshot:

Identify the identity associated to a specific operation

If you want to debug further, you can identify a specific operation in the Activity log by using the Activity ID or by the timestamp of the operation. Timestamp is used for some Resource Manager clients where the activity ID is not explicitly passed. The Activity log gives details about the identity with which the operation was initiated. The following screenshot shows how to use the activity ID and find the operations associated with it in the Activity log:

Control plane operations for Azure Cosmos account

The following are the control plane operations available at the account level. Most of the operations are tracked at account level. These operations are available as metrics in Azure monitor:

Region added
Region removed
Account deleted
Region failed over
Account created
Virtual network deleted
Account network settings updated
Account replication settings updated
Account keys updated
Account backup settings updated
Account diagnostic settings updated

Control plane operations for database or containers

The following are the control plane operations available at the database and container level. These operations are available as metrics in Azure monitor:

SQL Database Created
SQL Database Updated
SQL Database Throughput Updated
SQL Database Deleted
SQL Container Created
SQL Container Updated
SQL Container Throughput Updated
SQL Container Deleted
Cassandra Keyspace Created
Cassandra Keyspace Updated
Cassandra Keyspace Throughput Updated
Cassandra Keyspace Deleted
Cassandra Table Created
Cassandra Table Updated
Cassandra Table Throughput Updated
Cassandra Table Deleted
Gremlin Database Created
Gremlin Database Updated
Gremlin Database Throughput Updated
Gremlin Database Deleted
Gremlin Graph Created
Gremlin Graph Updated
Gremlin Graph Throughput Updated
Gremlin Graph Deleted
Mongo Database Created
Mongo Database Updated
Mongo Database Throughput Updated
Mongo Database Deleted
Mongo Collection Created
Mongo Collection Updated
Mongo Collection Throughput Updated
Mongo Collection Deleted
AzureTable Table Created
AzureTable Table Updated
AzureTable Table Throughput Updated
AzureTable Table Deleted

Diagnostic log operations

Router Control Plane

The following are the operation names in diagnostic logs for different operations:

RegionAddStart, RegionAddComplete
RegionRemoveStart, RegionRemoveComplete
AccountDeleteStart, AccountDeleteComplete
RegionFailoverStart, RegionFailoverComplete
AccountCreateStart, AccountCreateComplete
AccountUpdateStart, AccountUpdateComplete
VirtualNetworkDeleteStart, VirtualNetworkDeleteComplete
DiagnosticLogUpdateStart, DiagnosticLogUpdateComplete

For API-specific operations, the operation is named with the following format:

ApiKind + ApiKindResourceType + OperationType
ApiKind + ApiKindResourceType + 'Throughput' + operationType

Example

CassandraKeyspacesCreate
CassandraKeyspacesUpdate
CassandraKeyspacesThroughputUpdate
SqlContainersUpdate

The ResourceDetails property contains the entire resource body as a request payload and it contains all the properties requested to update

Diagnostic log queries for control plane operations

The following are some examples to get diagnostic logs for control plane operations:

Query to get the activityId and the caller who initiated the container delete operation:

Query to get index or ttl updates. You can then compare the output of this query with an earlier update to see the change in index or ttl.

output:

Next steps

Control-Plane Protection & Control-Plane Policing

Control Planetroll

It’s not very common to see people jump on the idea of configuring Control-Plane Policing/Protection, a part of me thinks people avoid this subject like the plague because they feel it causes more problems then it is worth. Well, let’s be honest if you have had to troubleshoot a CoPP or CPPr policy you know it is fun process. Especially since checking the control-plane is not usually the first thing everyone looks at and half the time issuing a ‘sh run’ is just not an option at first.

The first thing we should probably clear up is, why should we protect the control-plane what is this going to do for us. Well let’s consider a few things the control-plane does:

Handles packets that are not CEF switched, meaning the CPU has to take some time handle these packets.
1. This is more important than you think, if the CPU is getting bombarded with a large number of packets the CPU must handle each one individually & it is possible the CPU will get too busy it start dropping other traffic.
Maintains keep-alives for routing adjacencies.
Handle traffic directed at the device itself
1. SNMP/SSH, management traffic.

The control plane does a bit more then that but the three points above should get the point across.

The next thing I want to mention is how Control-Plane Protection (CPPr) differs from Control-Plane Policing (CoPP). Probably the main difference is the fact with CoPP you control access and limit access to the entire control-plane. This sounds good and simple but the control-plane is slightly more complex then that (go figure right). CPPr on the other hand allows us to control access to the individual control-plane sub-interfaces, providing us with more direct control. Here is a diagram from Cisco.com that lays out the control-plane:

As you can see from the above diagram, applying a control-plane policy (CoPP) applies an aggregate policer to all traffic destined for the CPU. Reaching out of that aggregate you can see three addition sub-interfaces of the control-plane:

Host – The host sub-interface handles traffic destined for the router or one of its own interfaces. IE: Mgmt related traffic and some routing protocol traffic. (EIGRP iBGP)
Transit – This sub-interface handles software switched IP traffic. (I think also ICMP unreachable/redirects but I need hammer away at the ‘transit sub-interface’ a but more in the lab)
CEF-Exception – This sub-interfaces typical handles non-IP related packets such as ARP, LDP, Layer 2 keepalives along with some routing protocol traffic. (OSPF eBGP)

Now, that we have an understanding of what the control-plane does for us, and the differences between CoPP vs CPPr let’s jump into some configurations. Luckily this configuration follows the framework of a typical QoS policy so if you familiar with the structure of the MQC you should be able to follow right along.

First we are going to create a few ACL’s to match our traffic:

Let’s put those ACL’s inside a few Class-Maps: (Only ACL’s, match ip DSCP/Precedence, & match protocol ARP are supported, hence why I did not do match protocol OSPF/BGP, if you do the command will get rejected upon trying to apply the service-policy to the control-plane)

Now, we reference the Class-Maps within a Policy-Map and define our actions: (I’d like to make note, with a CPPr Policy-Map you can only use the ‘police’ or ‘drop’ actions)

Finally we apply a service-policy referencing the Policy-Map we just created.

Now, that applies a CPPr policy to two different control-plane interfaces, if you simply want to perform CoPP you could do the following:

Apply the Policy-Map to aggregate control-plane

Notice the console message, that CoPP has been enabled on the aggregate path. The CoPP policy shown in the above two pictures just about accomplishes the same thing as our CPPr Policy (With a few exceptions I want you to point out).

A few links to CoPP and CPPr from Cisco.com.

Keeperhunter436