Introducing the New Data Warehouse Security Model | Data at WashU

Analytics is the primary mission of our new data warehouse. To develop an analytics-focused data warehouse, we needed a security model that would not create security-based barriers between data domains for those engaged in analytics, while at the same time removing access to sensitive data for those that do not need to see it.

In the previous warehouse, which was optimized for operational reporting, warehouse data security was “tightly coupled” to the source system. This meant that for a report author to use human resources data, she would need to be given access to the HR application first, and then the security from the HR application would flow into the Human Resource datamart, allowing her access to that data in the warehouse. However, this model can prove limiting in an analytics-focused warehouse.

Another reason we needed a different security approach was that previously data security was accomplished using a hybrid approach, with Cognos Framework Models playing a key role alongside some security data stored in tables in the warehouse. While this did the job, it prevented us from opening up new, primarily self-serve reporting & analytics tools like Tableau or PowerBI (as well as from allowing users to query the data warehouse directly), since the primary security constructs were built into Cognos only.

Thus, we needed to try a different method of securing the new warehouse, and the approaches chosen were Role Based Access Control and Row Level Security to be implemented within the data warehouse itself.

What is Role Based Access Control (RBAC) and Row Level Security (RLS)?

RBAC is primarily an interplay between “Roles” and their “Entitlements” or specific fields in the warehouse. The methodology calls for the creation of Roles in the warehouse, and assigns read permissions on specific fields to those roles. Users are then assigned to these roles. Sensitive fields get their own RBAC Roles, and special permissions are required to become members of these “regulatory” roles.

RLS on the other hand, restricts users to see only a certain “slice” of a sensitive dataset. One way to imagine this is as an automatic filter or a “where clause” (for those familiar with SQL) that gets applied to any reads from a report (or data set) with sensitive data. This is again administered using specific RLS Roles and some of the internal Azure structures & processes that work together to show the appropriate data to the user.

Office of Data Governance plays a key role in deciding what data is considered sensitive, and who can see all sensitive data, or only some of it – for example, currently we have enabled RLS on Compensation data since it is deemed sensitive, and the restriction logic for compensation from Governance was that a reporting user, when allowed to see compensation, can only see the data of her own school.

Benefits of RBAC/RLS Approach:

“Loose coupling”: A loose coupling with source system security allows us to be expansive and creative – guided by source security, but not controlled or limited by it.
By bringing the security closer to the data layer, we remove the dependency on the Cognos (or other) applications to implement security constructs.
Using this approach, a user connected to any application accessing a particular dataset from the warehouse gets the same base dataset across all applications, including straight SQL queries.

In summary, we consciously decoupled warehouse data security from the source systems, and to ease the transition of the University into this new chapter of Analytics, introduced a flexible, modern and efficient data security plan that addresses the concerns of the University leadership without limiting broad access to the data.

See overview information about security in the Data Warehouse

Back to Data Blog