Data scientists and information security practitioners have long operated in their own independent spheres of influence. When you look at their responsibilities, however, you begin to see they are more alike than different. A data scientist is someone who analyzes and interprets complex data to assist a business in its decision making, which is similar to an InfoSec practitioner who uses data to detect threats to an organization. Both take business data generated within the organization – whether blog hits or firewall logs – then analyze that data to generate an outcome that has a measurable business value. Though the inputs, analysis, and outcomes may differ, the steps are the same: Look at data, analyze that data, then derive business value from that data.
If we start treating InfoSec practitioners like the data scientists they are, the entire organization benefits. Here are the top five things InfoSec practitioners need to do their job as data scientists:
- Adopt the latest tools to remain nimble and flexible.
InfoSec practitioners need speed and flexibility because the enterprise is on the line. Customer data, trade secrets, financial data, and more are at risk if threats are not quickly analyzed and addressed. Data scientists have always had access to the latest and greatest tools. InfoSec practitioners need the same cutting-edge tools to stop imminent threats or quickly test new controls against historical data, and they would benefit greatly from access to AI/ML, Hadoop, NoSQL, Kafka, Kubernetes, and others. Fraudsters get to use whatever tools they want. InfoSec practitioners are at an inherent disadvantage if they are not armed with the most sophisticated tools when protecting the organization.
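To make "test new controls against historical data" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `FirewallEvent` schema, the `repeated_denies` rule, and the `backtest` helper are hypothetical names, not part of any particular product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallEvent:
    """One historical log record (hypothetical, simplified schema)."""
    src_ip: str
    dst_port: int
    action: str  # "allow" or "deny"


def repeated_denies(events, threshold=3):
    """Candidate control: flag source IPs with at least `threshold` denied connections."""
    denies = Counter(e.src_ip for e in events if e.action == "deny")
    return {ip for ip, count in denies.items() if count >= threshold}


def backtest(rule, historical_events, known_bad_ips):
    """Replay a rule over past events and compare its hits with known incidents."""
    flagged = rule(historical_events)
    return {
        "true_positives": len(flagged & known_bad_ips),
        "false_positives": len(flagged - known_bad_ips),
        "missed": len(known_bad_ips - flagged),
    }
```

Calling `backtest(repeated_denies, events, known_bad_ips)` on a few months of stored events gives a rough read on whether a proposed control is worth operationalizing before anyone commits engineering time to it.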
- Reduce time from idea to implementation.
When InfoSec practitioners have a great idea on how to protect the enterprise, time is the enemy. They don’t have four to six weeks to get internal IT approval, then work with the ETL team to get new data, just to go through the research process and prove whether the idea has value. And if the ETL team asks what value the idea has in order to prioritize the data, the InfoSec practitioner is not going to be able to provide an answer because they haven’t yet tested the data – a classic chicken-and-egg scenario. Then, once the idea has been tested, it often takes too long to operationalize and implement.
The span from the initial spark of inspiration to a proven outcome is the time to value, which an organization should look to compress as much as possible. InfoSec practitioners need a data platform that reduces that friction so the best ideas can be brought to fruition to protect the enterprise.
- Secure the system by default.
For InfoSec practitioners to be nimble and effective, they need a platform that is secure by default and includes strong authentication, access controls, auditing, and high availability/redundancy. Perhaps most important is very strong security on the data itself. Successful InfoSec practitioners are those who can access production data that is secured and audited, so that unvetted research data can be examined alongside it. The key is being able to join research data in queries without affecting the production data. Though many think research data and production data shouldn’t live together, for an InfoSec practitioner combining them is vital to securing the enterprise, but only if it’s done securely.
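One way to picture joining research data against production data without touching production is sketched below. It uses SQLite from the Python standard library purely as a stand-in for whatever data platform is actually in use. The table names, columns, and function names are hypothetical, and authentication and auditing are out of scope for the sketch.

```python
import sqlite3
from pathlib import Path


def create_demo_production_db(path):
    """Build a tiny stand-in for a production log store (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE firewall_logs (src_ip TEXT, dst_port INTEGER, action TEXT)")
    conn.executemany(
        "INSERT INTO firewall_logs VALUES (?, ?, ?)",
        [
            ("10.0.0.5", 443, "allow"),
            ("203.0.113.9", 22, "deny"),
            ("203.0.113.9", 3389, "deny"),
        ],
    )
    conn.commit()
    conn.close()


def open_research_session(production_path):
    """Open a scratch research database with production attached read-only."""
    # uri=True lets the ATTACH below use a file: URI with mode=ro.
    conn = sqlite3.connect(":memory:", uri=True)
    read_only_uri = Path(production_path).resolve().as_uri() + "?mode=ro"
    conn.execute("ATTACH DATABASE ? AS prod", (read_only_uri,))
    # Unvetted research data lives only in the scratch database.
    conn.execute("CREATE TABLE suspect_ips (ip TEXT, source TEXT)")
    return conn


def denied_hits_per_suspect(conn):
    """Join research data against production logs without copying either side."""
    query = """
        SELECT s.ip, COUNT(*) AS denied
        FROM suspect_ips AS s
        JOIN prod.firewall_logs AS f ON f.src_ip = s.ip
        WHERE f.action = 'deny'
        GROUP BY s.ip
        ORDER BY s.ip
    """
    return conn.execute(query).fetchall()
```

Because the production database is attached with `mode=ro`, an accidental `INSERT` or `UPDATE` against `prod.firewall_logs` raises an error instead of changing production, while the research table stays freely writable.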
- Adopt modern DevOps practices.
Waiting days or weeks to deploy code because of slow internal processes puts an enterprise at risk; security practitioners must respond to external threats quickly. Organizations must adopt modern DevOps practices that benefit data scientists and InfoSec practitioners alike, including containers and orchestrators like Kubernetes, in combination with easy data access that is audited and controlled. Agile development, where tight-knit teams deploy code as soon as it’s ready, is another example of the rapid and flexible DevOps approach that needs to be made available to InfoSec practitioners so they can respond to threats in real time.
- Expand data retention.
Enterprises are often forced to decide what data to keep and what to discard based solely on cost. Deciding now what data you might need later is a recipe for future frustration. To keep costs manageable, InfoSec practitioners need a platform that allows both cloud and on-premises data retention and distinguishes between hot, cold, and frozen data. This flexibility allows better exploration and research on datasets, often leading to finding a diamond in the rough.
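As a rough illustration of the hot, cold, and frozen distinction, the sketch below assigns a tier to each dataset based on how recently it was accessed. The age-based policy, the 30-day and 365-day cutoffs, and the function names are all assumptions chosen for the example; a real platform may define its tiers quite differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds for illustration; real cutoffs depend on cost and query patterns.
HOT_MAX_AGE = timedelta(days=30)
COLD_MAX_AGE = timedelta(days=365)


def storage_tier(last_accessed, now):
    """Classify a dataset as hot, cold, or frozen by time since last access."""
    age = now - last_accessed
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= COLD_MAX_AGE:
        return "cold"
    return "frozen"


def plan_retention(datasets, now):
    """Map each dataset name to a tier; nothing is deleted, only placed."""
    return {name: storage_tier(accessed, now) for name, accessed in datasets.items()}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    datasets = {
        "firewall_logs": now - timedelta(days=2),
        "proxy_logs": now - timedelta(days=120),
        "legacy_netflow": now - timedelta(days=900),
    }
    print(plan_retention(datasets, now))
```

The point of the design is that the plan never drops a dataset: older data simply moves to cheaper storage, so it is still there when a later investigation needs it.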
Moving Forward
Because threats move quickly, InfoSec practitioners need a robust platform that also moves fast. Though other data scientists may not face the same time pressures, they too can benefit from a data platform that provides all the features mentioned above.
If we embrace the idea that these two jobs are the same, if we start treating InfoSec practitioners like the data scientists they are, then we need only one data platform to meet the demands of both InfoSec practitioners and traditional data scientists – and having one robust data platform benefits the entire organization.
About the Author
John Omernik
As Distinguished Technologist at MapR, John Omernik brings an analytical approach to big data, utilizing modern tools to identify patterns to facilitate security program improvements and reduce risk to organizations. Prior to MapR, John was SVP Security Innovations at Bank of America. Previously, he was the lead for the Counter Threat Unit Data team at Dell SecureWorks and the VP of Big Data Analytics and Fraud Center of Excellence at Zions Bancorporation. John has an MS in Information Assurance from Norwich University and graduated cum laude from the University of Wisconsin-Stevens Point with a BS in Computer Information Systems.