On Monday, the cybersecurity firms ReversingLabs and Sophos joined forces to release the first production-scale malware research dataset to be made available to the public. The companies released the dataset to drive industry-wide improvements in security detection and to help build defences against attacks.
The dataset is called SoReL-20M, short for Sophos-ReversingLabs – 20 Million. It contains labels, metadata, and features for 20 million Windows Portable Executable (PE) files, including 10 million disarmed malware samples. Sophos and ReversingLabs hope that making this data public will lead to machine-learning approaches that detect malware far better than is possible today.