Talent data lives in a lot of places and a lot of formats. Typical medium-to-large companies have at least three of the following: HRMS, ATS, CRM, LMS, testing, video interviewing, compensation, data warehouse, and so on. That's already a lot, but they also have spreadsheets, resumes, online communities and much more. On top of that, there's copious talent data available on the internet. All of these sources contain useful data you should be using.
Requiring unique identifiers can be a showstopper. Most data connection methods need a unique identifier on every record, and often there isn't one. There are identifiers, but they aren't unique. We used to think that not having a matching field meant we couldn't connect the data. Not any more.
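To make the idea of connecting records without a shared key concrete, here is a minimal Python sketch of rule-based fuzzy matching. The record fields, the weights and the 0.8 threshold are hypothetical, chosen only for illustration. Plain string similarity from the standard library's `difflib` stands in for the machine-learning methods a production system would more likely use.

```python
from difflib import SequenceMatcher

# Hypothetical records from two systems that share no unique identifier.
ats_records = [
    {"source_id": "A1", "name": "Jonathan Smith", "email": "jon.smith@example.com", "city": "Austin"},
    {"source_id": "A2", "name": "Maria Garcia", "email": "mgarcia@example.org", "city": "Denver"},
]
hrms_records = [
    {"source_id": "H7", "name": "Smith, Jonathan", "email": "JON.SMITH@example.com", "city": "Austin"},
    {"source_id": "H9", "name": "Mario Garcia", "email": "mario.g@example.net", "city": "Boston"},
]


def normalize_name(name: str) -> str:
    """Lowercase, collapse whitespace, and reorder 'Last, First' into 'first last'."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def match_score(a: dict, b: dict) -> float:
    """Combine several weak, non-unique signals into a 0..1 score (illustrative weights)."""
    name_sim = SequenceMatcher(None, normalize_name(a["name"]), normalize_name(b["name"])).ratio()
    email_match = 1.0 if a["email"].strip().lower() == b["email"].strip().lower() else 0.0
    city_match = 1.0 if a["city"].strip().lower() == b["city"].strip().lower() else 0.0
    return 0.5 * name_sim + 0.35 * email_match + 0.15 * city_match


def link_records(left, right, threshold=0.8):
    """For each left record, keep its best-scoring right record if the score clears the threshold."""
    links = []
    for a in left:
        best = max(right, key=lambda b: match_score(a, b))
        score = match_score(a, best)
        if score >= threshold:
            links.append((a["source_id"], best["source_id"], round(score, 3)))
    return links


for left_id, right_id, score in link_records(ats_records, hrms_records):
    print(left_id, "->", right_id, score)
```

Here "Jonathan Smith" and "Smith, Jonathan" are linked because several signals agree. "Maria Garcia" and "Mario Garcia" have very similar names but nothing else in common, so they stay unlinked. Real record-linkage tools typically add blocking, to avoid comparing every pair, and learn weights from data rather than hard-coding them.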
Data gets stale. Older records tend to be written off, even in the ATS, and then get left out of analysis. That's painful if you're trying to do any sort of longitudinal work.
Data gets left behind. Staleness isn't the only reason we lose historical data. In many projects we also chose not to migrate data because it was too hard. So it sits in a file on a server, where nobody can use it. Can we stop throwing away perfectly good data?
We can't easily access everything we need. Data problems aren't just an analytics issue! Practitioners need all the available data to make minor and major decisions every day. Right now they spend far too much time searching many places for that data: minutes per candidate, hours per talent question.
Unstructured data needs to be in play. Sometimes talent data is structured, like data from a relational database. Sometimes it's unstructured, like a blog post or a paragraph in a resume. Using unstructured data is much harder, so all too often we leave it out.
One size does not fit all. Many "HR warehouse" implementations enforce a universal dataset. That means one set of common fields for everyone, and that set is often limited. In reality, many of your sites, departments or countries have specialized data needs you should be trying to meet.
New data needs and opportunities emerge often. Whether it's a new system, emerging skills, an external data set, or a spreadsheet of useful values, new sources of data keep appearing. We need to be able to include them in our talent dataset as soon as possible.
None of this needs to be true any more. Technology, storage, processing power and algorithms have come a long, long way. Artificial Intelligence allows automated methods to overcome all these data problems, and more. Methods include supervised and unsupervised Machine Learning, correlation, disambiguation, topic modeling, latent semantic analysis (LSA), Word2Vec, customer vector models, deep neural networks and semantic analysis. This technology creates and manages a clean, usable dataset with ALL the talent data, in what is increasingly thought of as a data lake. No more silos!