How we manage skills @ Beamery - Part II

The disaggregation of skill types…

Henri Egle Sorotos
Beamery Hacking Talent


Photo by Tim Mossholder on Unsplash

Modelling the learned abilities and attributes of a person is a fundamental part of solving key talent business problems facing enterprise. It’s something I get to do as part of my job in Edge — an in-house team of applied data scientists, software engineers, knowledge engineers and product people here at Beamery. As a team, we treat skills as first class citizens.

It’s because of this that the provenance of skills is considered more important than people, companies or experiences in the Beamery Talent Graph. If you haven’t read the first part of this blog series discussing the representation of skills in a knowledge graph already, do so before reading on! There I discuss a) why modelling skills is so important and b) what we can learn from other open and closed source skills ontologies that have come before. It can be found here.

Recall that in the first part of this blog series, we asserted that Beamery should model skills in a manner that ensures:

  1. multiple distinct linked skills concepts
  2. a high volume of accurate instances populating these concepts

Makes sense, right? Well, obviously this is a little easier said than done. Here I’m going to take you on a journey to show you how we’ve made this possible.

Skills as a single concept

Imagine a hiring manager called Alex. They have a new role in their team they need to fill — lucky them! But before advertising the position, Alex would draw up a job description with a set of required, and perhaps some nice-to-have, criteria for the successful candidate. For example, take this excerpt of a current advert for a Senior Software Engineer at Beamery:

  • Strong experience building and operating backend systems in complex environments.
  • Expertise with Javascript/Typescript and Node.js and exposure to MongoDB.
  • Ensuring quality and reliability through unit, integration and end-to-end tests.
  • A basic understanding of caching strategies for backend services and message queues.
  • Exposure to DevOps, on-call, incident management and CI/CD pipelines.
  • Experience debugging and troubleshooting issues using instrumentation and logs.
  • Demonstrating ownership and initiative with experience driving team projects from conceptualization to successfully delivering business value.
  • An impact-focussed mindset aligned with agile principles.
  • A sense of empathy for users, teammates, colleagues and partners.
  • A genuine desire to help the team work better together with a healthy dose of fun.
  • Knowledge of business Spanish a plus

Whilst this is all written in long-form natural language, it’s possible for us to manually annotate this text to identify instances of skills. We will demonstrate this with just six of the bullet points from the advert:

Bear in mind that I’ve just eyeballed this data to annotate skills — this is by no means a scientific or reproducible approach. But it does serve to demonstrate a point: it’s possible to annotate natural language from the recruitment process with relevant skills. This would be particularly powerful if it could be done using a machine trained to do such a task. More on that later.
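To make the idea of machine annotation a little more concrete, here is a minimal sketch that tags known skill labels in a line of the advert. It is a simple dictionary lookup, not a trained model, and it is not how annotation is done at Beamery. The `SKILL_TERMS` list and the `annotate_skills` function are hypothetical names made up for this illustration.

```python
import re

# Hypothetical, hand-picked skill labels (an assumption for this sketch).
# A real system would use a far larger vocabulary or a trained model.
SKILL_TERMS = ["Javascript", "Typescript", "Node.js", "MongoDB", "DevOps", "CI/CD", "Spanish"]


def annotate_skills(text, terms=SKILL_TERMS):
    """Return (term, start, end) for each case-insensitive match of a known term."""
    matches = []
    for term in terms:
        # The lookarounds stop a term matching inside a longer word, while
        # still allowing labels that contain punctuation such as "Node.js".
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((term, m.start(), m.end()))
    # Order the annotations by where they appear in the text.
    return sorted(matches, key=lambda match: match[1])


advert_line = "Expertise with Javascript/Typescript and Node.js and exposure to MongoDB."
found = [term for term, _, _ in annotate_skills(advert_line)]
print(found)
```

A lookup like this only finds labels it already knows about, which is exactly why a trained model is the more interesting long-term option.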

The same exercise could be applied to other corpora of text in the talent lifecycle: CVs, candidate profiles, employee data in an HRIS.

Elliot and Tash (L+R respectively). They are two of the lovely in-house recruiters I get to work with at Beamery. They are my work fwends ❤

Now, let us return to Alex the imaginary hiring manager. They are still trying to fill the position of Senior Software Engineer at Beamery. Imagine they are reviewing a series of CVs or candidate profiles that have been sourced by the recruitment team. What the hiring manager and recruiter are doing here is looking at the profiles to determine whether the candidates have the instances of skills we identified in the original job posting above. They are rationalising each profile into these instances of distinct skills.

The process of inferring skills from HR artefacts is beneficial for a number of reasons:

  • It’s a common language — as we discussed in the first blog in this series, having agreed language for skills is crucial. Canonical skill terms mean we can compare and contrast job descriptions, people, companies and many other HR artefacts from disparate sources.
  • Summarisation is useful — quick and simple skills labels can be used to simplify an HR professional’s life when reviewing candidates or specifying job descriptions.
  • It provides a language that can be read by machines — common skills terms are another feature that can be used to generate AI models.
  • Benefits of semantic web and knowledge graphs can be realised — modelling skills as an RDF standard ontology means we can reap the benefits of linked data.

Right now in our example, we are modelling skills as a single concept. If we think about a candidate and associated skills, the relationship could be visualised something like this in a pseudo-ontology:

Note this is abridged — I haven’t included all skills from the examples above.
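For readers who prefer code to diagrams, the single-concept model can be sketched as a handful of subject, predicate, object triples. This is only an illustration: the `ex:` identifiers, the `ex:hasSkill` predicate and the `objects` helper are hypothetical and are not taken from the Beamery Talent Graph.

```python
# Each fact is a (subject, predicate, object) triple, in the spirit of RDF.
# All "ex:" identifiers are hypothetical and exist only for this sketch.
triples = [
    ("ex:candidate_1", "rdf:type", "ex:Person"),
    ("ex:candidate_1", "ex:hasSkill", "ex:MongoDB"),
    ("ex:candidate_1", "ex:hasSkill", "ex:Javascript"),
    ("ex:candidate_1", "ex:hasSkill", "ex:Spanish"),
    ("ex:candidate_1", "ex:hasSkill", "ex:Teamwork"),
    # With a single concept, every skill is typed in exactly the same way.
    ("ex:MongoDB", "rdf:type", "ex:Skill"),
    ("ex:Javascript", "rdf:type", "ex:Skill"),
    ("ex:Spanish", "rdf:type", "ex:Skill"),
    ("ex:Teamwork", "rdf:type", "ex:Skill"),
]


def objects(subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


skills = objects("ex:candidate_1", "ex:hasSkill")
skill_types = {t for skill in skills for t in objects(skill, "rdf:type")}
print(skills)
print(skill_types)
```

Asking for the type of any skill here always gives the same answer, so the graph cannot tell a database product apart from a spoken language.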

The same could be done for the job description. Again, a heavily abridged version that doesn’t include all skills mentioned above:

But are all of these skills conceptually equivalent? Are all skills born equal? I don’t think so. Let’s take some of the skills from the above examples and try to deconstruct each of them:

  • MongoDB — proper noun. This is the label for a specific software product. MongoDB is a document store database provided by MongoDB Inc. We assume that by including this skill in a job advert or other entity, what we really mean is ‘the ability to operate/interact with MongoDB’.
  • Javascript — noun (once a proper noun). A label for the open source object-oriented programming language. It is one of many programming languages available. We assume that by including this skill in a job advert or other entity, what we really mean is ‘the ability to use Javascript’.
  • Spanish — noun. A natural language that is very widely spoken across the world. We assume that by including this skill in a job advert or other entity, what we really mean is ‘the ability to speak/read/write/understand Spanish’.
  • Teamwork — abstract noun. A broad term which is ambiguous. Can be inferred differently in different contexts. We assume that by including this skill in a job advert or other entity, what we really mean is ‘possessing teamwork as an attribute or behaviour’.

Just taking these four examples, it’s clear that these are quite different types of skills. We have a product, a software tool, a natural language and an abstract instance of collective human behaviour. Representing all of these simply as a type of skill is correct, but it’s not particularly precise. It’s like describing Manchester United as a team of people when, more specifically, they are a football team, or like treating all companies and education institutions as the same type of entity (forgive the football analogy).

In detail, it’s sub-optimal for these reasons:

  • It’s an aggregation — A single concept, skill, is an aggregated proxy concept for multiple other sub-classes of the concept ‘skill’.
  • End users lack nuance and richness — a user viewing skill labels cannot differentiate between different types of skill.
  • Machines lack nuance and richness — AI models built using these skills terms are not benefitting from richer features that would be available if skills were conceptually disaggregated.

In our paradigm of people, skills, experiences, companies and educations in the Beamery Talent Graph, it’s possible for us to assign these instances of skills to more granular concepts which have clear definitions and defined relationships.

Skills as multiple concepts

As we discussed in part 1 of this blog series on skills, defining different types of skills is a right old fiddle! Few people agree on a common shared language, and everyone has an opinion. What’s the difference between a soft skill and a transversal attribute, ehhh? But, as discussed above, it’s important that we model skills as multiple distinct concepts. We stipulated that we require:

  • skills modelled as multiple related concepts, with clear definitions and relationships
  • the ability to add, remove and modify instances of these skills as we obtain more data and knowledge.
  • ideally, a model that isn’t proprietary or closed source
  • ideally a model that is already defined in semantic web standards, with linked concepts to other ontologies
  • simplicity is key. Anything we create must be understood by HR practitioners.

They say one of the best ways to design a new ontology is not to design a new ontology at all, but to see if anyone else has already done something you can adapt or borrow. The semantic web community, much like the wider software community, are a very caring and sharing bunch, so we were in luck.

First off we considered O*NET, ESCO and the Canadian Skills and Competencies Taxonomy. Whilst all are very interesting, there were various reasons we didn’t use any of these models as a starting point. O*NET and the Canadian Skills and Competencies Taxonomy are rather complex and have many different concepts for types of attribute and skill. This was always going to make sharing this work with HR practitioners difficult. See the diagram from the Canadian model:

Next, it became clear that tech and software skills were becoming so important in the Beamery Talent Graph that it was becoming a requirement for these types of skills to have their own multiple concepts. Again, this wasn’t something that these models could fulfil without modification.

But then, enter the Skills and Recruitment Ontology (SARO). This is a new model for skills created after the European Data Science Academy, part of the Horizon 2020 funding programme. It is brilliantly simple, links into other well-known ontologies, and has precise definitions for each of the concepts available. The full ontology, shown in the context of other models, can be seen visually:

Note that there are links in the following open source ontologies:

The academics who authored the ontology provide the following definitions for the skill concepts:

  1. saro:JobSpecificSkill: representing technical skills related to a particular job role, further sub-classed into:
     • Product: competence using a particular product (e.g., “Hadoop”).
     • Topic: capability in a domain- and/or role-specific topic required to achieve an observable result (e.g., “Data Analytics”).
     • Tool: competence in the use of a tool specifically for carrying out technical tasks, e.g., a specific programming language (e.g., “Java”, “Python”).
  2. saro:TransversalSkill: sector and occupation-independent skills foundational to personal development, often referred to as soft skills (e.g., “team-working”).

Now we don’t need to utilise the entire SARO ontology. That would be unnecessary at this stage, and evaluating the entire model was not in scope for this project. Initially, we just need to use the entities ‘saro:Skill’, ‘saro:TransversalSkill’, ‘saro:JobSpecificSkill’, ‘saro:Product’, ‘saro:Topic’ and ‘saro:Tool’. We can visualise the ontology again, and isolate just the concepts we are interested in:

Notice how the transversal and job specific skills are sub-concepts of ‘saro:Skill’. Omitted from the diagram is ‘saro:Language’ — this concept is a sub-concept of ‘saro:TransversalSkill’.
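The hierarchy described here is small enough to write down directly. Below is a rough sketch of it as a child-to-parent mapping, with a helper that walks up to the root. The mapping follows the relationships stated in this post; the names `SUBCLASS_OF`, `ancestors` and `is_a` are my own illustrative choices and are not part of SARO itself.

```python
# Child -> parent links for the SARO concepts discussed in this post.
SUBCLASS_OF = {
    "saro:JobSpecificSkill": "saro:Skill",
    "saro:TransversalSkill": "saro:Skill",
    "saro:Product": "saro:JobSpecificSkill",
    "saro:Topic": "saro:JobSpecificSkill",
    "saro:Tool": "saro:JobSpecificSkill",
    "saro:Language": "saro:TransversalSkill",
}


def ancestors(concept):
    """Return the parents of a concept, nearest first, up to the root."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_a(concept, target):
    """True if the concept is the target or one of its sub-concepts."""
    return concept == target or target in ancestors(concept)


print(ancestors("saro:Tool"))
print(is_a("saro:Language", "saro:Skill"))
```

Because every concept eventually leads back to ‘saro:Skill’, anything that worked with the single-concept model still works, and the finer-grained types are there when you want them.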

In my opinion, one of the biggest issues with ontology design is that we spend too long theorising about ontologies, and not enough time populating them with instances! So let’s put this into practice. Recall the example we discussed earlier with Alex, the hiring manager trying to fill a Senior Software Engineer role at Beamery. We can identify skills in the job advert again, but this time we will categorise them using the definitions from SARO:

We could also visualise these instances using nodes and edges, in the same way the SARO ontology is depicted above.
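As a rough illustration of the nodes-and-edges view, the sketch below types the four skills we deconstructed earlier (a product, a software tool, a natural language and a behaviour) and links them to the advert. The categorisation is my own reading of those four examples rather than a full annotation of the advert, and the `ex:` identifiers and the `ex:mentionsSkill` predicate are hypothetical.

```python
from collections import defaultdict

# Edges of a tiny graph: (source node, edge label, target node).
# The "ex:" names are hypothetical and exist only for this sketch.
edges = [
    ("ex:MongoDB", "rdf:type", "saro:Product"),
    ("ex:Javascript", "rdf:type", "saro:Tool"),
    ("ex:Spanish", "rdf:type", "saro:Language"),
    ("ex:Teamwork", "rdf:type", "saro:TransversalSkill"),
    ("ex:senior_software_engineer_advert", "ex:mentionsSkill", "ex:MongoDB"),
    ("ex:senior_software_engineer_advert", "ex:mentionsSkill", "ex:Javascript"),
    ("ex:senior_software_engineer_advert", "ex:mentionsSkill", "ex:Spanish"),
    ("ex:senior_software_engineer_advert", "ex:mentionsSkill", "ex:Teamwork"),
]

# The node set is every distinct source or target that appears in an edge.
nodes = sorted({node for source, _, target in edges for node in (source, target)})


def skills_by_concept(graph_edges):
    """Group skill instances under the concept they are typed with."""
    grouped = defaultdict(list)
    for source, label, target in graph_edges:
        if label == "rdf:type":
            grouped[target].append(source)
    return dict(grouped)


grouped = skills_by_concept(edges)
for concept, instances in grouped.items():
    print(concept, instances)
```

Compared with the single-concept model, each skill now points at a more specific concept, which is the extra richness both end users and machines were missing.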

The Beamery Edge Approach

No ontology is perfect — semantic web ontologies are all slight compromises of reality. I think SARO is brilliant, and I’m extremely grateful to the authors. However, I would be lying if I said SARO was perfect for our use case. It isn’t quite suitable as it stands, which is why we ended up iterating a little on the original design. To test SARO, we stress tested its definitions against our existing ontology of skills. We found that:

  • Skills representing the ability to use and operate physical machinery, manufacturing apparatus and construction equipment were not adequately represented conceptually. Beamery works with multiple enterprise customers where this is a hard requirement.
  • Some of the definitions provided are not precise enough for us to be confident that instances are categorised into the correct concepts. In particular, there is a lack of conceptual distinction between tools and products.

After extensive iteration, internal discussions, and lots of trial and error, we have agreed to proceed with a slightly revised version of SARO. Visualised as a pseudo-ontology again, it looks like this:

Full definitions, and an accompanying .ttl model are available for internal use. One day we hope to open source this.

Remaining agile to future and past changes is crucial to making this a success. Skills come and go, and new jobs appear as human innovation continues. As part of this cycle, on a regular cadence new unnormalised skills are identified and ingested into the Beamery Talent Graph. Whilst this happens, we check a couple of things:

  1. Is there an existing canonical skill this unnormalised skill can be linked to? If not, do we need to make a new canonical skill?
  2. Over time, is the distribution of different skill concepts constant? If this distribution starts changing, do we need to reassess the skills concepts we currently use?

This is all part of maintaining a current and representative ontology.
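The two checks above could be sketched roughly as follows. This is an assumption-heavy illustration rather than our production pipeline: the `CANONICAL` lookup table is made up, and total variation distance is simply one reasonable way to measure how far a distribution has moved, chosen here for its simplicity.

```python
from collections import Counter

# Hypothetical lookup from cleaned-up labels to canonical skills.
CANONICAL = {
    "javascript": "ex:Javascript",
    "js": "ex:Javascript",
    "mongodb": "ex:MongoDB",
}


def link_to_canonical(raw_label):
    """Check 1: return the canonical skill for a raw label, or None if new."""
    key = " ".join(raw_label.lower().split())  # lower-case, collapse whitespace
    return CANONICAL.get(key)


def concept_shares(concepts):
    """Turn a list of concept labels into a {concept: share} distribution."""
    counts = Counter(concepts)
    total = sum(counts.values())
    return {concept: count / total for concept, count in counts.items()}


def total_variation(p, q):
    """Check 2: distance between two distributions (0 = same, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up concept labels for skills ingested in two different periods.
previous = concept_shares(["Tool", "Tool", "Product", "TransversalSkill"])
current = concept_shares(["Tool", "Product", "Product", "TransversalSkill"])
drift = total_variation(previous, current)

print(link_to_canonical("  JavaScript "))
print(link_to_canonical("Rust"))
print(drift)
```

A `None` from the first check is the prompt to consider creating a new canonical skill, and a drift value that keeps growing between ingestion runs is the prompt to revisit the concepts themselves. Any threshold for what counts as too much drift would need to be tuned on real data.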

But hold up! What about measuring how proficient someone is at a particular skill or attribute? How can this be measured? How can we use the magnitude of a skill attribute to inform our understanding? This is also something that we are modelling in the Beamery Talent Graph. Watch out for a future blog on this. I hope to continue this series on skills and semantic web.

