Interpretability, Accuracy and Transparency: Using 'Grey-box' Models to Make Artificial Intelligence Intelligible


By Gül Kamışlı

6th July 2021


As Artificial Intelligence (AI) based models and technologies have become increasingly prevalent in everyday aspects of our lives, through facial recognition systems or chatbots for example, understanding and trusting the technology has become a challenge. The ‘how’ and the ‘why’ of an AI system arriving at a specific decision - in other words, the ‘explainability’ of that decision - is now a priority for regulators in both the European Union (EU) and the United States (US), and has given rise in recent years to the concept of explainable AI, or ‘XAI’, as it is often termed.

AI models can be distinguished by the level of transparency in algorithmic decision-making that they facilitate. ‘White-box’ models, for example, are machine learning (ML) algorithms whose decisions both domain experts and users can understand with relative ease. With ‘black-box’ models, by contrast, even experts are usually unable to interpret and explain the decision-making process in a meaningful way. Because black-box models outperform humans at many tasks, however, they continue to be deployed across industries and sectors such as healthcare and transportation. Going forward, the ability to ascertain the underlying reasons for a decision will become increasingly important, as is plain to see in the requirements of the General Data Protection Regulation (GDPR) and its ‘right to explanation’, set out in Recital 71(4) and Articles 13, 14, 15 and 22. The construction of AI models that produce transparent results is therefore critical to the future success of AI.

XAI comprises a set of features and processes which aim to convey how black-box models make their decisions. It is considered by many in the machine learning community to be a highly effective mechanism for striking a healthy balance between model accuracy, bias mitigation and transparency. Such mechanisms are crucial to increasing trust and confidence in, and uptake of, these models. For example, explainable disease-diagnostic tools used to identify disease patterns (e.g. an anomaly detection algorithm) can help physicians expedite their response to disease progression. In general, XAI methodologies should strive to ensure AI algorithms can answer essential questions such as, ‘why did the AI model end up with a specific outcome?’, ‘why did it not return another outcome?’ and ‘how much should you trust the AI model?’. The US Defense Advanced Research Projects Agency (DARPA) characterises XAI as AI models with discernible and trustable algorithms that retain a high degree of prediction accuracy. For DARPA, motivated by the need to understand how semi-autonomous systems work, it is critical to evaluate explanation features through a human-in-the-loop strategy, in order to gather user feedback and to measure task performance and perceived trustworthiness.

Similarly, at CaliberAI our ongoing goal is to produce trustworthy, explainable technology that can augment human decision-making. AI systems thrive on capacity and complexity, but the continued adoption of AI models and tools depends upon reliability and upon making that complexity understandable. CaliberAI’s approach has been to create AI that is transparent, carefully inspected for bias and designed to enhance fairness. We do this through the use of ‘grey-box’ models, which combine ‘black-box’ levels of accuracy with ‘white-box’ levels of transparency.

Attention Weights: a Path to Explainability


AI currently achieves accuracy at or beyond human level in a range of predictive tasks across multiple domains. Take neural networks (NNs) in the field of deep learning, which sit at the heart of state-of-the-art ML. NNs are composed of mathematical and statistical functions and are loosely inspired by networks of biological neurons. During training, the network’s predictions are repeatedly compared with the expected outputs, and its internal weights are adjusted to reduce the error through a process called backpropagation. In this way, NNs ‘learn’, for lack of a better word, through trial and error. Their strength and robustness come from thousands, if not millions, of interconnected neurons. This sheer number of neurons, combined with the vast amounts of data needed to develop and train such models, produces highly effective but conceptually complex algorithms. An effective way to trace the decision-making process of such algorithms is the use of so-called ‘attention mechanisms’, an illustration of which can be seen in the example sentence quoted below.
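Before turning to attention in detail, the trial-and-error loop just described can be made concrete with a deliberately tiny feed-forward network trained by backpropagation. This is a minimal sketch, not a model CaliberAI uses; the toy inputs, labels and layer sizes are hypothetical, chosen only to illustrate the forward pass, the error measurement and the weight updates.

```python
import numpy as np

# Toy data: 4 samples with 3 features each, and binary labels (hypothetical, for illustration only).
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # weights: input layer -> hidden layer
W2 = rng.normal(size=(4, 1))   # weights: hidden layer -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10_000):
    # Forward pass: compute the network's prediction.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Measure the error between prediction and target.
    error = y - output

    # Backward pass (backpropagation): push the error back through the
    # network and nudge every weight in the direction that reduces it.
    d_output = error * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    W2 += hidden.T @ d_output
    W1 += X.T @ d_hidden

print(np.round(output, 3))  # predictions approach the targets as the error shrinks
```

Real networks differ mainly in scale: many more layers, many more weights and far more data, which is precisely what makes their decisions hard to inspect.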

“Once the politician threw his running mate under the bus when allegations of corruption surfaced, it became clear that his only goal is to look after number one.”

Originally designed with machine translation tasks in mind, and as implied by the name, attention mechanisms direct an AI system’s focus, weighting specific segments of the input data more heavily than others and showing the user which segments mattered most. Recurrent NNs (RNNs), a form of NN with which attention mechanisms are often employed, process information sequentially, so the output at each step depends on the previous computations. The architecture therefore needs an internal memory to retain important information from the beginning of the sequence. For a natural language classification task, consider the sentence, “he is clearly a very damaged young man; hopefully he can get the help he needs before it is too late.” To predict whether the sentence is defamatory or not, the information that this person is a ‘damaged young man’ must be retained until the final classification is made. As the gap between the significant phrase and the end of the sequence grows (often termed the ‘long-term dependency problem’), RNN models, even ones designed specifically to handle this issue, benefit from an additional mechanism that indicates the importance of each phrase in numerical terms. In short, attention mechanisms help us locate the relevant information within sentences.
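The sketch below shows one way such a mechanism can be built: a small GRU-based classifier with an additive attention layer that assigns a numerical weight to every token. It is a minimal illustration in PyTorch, not CaliberAI’s model; the vocabulary size, layer dimensions and the binary ‘flag / no flag’ output are assumptions made purely for the example.

```python
import torch
import torch.nn as nn

class AttentionClassifier(nn.Module):
    """GRU text classifier with a simple additive attention layer.

    The attention layer scores every token's hidden state, so the most
    relevant phrases contribute most to the final decision and the weights
    themselves can be inspected for interpretability.
    """

    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)   # one score per time step
        self.out = nn.Linear(hidden_dim, 1)    # binary decision, e.g. flag / no flag

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))         # (batch, seq_len, hidden)
        scores = self.attn(states).squeeze(-1)              # (batch, seq_len)
        weights = torch.softmax(scores, dim=-1)             # importance of each token
        context = torch.bmm(weights.unsqueeze(1), states)   # weighted sum of states
        logits = self.out(context.squeeze(1))               # (batch, 1)
        return torch.sigmoid(logits), weights               # prediction + explanation

# Usage: the returned weights show which tokens the model attended to.
model = AttentionClassifier()
tokens = torch.randint(0, 10_000, (1, 12))   # a single 12-token sentence (dummy ids)
prob, attn_weights = model(tokens)
print(prob.item(), attn_weights)
```

Because the forward pass returns the attention weights alongside the prediction, the same numbers that drive the classification can be surfaced to a human reviewer as an explanation of the decision.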

Attention mechanisms are yet another layer within NN architectures, not a separate network, and are responsible for identifying interdependence between inputs and outputs as well as providing a route to the explainability and interpretability of models. The mechanism is well established and has been crucial to many advances in Natural Language Processing, including popular tools such as GPT-3 by OpenAI and BERT by Google.
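Those attention weights can be inspected directly in off-the-shelf models. The sketch below assumes the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; it extracts the attention tensors for the example sentence quoted earlier and lists the tokens that receive the most attention in the final layer. Summing attention received, averaged over heads, is just one simple heuristic among many, and this is an illustration of the general technique rather than a description of CaliberAI’s pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a pretrained BERT and ask it to return its attention weights.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = ("Once the politician threw his running mate under the bus when "
            "allegations of corruption surfaced, it became clear that his "
            "only goal is to look after number one.")
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, heads, seq_len, seq_len).
last_layer = outputs.attentions[-1]

# Average over heads, then sum how much attention each token receives
# from every other position in the sentence.
token_importance = last_layer.mean(dim=1)[0].sum(dim=0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in sorted(zip(tokens, token_importance.tolist()),
                            key=lambda pair: pair[1], reverse=True)[:10]:
    print(f"{token:15s} {weight:.3f}")
```

The same tensors underpin the attention visualisations commonly used to inspect transformer models.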

At CaliberAI, we take interpretability extremely seriously, providing an explanation of, and the reasoning behind, each classification decision made by our models. Not only does this give us an opportunity to create an AI tool that is as close to an accountable white-box model as is currently possible, it also helps us to strike a strong balance between the alleviation of bias, the enhancement of fairness and the maximisation of accuracy in our AI models.


Contact us

Get a closer look at how our solutions work and learn more about how CaliberAI's technology can integrate with your technology stack and editorial workflow.

Get in touch with sales@caliberai.net