Andela Feature Store

Designing an enterprise-level Machine Learning platform that enables Data Scientists & Machine Learning Engineers to build ML models at scale.

Project Details

  • Primary Practice Area: Product Design
  • Year: 2022
  • My Role: Head of UX & Product Design

About The Project

Andela sought to create a platform that enables Data Scientists & Machine Learning Engineers to build high-quality Machine Learning models. With Andela Feature Store, they can create models quickly and draw on existing big data to deploy reusable, production-level machine learning systems. The goal is to deliver an enterprise-level data platform for machine learning that automates real-time decisions at scale, manages the complete feature lifecycle, and produces high-value models rapidly.

Background

The Feature Store product came to Andela via acquisition. When I arrived at Andela, it had been tabled for quite a while due to usability issues. But the Head of our Data Practice came to me and said she wanted to use it internally, and strongly believed we could monetize it and sell it to our clients as an accelerator.

The Challenge

Rearchitect Andela Feature Store, enabling the application to be scaled and deployed into full production across the enterprise, and sold as a product accelerator to Andela's clients.

The Team

  • 2 Product Designers
  • 1 Product Manager
  • 2 Engineers

Stakeholders

  • Head of Data Practice
  • VP Practices
  • Chief Revenue Officer
  • Chief Product Officer

What I Did

  • Research Strategy, Design Strategy, & Product Vision
  • UX Architecture & User Flow Design
  • Design System Integration & Iconography
  • Led Design Thinking Workshops
  • Co-Designed Data Visualization Library
  • Additional Design Execution (as noted in each section below)

The Design Process

Design Thinking

I typically use a combination of design thinking and design sprints. I try to be as flexible as possible in my process approach, so that the design team and I can be agile, and shift any time it's needed. For this project, I primarily leveraged design thinking.

The Users

Data Scientists & Machine Learning Engineers

Data Scientists & Machine Learning Engineers take massive amounts of data & create innovative, revenue-driving Machine Learning models.

Understanding User Problems & Limitations

Research Canvas

I start every design project with a practical research canvas to establish a clear purpose, set goals and measures of evaluation, define the problem space, identify initial assumptions, lay out hard deliverables, and choose the appropriate research methods.

For the initial research for Andela Feature Store:

• Benchmarking usability test of the existing MVP to understand the current usability issues, and establish a basis of comparison for all subsequent iterations.

• Contextual inquiry & user interviews to help contextualize the 'why' & 'how' behind the usability issues, and to uncover additional user pain points.

• Interviews with internal stakeholders to understand & align on the business objectives & product metrics. To help with initial alignment between the design team & the business, I leveraged a set of existing product metrics and KPIs, initially set by the business.

Synthesizing Research Data

Affinity Analysis

I led my team through our research activities, and then led a design thinking workshop where we affinitized the data from our research observations & findings. We evaluated the findings, first on a per-user basis, then organized the insights into several core themes.

Core Research Themes

Below are the core research themes that shaped the redesign of the Feature Store product.

Linearizing The User Flow

The process of deploying a machine learning model is linear, but the existing feature store product forced users into a non-linear experience, and they often lost their cognitive footing. The redesigned experience needs to be linearized, establishing parity with the users' actual machine learning model deployment process.

Maintaining The User’s Sense Of Place

Users have trouble maintaining their sense of place as they progress through the experience. The redesigned experience needs to help establish, and consistently reinforce the user's sense of place.

Designing For A Wide Variety Of Visualizations

Users rely on complex visualizations to monitor multiple aspects of the process. The redesigned Feature Store product needs an entirely new set of data visualization tools, designed specifically for the user's needs.

Augmenting The CLI With A Robust UI

Users want a more robust Web UI to augment the CLI. The redesigned Feature Store experience needs to enable users to be able to seamlessly switch between the UI & CLI.

Framing Key Research Insights

Research Artifacts: Personas, Journey Maps, & Empathy Maps

To bring the core research themes to life, and to drive user empathy within the team, I like to use Personas based on real users, Journey Maps, and Empathy Maps. The existing experience was riddled with friction points that caused users functional usability problems. These real-world problems were significant and needed to remain consistently in focus as the new experience was designed.

Using high-fidelity research assets helps keep the bar for designcraft set high, and helps maintain stakeholders' interest when used in the context of reports and presentations.

Design System Integration

Applying Andela Epic Design System

At the time we started working on Andela Feature Store, I had already launched our new design system. However, when auditing the existing Feature Store experience and taking into account the initial components I knew we'd need to build it out, the existing Andela Epic Design System covered only about 50% of what we needed.

Expanding the Design System Library

50% coverage sounds like a lot on paper, but the gap was concentrated in 5 core areas: command line interface patterns, code snippet components, data visualization patterns, a new monospaced typeface, & an expanded color system.

When I looked at the upcoming product roadmap, I knew there were more machine learning products coming that would leverage a command line interface & code snippets, as well as additional big data products that could leverage a full library of visualizations.

I made the decision to approach these new components, patterns, and foundations, as full additions to the Epic Design System, as opposed to designing them from within the vacuum of a single product experience.

I chose PT Mono as the new monospaced typeface for the CLI & code snippets. I leveraged Andela's secondary brand palette, called 'productive', for the data visualization library. And finally, I worked with my lead product designer to create new code snippet components to deploy in the forthcoming command line interface design.

Key Problems To Solve

How might we enable users to benefit from both a command line interface, & a graphical user interface?

The CLI is integral to deploying Machine Learning models, but requires users to work through the experience non-linearly, & by memory. Finding the right balance between the CLI & GUI will enable users to work much more efficiently.

How might we establish a clear sense of place within a data-dense product?

To help users stay grounded and find their way within tables and code snippets, I created a set of UX features that establish a clear, reliable way to orient users around their location as they progress through the experience.

How might we create recognizable icons for abstract concepts?

In order to help users find their way around the product, I created icons to represent abstract concepts like transformations, features, sources, & datasets. These icons needed to read as authentic to data scientists, while being memorable enough to reliably represent the concepts in multiple locations throughout the product.

How might we simplify data flows and processing?

In order to help data scientists visualize the data pipelines and materializations (a type of processing), I designed an extensive data visualization library, including a customized data flow visualization which enables users to stay in context when there are hundreds to thousands of nodes displayed.

How might we drive efficiencies through memorable workflows?

Andela measures the success of the tool through a set of usability and velocity metrics, like the number of features created or the volume of data processed. To meaningfully impact these KPIs, I wanted to build muscle memory and make the users, not just the machines, faster. Every design element, from workflows to icons, was therefore designed to enable users to operate with more speed & efficiency.

Linearizing the User Flow

Based on the contextual inquiry and user interviews, I mapped out the mental model and process users actually use to build and deploy a machine learning model.

The moderated usability test enabled me to map out and highlight the differences between the users' mental model and the disparate process the Feature Store experience forced them into.

By mapping the two processes against each other, I was able to design a new user flow that aligned the experience with the users' actual feature building process.

The original non-linear Feature Store user flow

Here's the original user flow which forced the user into a non-intuitive, circular progression through the experience.

The new, linearized Feature Store user flow

Here's the updated, linearized user flow, which sets the stage for an aligned experience that helps the user maintain their cognitive footing and sense of place as they work their way through the experience.

Establishing The Sense Of Place

Turning Abstract Concepts Into Icons

Once the user flow had been linearized and aligned, the next design challenge was to ensure users could establish and maintain their sense of place as they work their way through the feature store experience.

The first step to helping users establish their sense of place was to create icons that could serve as signposts for users as they work their way through the feature store experience.

Data scientists use Andela Feature Store to execute abstract mathematical and statistical concepts. So these icons needed to be accurate representations for these abstract concepts that read true to the users.

To make sure I was representing these concepts properly, I partnered with one of our data scientists, who was local to me. We met for coffee and had a co-design session to come up with 3D icons that work as visual metaphors for the abstract concepts.

The transformation example

One such abstract concept is a transformation: the replacement of a variable with a function of that variable, which changes the shape of its distribution or its relationship to other variables. The visual metaphor for transformation, seen above, is a glyph showing a transparent rectangle transforming into an opaque circle.
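
To make the concept concrete, here's a minimal, purely illustrative sketch of the kind of transformation the icon represents; the column names and values are hypothetical and not taken from the actual product.

```python
import numpy as np
import pandas as pd

# Hypothetical raw feature with a heavily right-skewed distribution.
df = pd.DataFrame({"transaction_amount": [12.0, 48.5, 530.0, 7200.0, 15.25]})

# The transformation replaces each raw value with a function of that value
# (here a log), reshaping the distribution by compressing the long tail.
df["transaction_amount_log"] = np.log1p(df["transaction_amount"])
print(df)
```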

Reinforcing The User's Sense Of Place

Enabling users to maintain their cognitive footing

With the new icons designed and validated through testing, the next design challenge was to ensure users could maintain their sense of place in the new feature store experience.

I worked with my other designer to paper sketch and wireframe 4 initial ideation points: leveraging the newly designed icons, incorporating breadcrumbs, improving the side navigation, and designing tab groups that support the workflows.

The Final Feature Workflow Design

To bring the 4 aforementioned ideation points to life, as seen in the above screenshot:

• We would bring in the custom icons, deployed in key areas of the new Feature Store experience, starting with the feature packages table. Deployed here, the icons help the user contextualize key details about the type and origin of the feature packages.

• Additionally, when a user clicks into a page for more detail, a panel pops out from the top of the page to ensure the context of the previous screen is kept in view. Breadcrumbs at the top of the screen reinforce this context, leverage the new icons, and help the user understand where they are within the experience & take desired actions.

• We'd also deploy the icons on the side navigation to act as a signpost for each workflow, and add an active tab indicator and bold text to remind the user of their current workflow.

• Finally, within the individual workflows themselves, we'd use a tab group, again with bold text and an active tab indicator, to enable users to easily move through each successive step in the feature process.

Integrating The Command Line into the Web UI

The starting point

The next design challenge was to integrate the Command Line Interface into the Web UI. In the original experience, users used their operating system's native command line to execute commands (things like creating new feature packages, creating and viewing data visualizations, and creating data sources, among many other Feature Store operations).

Here's a screenshot (above) from the initial contextual inquiry. This shows how users were using the existing product.

Users had:
• One quadrant for the Feature Store Web UI.
• One quadrant for documentation.
• One quadrant for code editor/notepad.
• One quadrant for terminal/command line.

Users often had multiple command lines open, one for each workflow, copied pieces of code from notepad to the command line, and used the Web UI to view the end result of their processes. This was a core issue that led to many inefficiencies, and it needed to be resolved through a tight integration between the command line interface and the Web UI.

Seamlessly switching between the UI & CLI

I worked with the other designer to come up with these two initial wireframes for the integrated command line interface experience.

First, we tried making the CLI accessible from the side navigation (seen above left). But that limited users to a global CLI. From research I knew users could benefit from workflow-specific CLIs, so we placed the CLI into each workflow (seen above right, in the Features workflow).

We also tried enabling users to pop out & detach the CLI, but in testing it led to users popping out too many CLI instances, and we started seeing some of the same problems with users getting lost as they executed commands against different contexts or workflows in multiple windows.

The Final Command Line Interface Design

Here's the final design for the command line interface. The CLI is integrated into each workflow as mentioned above. (The context of the CLI in the features workflow is pictured above.)

CLI Contexts
A key additional feature was enabling the users to tab between the different workflow contexts. This enabled users to seamlessly switch between, for instance, the features CLI and the transformations CLI to quickly execute transformations that immediately impact feature creation. And, if the user fully transitions to the transformations workflow, the existing operations persist.
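
As a rough illustration of the behavior described above, here's a minimal sketch of workflow-scoped CLI contexts that persist across tab switches; the class and command names are hypothetical and are not the production implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CLIContext:
    """State for one workflow-scoped CLI tab (e.g. 'features', 'transformations')."""
    workflow: str
    history: list[str] = field(default_factory=list)   # commands already run
    pending: list[str] = field(default_factory=list)   # operations still executing

class WorkflowCLI:
    def __init__(self, workflows: list[str]):
        # One persistent context per workflow, so switching tabs never discards work.
        self.contexts = {w: CLIContext(workflow=w) for w in workflows}
        self.active = workflows[0]

    def switch(self, workflow: str) -> CLIContext:
        """Tab to another workflow's CLI; its prior state is returned intact."""
        self.active = workflow
        return self.contexts[workflow]

    def run(self, command: str) -> None:
        self.contexts[self.active].history.append(command)

cli = WorkflowCLI(["features", "transformations", "sources"])
cli.run("feature-package create churn_risk")        # hypothetical command
cli.switch("transformations")                        # features context persists in the background
cli.run("transformation apply log_scale")            # hypothetical command
```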

Autocomplete
Additionally, my final (personal) contribution to the CLI was pushing for autocomplete functionality. As I learned more about how and why machine learning engineers used code editors and code notepads, I strongly believed the experience needed to leverage autocomplete to minimize and/or eliminate the user's dependency on additional applications.

The autocomplete functionality is global & pulls code from every context & workflow, on a historical basis. In the vast majority of scenarios, this enables users to easily recall stored code snippets & eliminates the user's need to copy and paste code snippets from a secondary notepad.
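
To illustrate the idea, here's a minimal sketch of history-backed autocomplete that suggests previously run commands across all contexts; the matching logic and command strings are hypothetical, not the shipped implementation.

```python
class HistoryAutocomplete:
    """Minimal sketch: suggest prior commands from a global, cross-workflow history."""

    def __init__(self):
        self._history: list[str] = []   # every command ever run, across all contexts

    def record(self, command: str) -> None:
        self._history.append(command)

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        # Most recent matches first, de-duplicated.
        seen, matches = set(), []
        for cmd in reversed(self._history):
            if cmd.startswith(prefix) and cmd not in seen:
                seen.add(cmd)
                matches.append(cmd)
            if len(matches) == limit:
                break
        return matches

ac = HistoryAutocomplete()
ac.record("feature-view materialize user_activity --batch")   # hypothetical commands
ac.record("feature-view create user_activity")
print(ac.suggest("feature-view"))
```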

This enabled a 60% reduction in time to create a new Machine Learning feature.

Designing A Range Of Complex Visualizations

My overall goal with the new data visualization design was twofold: to achieve better, more robust designs for the existing visualizations, and to work with the users to uncover additional visualizations that could add value, context, or impact, helping them reach new insights in their workflow.

Below are a few examples of the new visualizations we created for the Feature Store experience.

  • Cell Matrix Chart
  • Cohort Analysis
  • Scatter Chart
  • Wide Hexagon Chart
  • Bullet Chart
  • Prediction Chart
  • Correlation Chart
  • Radar Chart
  • Rhombus Matrix
  • Sankey Diagram
  • Distribution Plot
  • Heat Map

Visualizing Data Processing & Monitoring

Two key visualizations in the feature store experience are Materializations (a type of data processing) and Performance Monitoring.

These two functions enable the users to successfully productionize a machine learning model, using the feature store.

Materialized Feature Data

The screen above shows the Materialized Feature Data visualization. This bar chart displays the scheduling of materialization over time, & sets the stage for the users to visualize the materialization's flow of data through the Andela Feature Store system.

Performance Monitoring

Then, shown above, is performance monitoring, where the user can monitor how data reliability, data freshness, and data accuracy impact the materializations, and then take corrective action if needed.

Seamlessly switching between Materialization & Monitoring

In the original design, it wasn't possible for Data Scientists to switch between the materialization visualization & the monitoring visualizations; they were built and launched exclusively through the command line, in two separate workflows.

The users needed the ability to seamlessly switch between materializations and monitoring, so they could view the critical functions that affect materialization processing.

The tabs enable seamless and quick movement between the two visualizations, without losing the context of the feature being built.

Additionally, I believed it would be a good idea to give the users a set of the most common visualizations by default, so they don't have to build or select them every time they start building a new feature. Using the dropdowns, users can quickly call up different sets of visualizations or swap out a single visualization.

Visualizing Highly Complex Data-Flows

The most critical visualization, the one that ties the users' process together from end to end, is the data-flow diagram. It helps users visualize and contextualize the flow of data through the entire feature store pipeline.

The initial solution: the sankey diagram

The initial data-flow visualization used a traditional sankey diagram. In testing, the diagram worked fine for up to 20 nodes, but as the number of nodes scaled up, the sankey diagram started breaking down.

We needed a solution that could handle nodes that scaled well into the hundreds.

Unblocking the design team

At this point, the design team was working hard, collaborating to create new design solutions for the data-flow visualization, but was unable to get past its initial block.

I put together a design thinking session, brought in the design team, the product manager, engineering team, & a few members from the data team to participate in a co-design session.

Together, we walked through a competitor analysis to see how other feature store applications had solved similar data-flow issues. Most had not, but a few tidbits of publicly available information helped me steer the initial direction. We looked at how Qwak and Amazon Sagemaker approached their data-flow diagram. This gave us a new direction from which to approach the data-flow challenge.

Once I had established the initial direction, I led exercises like design charrettes and crazy 8's to build upon the new direction.

Since we had several users co-designing with us, I was able to build a quick prototype during a break and conduct a cursory test on the spot, with actual users, to validate the design direction.

Early Iterations

Here are a few early iterations of the new data-flow diagram. We experimented with & tested different styles of connector lines.

We also experimented with different ways to group the nodes by type, first leveraging a chip design to show when groups of nodes exceeded three.

The Final Design

Some users build workspaces that have hundreds of Data Sources, Feature Views, and Feature Services. The chip design turned out to not be a great way to show node groupings.

To manage nodes at this scale, I came up with a summary view that shows the node groupings automatically when the user zooms out to 100% or beyond. The user can then zoom back in to automatically pop out of the grouped summary view.
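
A minimal sketch of how this zoom-driven grouping might work, assuming grouping kicks in at or below 100% zoom and each node carries a simple type field; the threshold and field names are illustrative, not the shipped logic.

```python
from collections import defaultdict

GROUPING_ZOOM_THRESHOLD = 1.0  # 100% zoom; at or below this, show the grouped summary view


def layout_nodes(nodes: list[dict], zoom: float) -> list[dict]:
    """Return individual nodes when zoomed in, or per-type summary groups when zoomed out."""
    if zoom > GROUPING_ZOOM_THRESHOLD:
        return nodes  # zoomed in past the threshold: render every node individually

    # Zoomed out: collapse nodes into one summary entry per type
    # (e.g. data sources, feature views, feature services).
    counts = defaultdict(int)
    for node in nodes:
        counts[node["type"]] += 1
    return [{"type": node_type, "count": count, "grouped": True}
            for node_type, count in counts.items()]
```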

Bringing Life to the Data-Flow Visualization

After we tested the latest iteration of the data-flow diagram, users still felt the need for additional enhancements to contextualize how the data flows end-to-end through the pipeline.

I came up with an idea to leverage subtle animations to illustrate how the data flows through the feature store pipeline.

Partnering closely with the Engineering team, we realized early on that, for large workspaces, animations needed to be disabled by default to preserve browser performance. So the interface needed an easy way to toggle the animations on and off. To enable animations, the user clicks the eye icon; the animations begin, and the data-flow legend appears, showing the users what each animation means (a small sketch of this mapping follows the examples below).

For example:

• When a Feature View has a materialization enabled, solid green animations show where data is processed in batches & written to an online feature store.

• A dotted green line animation shows where data is processed in streams & written to an online feature store.

• When a Feature Service has been configured, purple animations show offline retrieval flows.
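
Here's a minimal sketch of how such a legend mapping and the default-off behavior for large workspaces might be expressed; the flow-type names, colors, and node-count threshold are hypothetical.

```python
# Hypothetical legend mapping for the data-flow animations; the keys, colors,
# and line styles are illustrative rather than the production configuration.
FLOW_LEGEND = {
    "batch_materialization":  {"color": "green",  "line": "solid",  "meaning": "batch data written to the online store"},
    "stream_materialization": {"color": "green",  "line": "dotted", "meaning": "streaming data written to the online store"},
    "offline_retrieval":      {"color": "purple", "line": "solid",  "meaning": "offline retrieval for a configured feature service"},
}

# Animations start disabled in large workspaces to protect browser performance;
# the node-count threshold here is an assumption, not the real cutoff.
def animations_enabled_by_default(node_count: int, threshold: int = 200) -> bool:
    return node_count < threshold
```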

Based on usability testing, these animations helped users understand & contextualize the full path of data through the system nearly twice as quickly as without the animations.

Lineage Tracing Through the Data-Flow Visualization

The last piece of the data-flow diagram that enables users to understand complex data flows much more quickly is our approach to lineage tracing. The data-flow diagram excels at helping the user understand lineage for every part of the feature pipeline.

When the user clicks on a node in the diagram, they can see all of its upstream & downstream dependencies in the pipeline, while the nodes & connections that are non-dependencies are muted into the background.
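
Conceptually, this is an upstream/downstream graph traversal. Here's a minimal sketch, assuming the pipeline is stored as adjacency lists; the node names are hypothetical and this is not the production code.

```python
from collections import deque


def trace_lineage(selected: str,
                  upstream: dict[str, list[str]],
                  downstream: dict[str, list[str]]) -> set[str]:
    """Return the selected node plus every upstream and downstream dependency.
    Everything outside this set gets visually muted in the diagram."""

    def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    return {selected} | reachable(selected, upstream) | reachable(selected, downstream)


# Tiny hypothetical pipeline: events_source -> churn_feature_view -> churn_service
upstream = {"churn_feature_view": ["events_source"], "churn_service": ["churn_feature_view"]}
downstream = {"events_source": ["churn_feature_view"], "churn_feature_view": ["churn_service"]}
highlighted = trace_lineage("churn_feature_view", upstream, downstream)
# highlighted == {"churn_feature_view", "events_source", "churn_service"}; everything else is muted.
```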

This feature achieved a 70% reduction in the time it takes users to trace dependencies.

Outcomes

  • New Revenue: $25M. Andela Feature Store generated over $25M in new revenue through newly created data monetization models.
  • Reduction In Cognitive Load: 75%. The rearchitected version of Andela Feature Store achieved a 75% reduction in cognitive load, as compared to the initial MVP.
  • Increase In Volume Of Data Processed: 125%. The rearchitected version enabled Data Scientists to process 125% more data volume per production cycle, as compared to the initial MVP.
  • Decrease In Time To Productionize An ML Model: 54%. The rearchitected version reduced the time it takes to productionize a new ML model by 54%, as compared to the initial MVP.
  • Increase In Number Of Features Created Per Cycle: 89%. The rearchitected version enabled Data Scientists to create 89% more features per production cycle, as compared to the initial MVP.

Have questions? Give me a shout!