
Data Transformation Pipeline Builder for AI Use Cases
This project was one of many I've worked on for IBM’s data lakehouse platform, watsonx.data. I was on a team of three: myself, another researcher, and our lead.
​
This research request was brought to us as a “small ask,” but we quickly realized the scope needed to be broadened. We drew from a variety of methods and worked quickly to get feedback to the team.
Background
Business Problem
​In preparation for a new feature release, my team was tasked with evaluating a novel workflow to streamline the transformation of unstructured data (such as PDFs) for RAG use cases, particularly the development of AI chat agents.
​
This solution had two workflows that needed to be tested: 1) one for non-technical users who needed a simple, pre-built process to quickly transform data; and 2) one for technical users who would design these pre-built processes for the non-technical users.
Key Outcomes
- Much-needed clarity around the target users and their needs
- A deeper understanding of how technical users programmatically transform unstructured data so we can offer an improved experience through our UI-based solution
- An understanding of where and why users feel this solution falls short; conceptual recommendations to address these concerns and better showcase the value of our solution
- Design recommendations that the team was able to implement immediately to improve the usability, learnability, and organization of the designs
Methods
- Persona Research
- Concept Test
- Contextual Inquiry
While the research request from the design team was only to run a concept test, I felt some of the research questions would be better answered through a contextual inquiry. My team lead was very receptive to this idea, so we wrote up two research plans and protocols. The concept test was intended to capture high-level impressions of the designs while the contextual inquiry would gather technical details about users’ current solution.
While drafting screeners for these studies, we realized our understanding of the users wasn’t as developed as we thought. The design team was generally aware that these solutions would be leveraged by different user groups, but no one could agree on the specific roles and responsibilities these users might have. To address this ambiguity, we conducted a quick proto-persona study.​
Timeline

Research
Persona Research
Data Sources
This rapid research drew from semi-structured interviews with 3 product managers. We supplemented this small sample with a reanalysis of data from 21 JTBD interviews done last year.
Our interviews with the PMs focused mainly on:
- who they believed the target users to be, and
- how they imagined these users would leverage our solution
The JTBD study focused on data lakehouse users and their responsibilities, and its data included relevant demographic information and qualitative responses.
Analysis
We did a thematic analysis of the stakeholder interviews directly in our notes document, using a color-coded highlighting scheme to differentiate between themes. We then copied these themes into a Mural board where we’d imported Airtable notes from the JTBD study.
From there, we either 1) sorted these notes into the themes derived from the stakeholder interviews, or 2) generated novel themes. While doing this, we considered both within- and between-subject sentiments, collapsing redundancy within-subject to avoid overestimating theme size.
​

Affinity map of JTBD interview data in Mural; color corresponds with participant and column with theme. Images have been blurred to maintain confidentiality and protect company data.
This analysis generated a persona that outlined users’ primary and secondary responsibilities, goals and values, and main challenges, with each category containing the 3-10 most frequent themes. From these findings, we hypothesized that the technical users of this solution would most likely be data engineers.
​

A persona card taken from a slide deck we presented to several VPs and design managers.
However, data from and about non-technical users was too sparse to show a clear trend. We therefore opted to split the concept test into two phases, first recruiting only technical users to both evaluate the new solution and gather data on who they believed would use the non-technical solution.
Concept Test: Technical Users
Objectives
The purpose of this concept test was to:
- Document first impressions of the concepts
- Gather requirements and requests for design improvement
- Develop a proto-persona of non-technical users
Research Questions
These objectives mapped onto the following research questions:
- Do participants see value in this solution?
- Does this solution meet participants’ requirements?
- Do participants feel the designs make sense?
- What roles do these participants think would leverage the non-technical solution?
Participants
Our screener received nearly 100 responses. We selected 7 data engineers who we believed, based on their answers, would be most likely to use our solution.
​

Summary of each participant's role, experience, and tools they use.
Protocol
First, we split each workflow into task-specific sections to compartmentalize feedback. For each of these sections, we walked participants through a hypothetical scenario, showing them each screen and explaining what the user would do at each step. After we shared some initial findings with design, they decided to iterate on a couple of sub-flows, which we then incorporated as A/B tests.
​


Slides from our final presentation highlighting key screens for each section of the protocol.
We wanted to get technical users’ feedback on both the technical and non-technical solutions to generate hypotheses about non-technical users.
At the end of each section, we asked the participant a few open-ended questions and scale items. We took notes directly in an Airtable to accelerate data analysis and synthesis.
Analysis
We consolidated data from each participant after each session, collaborating on a brief summary of feedback per protocol section, supported with direct quotes. This kept the final analysis very quick, as we only had to synthesize themes from these 7 summaries. This was again done in a notes document using a color-coded highlighting scheme. I calculated means and 95% CIs for the quantitative data in Excel.
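For illustration, the mean and 95% CI calculation we did in Excel maps onto something like the short Python sketch below; the ratings here are made-up example values on a 5-point scale, not actual study data.

```python
# Illustrative sketch of a mean + 95% confidence interval calculation
# (the study's version was done in Excel); the ratings below are
# invented example values, not participant data.
from scipy import stats

def mean_ci_95(values):
    n = len(values)
    mean = sum(values) / n
    sem = stats.sem(values)                    # standard error of the mean
    margin = sem * stats.t.ppf(0.975, n - 1)   # two-tailed 95% t critical value
    return mean, (mean - margin, mean + margin)

example_ratings = [4, 5, 3, 4, 5, 4, 4]  # hypothetical 5-point scale responses, n = 7
m, ci = mean_ci_95(example_ratings)
print(f"M = {m:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```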
Concept Test: Non-Technical Users
Objectives
The purpose of the concept test with non-technical users was largely the same as that of the test with technical users; however, we were now also interested in validating the non-technical user persona and taking a closer look at how the non-technical solution could be made even more accessible.
Participants
Using what we learned from the first concept test, we were able to quickly recruit 5 non-technical participants. These participants were mostly line of business users with marketing or administrative use cases, and had very little experience with more technical solutions.
​

Summary of each participant's role, experience, and tools they use.
Research Questions
​We again had a few general research questions designed to evaluate the concept overall:
- Do participants see value in this solution?
- Does this solution meet participants’ requirements?
- Do participants feel the designs make sense?
But we also had a couple specific questions related to ease of use:
- Do non-technical users understand the terminology used?
- How can we best visualize the pre-built processes for them?
Protocol
This time we showed participants only the non-technical designs, as we felt they lacked the experience and knowledge to offer useful feedback about the technical designs. Otherwise the protocol was unchanged.
​

Key screens for each section of the non-technical workflow.
Analysis
We followed the same process for this analysis as we did for the concept test with technical users. The only addition was a quick statistical analysis of the quantitative variables to test for differences between technical and non-technical users. I conducted a mixed ANOVA, which found no significant main or interaction effects.
​

While p-values were high and effect sizes small, we still called out non-technical users' lower scores to err on the side of caution and ensure that we support everyday users to the best of our ability.
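For reference, a mixed ANOVA like this can be run with the pingouin library in Python. The sketch below uses hypothetical stand-in data, participant IDs, and section names ("upload", "transform"), not the actual Airtable export.

```python
# Rough sketch of a mixed ANOVA (between factor: user group; within
# factor: protocol section) using pingouin. The data frame, column
# names, and scores are hypothetical stand-ins, not study results.
import pandas as pd
import pingouin as pg

ratings = pd.DataFrame({
    "participant": ["P1", "P1", "P2", "P2", "P3", "P3",
                    "P4", "P4", "P5", "P5", "P6", "P6"],
    "group":   ["technical"] * 6 + ["non_technical"] * 6,
    "section": ["upload", "transform"] * 6,   # hypothetical protocol sections
    "score":   [5, 4, 4, 4, 5, 5, 4, 3, 3, 4, 4, 3],
})

aov = pg.mixed_anova(
    data=ratings,
    dv="score",          # dependent variable: scale-item rating
    within="section",    # repeated factor: protocol section
    between="group",     # between-subjects factor: technical vs. non-technical
    subject="participant",
)
print(aov[["Source", "F", "p-unc", "np2"]])  # F, uncorrected p, partial eta-squared
```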
Contextual Inquiry
As we were wrapping up data collection from the concept test with technical users, we started running the contextual inquiry sessions.
Objectives
The purpose of these sessions was to supplement the concept test findings with a deeper understanding of technical users’ current workflow, ensuring that our UI-based solution truly met their requirements.
Research Questions
- How do technical users currently ingest and transform unstructured data for their RAG use cases?
- Are these users interested in a UI-based solution?
- How do they feel about their current programmatic approach?
Participants
We recruited 3 internal data engineers with experience ingesting, transforming, and leveraging unstructured data for RAG use cases.
Protocol
Before meeting with participants, we asked them to prepare a demonstration of their current workflow for unstructured data ingestion and transformation. Then during the session, they walked us through this demo, running pre-written code, troubleshooting errors, and testing their process.
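To give a sense of what these demos typically looked like (without sharing participants' actual code), the sketch below is a simplified, hypothetical version of a programmatic ingestion-and-chunking step; the library choice (pypdf), chunking parameters, and file name are all assumptions.

```python
# Simplified, hypothetical example of the kind of programmatic workflow
# participants demonstrated: extract text from a PDF, then split it into
# overlapping chunks for downstream embedding. Not participants' code.
from pypdf import PdfReader

def extract_text(pdf_path: str) -> str:
    """Pull raw text from every page of a PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Naive fixed-size chunking with overlap, a common starting point for RAG."""
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Example usage (file name is a placeholder):
# chunks = chunk_text(extract_text("quarterly_report.pdf"))
# Next steps in a typical pipeline: embed each chunk and load it into a
# vector index that the chat agent queries at runtime.
```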
Analysis
Again, we took notes directly in an Airtable base to speed up analysis and synthesis. We organized the table according to a pre-established framework for the ingestion and transformation process, with each step of this process as a separate row and each participant a separate column. This made it easy to slot data into the appropriate step as we observed participants. We also captured and annotated screenshots of their workflow.
​

Session notes by step (row), per participant (column) in Airtable.
Instead of creating summaries per participant, we summarized across participants for each step. We were then able to compare these aggregate steps against our UI-based solution to identify gaps between users’ requirements and what the solution currently offers.
Findings
Artifacts
We created a variety of artifacts across these studies, the primary one being a slide deck. This is standard practice at IBM. Even in cases where I might prefer something brief like a one-pager, we convert findings to deck format to make the information more digestible for stakeholders.
​


Slides introducing stakeholders to our key insights and supporting evidence.
Other artifacts from these studies included Personas (usually in Figma), Mural boards, and Airtable bases. While our Personas are meant to be leveraged by anyone on the broader team, we maintain Mural boards and Airtable bases primarily for other researchers to reference or build on.
Sharing Knowledge
Presentations
We shared findings from these studies both in real time to closely involved stakeholders and at periodic “playbacks” to the larger team. We did two formal playbacks for this research: one during a standing PM and design meeting, the other on a designated call with several executives.
We usually create a few versions of our slide decks to be shared with different audiences. For example, when sharing this research with execs, we kept the presentation quick and high-level, focusing on big ideas that would be more likely to influence decision-making right now.
For the playback to PM and design, we spared no detail, calling out every finding, from general themes to one-off concerns about icon placement. For these more detailed presentations, we included a checklist per workflow section that outlined every design request and recommendation, ranked by importance.
Reception
These findings were well-received, especially by PM and design – a number of our recommendations have been implemented, offering users a better onboarding experience and improved observability.
There was disagreement among execs throughout the project, mostly around where to focus design efforts. At times, this made it difficult to feel that we had strong alignment with execs. To keep up with shifting objectives, we met regularly with stakeholders who could update us on these changes.
Reflection
This project revealed several opportunities to improve both my process and the team’s research operations.
Efficiency
The main problem slowing us down was data volume. The concept test protocol was particularly long, running the full hour every session. This increased time spent on analysis and delayed knowledge sharing. When resources are limited and deadlines are tight, the costs of collecting more data quickly outweigh the benefits. I think applying the MoSCoW method might be useful here: working with stakeholders to define exactly what is and isn't necessary would minimize scope creep and protocol bloat.
Alignment
We encountered some friction during this research. Most of our blockers were consequences of the org structure. Issues like infrequent communication between teams, lack of alignment on goals, and limited knowledge sharing made it difficult to move as quickly as we would have liked. Even though we didn’t have control over the factors causing these blockers, we were able to buffer their impact by developing close relationships with stakeholders and being persistent when communicating with less responsive parties.
All in all, I was very happy with this work. Even though this ask turned out to be more complex than anticipated, we quickly scoped the research and leveraged a variety of methodologies that offered the team a clear path forward.