SLB NLP Analysis: Advanced Item Categorization for Cost Optimization

April 20, 2025

Overview

In this project, I developed a comprehensive NLP pipeline to analyze and categorize more than 1.2 million Material & Supply entries from SLB's Reda Production procurement database. The work covered sentiment analysis, topic modeling, document similarity detection, and automated tagging. By combining BERT embeddings with unsupervised clustering, the solution enabled efficient knowledge discovery and improved information retrieval across SLB's research teams.
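Document similarity detection over embeddings usually comes down to cosine similarity. The sketch below assumes each material description has already been turned into a fixed-length vector (for example, by a BERT encoder) and flags pairs above a threshold as likely duplicates. The function name `near_duplicate_pairs` and the 0.95 cutoff are illustrative assumptions, not the project's actual settings. Rows are scored in blocks so memory stays near `block_size × n` instead of `n × n`, but the compute is still quadratic; at the 1.2M scale an approximate nearest-neighbor index would be the more realistic choice.

```python
import numpy as np


def near_duplicate_pairs(embeddings, threshold=0.95, block_size=1024):
    """Return (i, j, cosine) for every pair i < j with cosine similarity >= threshold.

    Hypothetical helper. Rows are compared in blocks so peak memory is roughly
    block_size x n floats; total work is still O(n^2).
    """
    emb = np.asarray(embeddings, dtype=np.float32)
    # L2-normalize each row so a dot product equals cosine similarity.
    unit = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    pairs = []
    for start in range(0, unit.shape[0], block_size):
        sims = unit[start:start + block_size] @ unit.T  # (block, n)
        rows, cols = np.nonzero(sims >= threshold)      # rows are block-local
        keep = cols > rows + start  # drop self-matches and mirrored (j, i) pairs
        for r, c in zip(rows[keep], cols[keep]):
            pairs.append((int(r) + start, int(c), float(sims[r, c])))
    return pairs


if __name__ == "__main__":
    vectors = np.array([[1.0, 0.0, 0.0],
                        [0.99, 0.01, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
    print(near_duplicate_pairs(vectors))
```

In this toy input only rows 0 and 1 (cosine of about 0.99995) are reported as a pair.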

Key Features

BERT-based Document Clustering: Engineered an unsupervised clustering pipeline that uses contextual BERT embeddings to group 1.2M+ multilingual material descriptions from SLB's global operations.
Dimensionality Reduction: Applied PCA and t-SNE to the high-dimensional text embeddings to visualize document relationships and identify procurement patterns.
Automated Classification: Applied K-Means clustering to the embeddings to derive the top 100 procurement-relevant categories, enabling supplier consolidation and cost-control strategies.
Interactive Visualizations: Created comprehensive data visualizations including word clouds, similarity matrices, and cluster projections for stakeholder presentations.
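A minimal sketch of the embed, reduce, cluster, and visualize flow listed above follows. It assumes mean-pooled embeddings from the public `bert-base-multilingual-cased` checkpoint on Hugging Face (the write-up does not name the model), 50 PCA components before K-Means, and a 2-D t-SNE projection for plotting. All function names and parameter values are hypothetical. `embed_texts` needs `torch` and `transformers` and downloads the model on first use; the clustering and projection helpers only need NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_texts(texts, model_name="bert-base-multilingual-cased", batch_size=32):
    """Mean-pooled BERT embeddings, one row per text.

    Assumption: the checkpoint and max_length=64 are illustrative choices.
    Requires torch and transformers (imported lazily so the rest runs without them).
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(list(texts[start:start + batch_size]), padding=True,
                              truncation=True, max_length=64, return_tensors="pt")
            hidden = model(**batch).last_hidden_state             # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1), 0 on padding
            # Average token vectors while ignoring padding positions.
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
            chunks.append(pooled.cpu().numpy())
    return np.vstack(chunks)


def cluster_embeddings(embeddings, n_clusters=100, n_components=50, random_state=0):
    """PCA-compress the embeddings, then assign each row to a K-Means cluster."""
    n_components = min(n_components, *embeddings.shape)
    reduced = PCA(n_components=n_components,
                  random_state=random_state).fit_transform(embeddings)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(reduced)
    return labels, reduced


def project_2d(reduced, perplexity=30.0, random_state=0):
    """2-D t-SNE coordinates for plotting; intended for a sample, not all rows."""
    return TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=random_state).fit_transform(reduced)


if __name__ == "__main__":
    # Synthetic stand-in for BERT output: three well-separated blobs in 768 dims.
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=10.0, size=(3, 768))
    fake = np.vstack([c + rng.normal(scale=0.1, size=(50, 768)) for c in centers])
    labels, reduced = cluster_embeddings(fake, n_clusters=3)
    print(np.bincount(labels), project_2d(reduced).shape)
```

On real data the call would look like `labels, reduced = cluster_embeddings(embed_texts(descriptions))`, where `descriptions` is a hypothetical list of material strings. Running PCA first keeps K-Means cheaper on 768-dimensional vectors, and because t-SNE scales poorly, `project_2d` is best applied to a random sample of a few thousand rows.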

Technologies Used

Python: Core programming language, with NumPy, pandas, and scikit-learn for data processing and analysis.
BERT & Transformers: For generating contextual embeddings and semantic understanding of technical documents.
Machine Learning: K-Means for clustering, plus PCA and t-SNE for dimensionality reduction and pattern discovery.
Data Visualization: Matplotlib, Seaborn, and custom plotting libraries for creating insightful visualizations.

Challenges and Learnings

One of the biggest challenges was handling multilingual technical documentation full of oil and gas terminology, which required extensive preprocessing and custom tokenization strategies. Processing more than 1 million documents while maintaining clustering quality also demanded careful algorithm optimization and efficient memory management.
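One common way to keep K-Means memory bounded at this scale is scikit-learn's `MiniBatchKMeans`, fitted incrementally with `partial_fit`. The write-up does not say which variant the project used, so treat this as a swapped-in technique rather than the original method. The sketch assumes the embeddings were saved with `np.save` to a `.npy` file (a hypothetical layout) and memory-maps it, so only one chunk is held in RAM at a time; `stream_cluster`, the chunk size, and the cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def stream_cluster(embedding_path, n_clusters=100, chunk_size=10_000, random_state=0):
    """Fit MiniBatchKMeans incrementally over a memory-mapped (n_items, dim) .npy file.

    Hypothetical helper. The first chunk must contain at least n_clusters rows,
    because the centroids are seeded from it on the first partial_fit call.
    """
    emb = np.load(embedding_path, mmap_mode="r")  # rows stay on disk until sliced
    n_rows = emb.shape[0]
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=random_state)

    # Pass 1: update the centroids one chunk at a time (each chunk copied into RAM).
    for start in range(0, n_rows, chunk_size):
        kmeans.partial_fit(np.array(emb[start:start + chunk_size], dtype=np.float32))

    # Pass 2: assign every row to its nearest final centroid, again chunk by chunk.
    labels = np.empty(n_rows, dtype=np.int32)
    for start in range(0, n_rows, chunk_size):
        chunk = np.array(emb[start:start + chunk_size], dtype=np.float32)
        labels[start:start + chunk_size] = kmeans.predict(chunk)
    return kmeans, labels
```

Shuffling rows before saving the file helps, since mini-batch updates work best when each chunk is roughly representative of the whole dataset.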

Outcome

The pipeline processed more than 1.2 million technical documents with high classification accuracy. It surfaced key procurement patterns and supported inventory optimization strategies that informed supplier consolidation decisions. The project cut manual document categorization time by 80% and significantly improved knowledge discovery across SLB's research teams, demonstrating the practical value of NLP in industrial applications.

This project demonstrates advanced NLP capabilities and the ability to extract actionable business insights from large-scale technical documentation in the oil and gas industry.