Objective

This guide presents a complete workflow for exploring customer support interactions and generational trends in a real-world bank marketing dataset, augmented with AI-powered reasoning from an offline large language model (LLM) running via Llama.cpp. The process covers data acquisition, cleaning, visualization, and the integration of natural language question answering directly within a Jupyter notebook.

Learning Outcomes

  • Construction of an isolated Python environment for analytics and offline AI using conda and pip.
  • Download and preparation of an open-source banking dataset featuring explicit customer interaction channels.
  • Execution of quantitative analysis on generational differences and service touchpoints.
  • Integration of a local LLM for context-driven, natural-language explanation of data insights.
  • Creation of an interactive notebook featuring reproducible code and exploration prompts.

Dataset Overview

This guide uses the UCI ML “Bank Marketing” dataset (ID: 222), which contains:

  • A broad mix of demographic columns: age, job, marital status, education, etc.
  • Measurable interaction data: contact (telephone/cellular), campaign (number of contacts during campaign), previous (number of contacts before campaign), poutcome (outcome of previous campaign).
  • Customer response column: whether the client subscribed to a term deposit (y).
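
These fields can be verified programmatically before any analysis. Below is a minimal sketch using the ucimlrepo package (installed in Step 1), whose variables attribute exposes per-column metadata as a pandas DataFrame:

from ucimlrepo import fetch_ucirepo

bank_marketing = fetch_ucirepo(id=222)  # fetch the Bank Marketing dataset by UCI ID
print(bank_marketing.variables[['name', 'role', 'type']])  # per-column name, role, and type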

Prerequisites

  • Python 3.11+ via Anaconda.
  • Recent versions of conda and pip, and a terminal (the Anaconda Prompt).
  • At least 8GB RAM (16GB+ recommended for local LLM inference).

Step 1: Environment and Library Setup

Open the Anaconda Prompt (not the regular Command Prompt) and execute:

conda create -n bankingllm python=3.11
conda activate bankingllm
pip install pandas numpy matplotlib seaborn plotly requests ucimlrepo llama-cpp-python

This creates an isolated environment and installs all dependencies required for analytics and offline inference.
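
Before launching Jupyter, a quick import check from the same prompt can confirm that the environment resolved correctly; llama-cpp-python in particular sometimes fails to build, so catching that early helps:

python -c "import pandas, numpy, seaborn, plotly, llama_cpp; print('environment OK')"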

Step 2: Download and Place the Llama GGUF Model

  • Download a quantized GGUF file such as llama-2-7b-chat.Q4_K_M.gguf from TheBloke/Llama-2-7B-Chat-GGUF on HuggingFace.
  • Move the file into a directory named models located in the same folder as the notebook, for instance:
    C:\Users\username\Documents\bankingllm\models\llama-2-7b-chat.Q4_K_M.gguf
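
Alternatively, the model can be fetched programmatically. A minimal sketch, assuming the optional huggingface_hub package (pip install huggingface-hub), which is not part of the Step 1 installs:

from huggingface_hub import hf_hub_download

# Download the quantized model directly into ./models next to the notebook.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir="models",
)
print(model_path)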

Step 3: Jupyter Notebook Analysis and AI Reasoning Workflow

# Print the Python interpreter and version to ensure compatibility and reproducibility.
import sys
print(sys.executable)
print(sys.version)
# Install and import all necessary libraries for data handling, visualization, and LLM integration.
# Installation is only required for first-time setup.

# !pip install pandas numpy matplotlib seaborn plotly ucimlrepo requests llama-cpp-python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Attempt to import the dataset fetcher and LLM interface, prompting installation if missing.
try:
    from ucimlrepo import fetch_ucirepo
except ImportError:
    print("Install ucimlrepo with: pip install ucimlrepo")
try:
    from llama_cpp import Llama
except ImportError:
    print("Install llama-cpp-python with: pip install llama-cpp-python")

%matplotlib inline
# Download the official UCI Bank Marketing dataset and prepare it for analysis.
# The dataset is split into features (X) and target (y).
# All column names are stripped of spaces for consistent access.
try:
    bank_marketing = fetch_ucirepo(id=222)
    X = bank_marketing.data.features
    y = bank_marketing.data.targets
    print(f"Features shape: {X.shape}, Target shape: {y.shape}")
    bank_df = pd.concat([X, y], axis=1)
    bank_df.columns = bank_df.columns.str.strip().str.replace(" ", "_")
    display(bank_df.head())
except Exception as e:
    print(f"Error loading dataset: {e}")
# Examine missing values, datatypes, and generate descriptive statistics for all dataset columns.
try:
    print("Null values per column:")
    print(bank_df.isnull().sum())
    print("\nColumn types:")
    print(bank_df.dtypes)
    print("\nStatistical summary:")
    display(bank_df.describe(include='all'))
except Exception as e:
    print(f"Data cleaning/inspection error: {e}")
# Add a generational cohort column to enable age-based segmentation (e.g., Gen Z, Millennial, Gen X, Boomer, Silent).
def assign_generation(age):
    try:
        age = int(age)
        if age < 25:
            return "Gen Z"
        elif age < 40:
            return "Millennial"
        elif age < 55:
            return "Gen X"
        elif age < 75:
            return "Boomer"
        else:
            return "Silent"
    except (ValueError, TypeError):
        return "Unknown"

try:
    bank_df['generation'] = bank_df['age'].apply(assign_generation)
    print("Generation column created.")
except Exception as e:
    print(f"Error creating generation column: {e}")

# Display available support or sales contact channels in the dataset.
if 'contact' in bank_df.columns:
    print("Available contact channels:", bank_df['contact'].unique())
# Visualize the distribution of customer generations across the dataset.
try:
    sns.countplot(data=bank_df, x='generation', order=bank_df['generation'].value_counts().index)
    plt.title('Customer Base by Generation')
    plt.show()
except Exception as e:
    print(f"Error plotting generation distribution: {e}")

# Generate a table comparing generational cohorts with contact channel preference or usage.
try:
    gen_contact_table = pd.crosstab(bank_df['generation'], bank_df['contact'])
    print("Generation x Contact Table:")
    display(gen_contact_table)
except Exception as e:
    print(f"Gen vs Contact tabulation error: {e}")

# Display an interactive histogram showing contact channel usage per generation.
try:
    fig = px.histogram(bank_df, x='generation', color='contact', barmode='group',
                       title='Support/Contact Channels Used by Generation')
    fig.show()
except Exception as e:
    print(f"Channel by generation plot error: {e}")

# Pivot table to quantify the average number of contacts during and prior to the campaign per generation.
try:
    contact_activity = pd.pivot_table(
        bank_df,
        index='generation',
        values=['campaign', 'previous'],
        aggfunc='mean'  # string aggfunc; passing np.mean is deprecated in recent pandas
    )
    print("\nMean contacts per campaign/previous by generation:")
    display(contact_activity)
except Exception as e:
    print(f"Error creating interaction pivot: {e}")
# Analyze interaction outcomes and durations by generation and channel.
# Summarize previous campaign outcomes by generation.
try:
    outcome_table = pd.crosstab(bank_df['generation'], bank_df['poutcome'])
    print("Generation vs Outcome of Previous Campaign:")
    display(outcome_table)
except Exception as e:
    print(f"Error in previous outcome table: {e}")

# Visualize interaction duration stratified by contact channel and generation.
try:
    fig2 = px.box(bank_df, x='generation', y='duration', color='contact',
                  title='Interaction Duration by Generation & Contact Channel')
    fig2.show()
except Exception as e:
    print(f"Error in duration by channel/gen plot: {e}")
# Set up and load the offline Llama.cpp LLM. Model weights must be placed in the specified directory.
MODEL_PATH = "./models/llama-2-7b-chat.Q4_K_M.gguf"
LLAMA_N_CTX = 4096
LLAMA_N_THREADS = 4

try:
    llm_reasoner = Llama(model_path=MODEL_PATH, n_ctx=LLAMA_N_CTX, n_threads=LLAMA_N_THREADS)
    print("Offline language model loaded successfully.")
except Exception as e:
    print(f"LLM Model error: {e}")

# Function to pass any user question about the dataset context to the local LLM and retrieve an answer.
def explain_with_llama_cpp(prompt):
    """
    Sends a prompt to Llama.cpp for context-based explanation. Returns the generated answer.
    """
    try:
        out = llm_reasoner(prompt, max_tokens=256, echo=False, stop=["[end]"])
        return out["choices"][0]["text"].strip()
    except Exception as e:
        return f"AI model error: {e}"

# Accept a question from the user about support channels, generational trends, or campaign outcomes.
user_prompt = input("Enter your data insight/question for the AI model (e.g., 'What support channel do Boomers prefer?'):\n")
if user_prompt.strip():
    explanation = explain_with_llama_cpp(user_prompt)
    print("AI explanation:\n", explanation)
else:
    print("No prompt entered.")
# Example prompts illustrating the kinds of natural-language exploration this setup supports.
"""
Example AI reasoning prompts:
- What support channel is the most popular among Gen X customers?
- Are customers contacted by cellular channels more responsive than by telephone?
- Which generational cohort demonstrates the highest average interaction duration?
- Does increased number of previous contacts improve conversion for Millennials?
- What strategies might improve response rates among older customers?
"""

Conclusion

This workflow demonstrates how to acquire, explore, and explain generational and support-channel trends in bank marketing data. Augmenting traditional analytics with offline LLM-driven explanations enables in-depth reasoning and ad hoc business questioning entirely on local hardware.
