Where Customers Come From
A visual read of the funnel, so the drop-offs are obvious without reading model metrics.
Restricted to shane@, nick@, and shanemccormick@thematchartist.com. Use the Google account that matches your TMA email.
Forecast of incoming payments based on customer plan classifications. Paythen $800/mo installments roll forward monthly until ~$3200 paid. Half-pay bookings (~$1400-$2000 down) have a balance due ~1 week after their HubSpot shoot date. Coaching revenue (Dani's date-coaching, tagged in Stripe) excluded.
Per-month mix of how new customers paid: full / half-pay / Paythen installment. Trend: Paythen + half-pay growth, full-pay decline through 2026.
Models tightened after a critical review pass. Most notable: Cox PH dropped sales_rep (post-customer field) → concordance fell honestly from 0.86 to 0.72. No-show classifier now trains on first-call-per-prospect only (removed per-email label duplication). Time-aware dedupes added. Win-prob no longer mislabeled as "leaky". Duplicate q_total_answer_words in lead-score features fixed. Booking-latency chart relabeled (it measures days-to-book, not rep response speed).
The 10 most actionable findings from all the models + data joins shipped to date. Ordered by impact. Each headline links to the full panel.
Things that go against conventional sales/marketing wisdom in our data.
Models that didn't work because of a data limitation. Surfaced so we know what to fix next.
Every predictive model trained on TMA data. AUC = a 0.5-to-1.0 score for ranking quality (0.5 = coin flip, 1.0 = perfect). Decile lift = how much better the top 10% of scored prospects do vs the average. Each row shows what it predicts, what the score means, and how to use it operationally.
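Decile lift as described above can be sketched in a few lines. This is a minimal illustration, not the dashboard's actual pipeline; `scores` and `outcomes` are made-up names.

```python
def decile_lift(scores, outcomes):
    """Top-decile hit rate divided by the overall hit rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    top_n = max(1, len(ranked) // 10)          # top 10% of scored prospects
    top_rate = sum(o for _, o in ranked[:top_n]) / top_n
    base_rate = sum(outcomes) / len(outcomes)  # average across everyone
    return top_rate / base_rate if base_rate else float("inf")

# Toy example: a model that ranks well concentrates positives at the top.
scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]
print(decile_lift(scores, outcomes))  # top decile: 1/1 hit vs base 0.3 → lift ≈ 3.33
```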
What this is: Per-photographer refund rates, smoothed using a Beta-Binomial prior so a photographer with 1 refund out of 5 shoots doesn't look like "20% refunds" when one bad luck event would swing it back to 0%.
How to read it: The shrunk rate pulls small-n photographers toward the overall average. The 95% credible interval is wider when we have less data — wide intervals mean "we don't know yet."
How to act on it: Focus retention conversations on photographers whose entire credible interval sits above the org-wide rate, not just whose point estimate happens to be high.
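The shrinkage above is the Beta-Binomial posterior mean. A minimal sketch, with an illustrative org-wide prior (the real prior is fit from the data, not hard-coded):

```python
def shrunk_refund_rate(refunds, shoots, prior_refunds, prior_shoots):
    """Beta-Binomial posterior mean: pulls small-n photographers toward the prior."""
    alpha = prior_refunds + refunds                          # prior + observed refunds
    beta = (prior_shoots - prior_refunds) + (shoots - refunds)  # prior + observed clean shoots
    return alpha / (alpha + beta)

# Illustrative prior: ~5 refunds per 100 shoots org-wide.
print(shrunk_refund_rate(1, 5, 5, 100))    # raw 20% shrinks to ~5.7%
print(shrunk_refund_rate(20, 100, 5, 100)) # large n barely moves: 12.5%
```

The 1-in-5 photographer lands near the org rate; the 20-in-100 photographer keeps most of their observed rate because the data outweighs the prior.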
What this is: BG/NBD is a probabilistic model that, given each customer's payment history (count + recency), estimates the probability they'll transact again in the next 365 days. Used for predicting repeat shoots / re-engagement targets.
Caveat: Stripe records each captured payment as a separate "transaction," so customers on multi-month payment plans inflate the repeat-rate. True repeat-shoots (4+ payments separated by months) are ~10% of customers — those are the genuine re-engagement targets in this list.
How to act on it: Pair the highest-P(alive) customers with the longest gap since last payment for the strongest re-engagement DM list.
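The P(alive) quantity above has a closed form in BG/NBD. A sketch using the standard formula; the parameter defaults are illustrative placeholders, not TMA's fitted values:

```python
def p_alive(x, t_x, T, r=0.24, alpha=4.41, a=0.79, b=2.43):
    """BG/NBD P(customer still 'alive') given repeat count x, recency t_x,
    and observation window T. Parameter values are illustrative only."""
    if x == 0:
        return 1.0  # BG/NBD assumes never-repeaters haven't churned
    return 1.0 / (1.0 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x))

# Same payment count, different recency: long gaps drive P(alive) down.
print(p_alive(4, t_x=10, T=52))  # last payment long ago → low P(alive)
print(p_alive(4, t_x=50, T=52))  # paid recently → high P(alive)
```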
What this is: Cox Proportional Hazards models the time from lead-creation to becoming a customer, accounting for prospects who haven't bought yet (right-censoring). Each line is a survival curve for a buyer segment — y-axis = probability the prospect has NOT bought yet by day-X.
How to read it: Lower curves close faster. Steeper drops mean a segment is converting quickly. The concordance (0-1, like AUC for survival models) is currently 0.72 — moderate ranking accuracy on time-to-buy.
How to act on it: Slow-decay segments are where extended nurture (SMS, retargeting) pays off; fast-decay segments warrant a same-week call.
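The 0.72 concordance quoted above is just the fraction of comparable prospect pairs the model orders correctly. A brute-force sketch (real survival libraries handle censoring and ties more carefully):

```python
def concordance(durations, events, risk_scores):
    """Fraction of comparable pairs ordered correctly. A pair is comparable
    when one prospect converted (event=1) before the other's observed time;
    higher risk score should mean earlier conversion."""
    agree = ties = total = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if i == j or not events[i] or durations[i] >= durations[j]:
                continue  # i must be an observed conversion earlier than j
            total += 1
            if risk_scores[i] > risk_scores[j]:
                agree += 1
            elif risk_scores[i] == risk_scores[j]:
                ties += 1
    return (agree + 0.5 * ties) / total

# Perfectly ordered risk scores → concordance 1.0
print(concordance([5, 10, 20], [1, 1, 0], [3.0, 2.0, 1.0]))
```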
What this is: HDBSCAN unsupervised clustering on the five Claude-judge scores (intent / urgency / financial readiness / dating sophistication / effort) — groups customers into natural archetypes without telling the algorithm what to look for.
How to read it: Each row is an archetype. Refund rate + average LTV are computed on customers in that cluster after the fact, so you can see which archetypes monetize best and which leak refunds.
How to act on it: Steer sales scripts toward the high-LTV / low-refund archetypes. The opposite archetypes need either better screening at booking time or a more careful expectation-set on the call.
What this is: For each rep, a 12-week rolling baseline of weekly call volume is computed. Any week that falls more than 2 standard deviations (σ) below that baseline is flagged as an anomaly.
How to act on it: A flag isn't proof of a problem — it's a prompt to check whether the rep was on PTO, switched roles, or actually slowed down.
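The 2σ rule above is simple enough to sketch end-to-end. Window size and threshold match the caption; the function name is illustrative:

```python
from statistics import mean, stdev

def flag_anomalies(weekly_calls, window=12, n_sigma=2.0):
    """Flag weeks more than n_sigma below the trailing `window`-week baseline."""
    flags = []
    for i in range(window, len(weekly_calls)):
        baseline = weekly_calls[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        flags.append(weekly_calls[i] < mu - n_sigma * sigma)
    return flags

# 12 steady weeks, then a collapse: only the collapse week is flagged.
calls = [20, 22, 19, 21, 20, 23, 18, 20, 21, 22, 19, 20, 5]
print(flag_anomalies(calls))  # [True]
```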
What this is: Per-rep XGBoost classifiers. For each prospect, predict P(close | this rep). Recommend best rep at routing time. Per-rep training metrics + a historical analysis showing the close-rate lift if we'd actually routed everyone to the model's pick.
What this is: XGBoost classifier predicting P(refund) for paying customers before their shoot. Trained on historical purchase→shoot→refund flow. Use this to intervene 1 week pre-shoot (extra check-in, stylist call, payment-plan adjustment).
Top features: days between purchase & shoot, n_payments, sales rep, buyer_type, q_zero_dates_flag, app_none_flag.
Decile spread: top 10% = 46% refund rate; bottom 10% = 1.4%. 33× lift.
What this is: BERTopic on the full Typeform free-text answers for two questionnaires we send: the Consultation Q (filled by 3,062 leads after they book a sales call) and the Pre-Shoot Q (filled by 1,928 customers after they purchase, before the shoot). Topics are forced via K-Means clustering on sentence-transformer embeddings; stopwords filtered.
How to read it: Each topic is named by its top n-grams. The Δ pp column shows prevalence in the positive outcome (became customer / refunded) minus the negative outcome.
What this is: BERTopic on Dani+Marcos transcripts cross-referenced against the real HubSpot lifecycle_stage = customer label. Joined via local CSV first, then live HubSpot API by E.164 phone + email for everything else. 100% match rate.
Top close signal: The "zero, card, perfect" topic (credit-card collection language) appears in 40% of customer calls vs 7% of non-customer calls. That's the closing-moment language pattern.
What this is: BERTopic discovered themes across 206 Dani/Marcos call transcripts (voicemails excluded). Each topic is then cross-referenced with the per-call AI-summary outcome (customer_paid / interested / declined / etc) to surface which conversation themes correlate with closes.
How to read it: The Δpp column = (% of positive-outcome calls that hit this topic) − (% of negative-outcome calls that hit it). Bigger positive Δ = topic concentrates in wins.
Sample-size caveat: Only 13 calls have a hard-negative outcome (declined + no_show); 61 are positive (customer_paid + interested + follow_up_scheduled). Loss signals are statistically thin — read this as "what wins talk about", not "what loses talk about".
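The Δpp column reduces to one subtraction per topic. A minimal sketch with made-up hit/outcome vectors:

```python
def delta_pp(topic_hits, outcomes):
    """Δpp = % of positive-outcome calls hitting the topic
           − % of negative-outcome calls hitting it."""
    pos = [h for h, o in zip(topic_hits, outcomes) if o]
    neg = [h for h, o in zip(topic_hits, outcomes) if not o]
    return 100 * (sum(pos) / len(pos) - sum(neg) / len(neg))

# 3 of 4 wins mention the topic vs 1 of 4 losses → Δ = +50pp
hits     = [1, 1, 1, 0, 1, 0, 0, 0]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
print(delta_pp(hits, outcomes))  # 50.0
```

With only 13 hard negatives, the `neg` denominator is tiny, which is exactly why the caveat above says to read wins, not losses.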
What this is: A separate no-show model trained on the full HubSpot contact record — phone area code, derived state from desired shoot location, time-of-day, day-of-week, every questionnaire question + flag, plus a boolean for whether the questionnaire was even filled out.
How to read it: Feature importance below ranks signals that distinguish attendees from cancels. The decile lift table shows how concentrated no-shows are in the lowest-scored bucket — the bottom decile is roughly 100% no-show, the top decile roughly 0%.
Important caveat: The strongest signals are data-availability signals (blank phone, unfilled questionnaire). The model is mostly learning "do we have data on this person yet?" — operationally, that's still a useful triage: the lowest-decile prospects are the safest to deprioritize.
What this is: SMS is TMA's actual follow-up channel post-call. This panel buckets the prospect's outbound/inbound SMS volume and looks at close rate per bucket. Price-mention detection is a separate, binary feature.
How to act on it: Reps should escalate texting volume in the early-engagement bucket — that's where lift is largest.
What this is: There is no explicit "refund reason" field in HubSpot, so this panel infers reasons from contact_status + shoot_status_export values among refunded customers vs not.
How to read it: Pre-shoot cancellations dominate. Post-shoot dissatisfaction is rarer but more painful per occurrence.
How to act on it: Most refund mitigation lives in the post-booking / pre-shoot window. Stylist confirmations and shoot-brief PDFs target this exact window.
What this is: Customer count, revenue, and refund rate by geography. Toggle City ↔ State to switch rollup level. Toggle Table / Bubble / Heatmap for visualization. Use the timeframe selector to spot recency trends in specific markets.
How to act on it: Expanding markets = recent surge + lower-than-average refund rate. Refund hot spots warrant a closer look at the local photographer + travel logistics.
What this is: For each acquisition month, what percent of those customers had 2+ / 3+ / 4+ subsequent Stripe payments. Most "2+" is just multi-payment plans (i.e., not retention). True retention is at the 4+ column — those are people who came back for another full shoot package months later.
How to act on it: The 4+ column is the real signal for repeat business; older cohorts give the cleanest read since they've had time to mature.
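The cohort rollup above can be sketched as a per-month threshold count. Input shape and names are illustrative, not the pipeline's actual schema:

```python
from collections import defaultdict

def retention_by_cohort(customers, threshold=4):
    """% of each acquisition month's customers with `threshold`+ Stripe payments.
    `customers` = iterable of (acquisition_month, n_payments) pairs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for month, n_payments in customers:
        totals[month] += 1
        hits[month] += n_payments >= threshold
    return {m: round(100 * hits[m] / totals[m], 1) for m in sorted(totals)}

data = [("2024-01", 1), ("2024-01", 4), ("2024-01", 5), ("2024-02", 2)]
print(retention_by_cohort(data))  # {'2024-01': 66.7, '2024-02': 0.0}
```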
What this is: For each pending lead, we compute the cosine similarity between their LLM-judge score vector (5 dimensions) and the average vector of top-LTV customers. Higher = "this prospect looks like our best customers on the dimensions we score."
How to read it: Five features is a thin similarity space — treat as a directional prioritization signal, not a verdict.
How to act on it: Cherry-pick these for next-day callbacks before SMS-volume work.
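The similarity score above is plain cosine similarity over the 5-dim judge vectors. A sketch with hypothetical score values:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two score vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 5-dim scores: intent, urgency, readiness, sophistication, effort
top_ltv_avg = [8.0, 7.5, 8.2, 6.0, 7.0]
prospect    = [7.0, 8.0, 7.5, 5.5, 6.5]
print(round(cosine_similarity(prospect, top_ltv_avg), 3))
```

Note cosine ignores magnitude: a prospect scoring (4, 4, 4, 3, 3.5) looks nearly identical to one scoring (8, 8, 8, 6, 7), which is part of why the caption calls this directional, not a verdict.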
What this is: Among prospects who eventually book, this charts how long they took to book vs whether they ultimately closed. Important: This is NOT speed-to-lead. It's days from lead-creation to first booked call.
How to read it: >4 weeks shows 73% close (n=377), <1hr only has n=3 — too small to compare. Read this as "patient bookers close more often", not "respond slower."
How to act on it: Don't slow follow-up; do invest in long-tail nurture (SMS, retargeting) since slow-bookers are higher-value when they finally book.
What this is: Per-URL clicks from Google + revenue attributed to leads that landed on each URL.
The headline: City pages convert 335× better per click than blog pages, despite blog driving 94% of organic traffic.
How to act on it: SEO investment should heavily skew toward city-page coverage, not blog volume. The top blog pages still bring leads, but $/click is dwarfed by city-page ROI.
Web traffic (Plausible), HubSpot funnel events, and Stripe payments — monthly, 2023-02 → present. The big story: traffic peaked in 2024 and the visitor → lead conversion has compressed.
This is the business story in one row: how many people become leads, how many book, how many show, and how many actually become customers.
How the booked-lead universe currently breaks into follow-up types.
The strongest positive and negative trigger families after stripping out leakage and noisy sales-process fields.
The buyer types that close best after they show up.
Question themes from the questionnaire. This is useful for understanding intent, not just scoring.
High-level traits that keep showing up among stronger buyers.
The most important friction patterns coming out of the queue logic, text analysis, and false-positive audits.
When leads tend to book better. This is safe, top-of-funnel timing, not downstream sales-process noise.
These are the clearest operational buckets right now. Each one comes with real names behind it in the explorer below.
The acquisition and behavior combinations that keep appearing around stronger leads.
Monthly cohorts help keep the story honest when recent leads have not had time to mature yet.
| Cohort | Leads | Booked | Q / Booked | Showed / Booked | Close / Showed |
|---|---|---|---|---|---|
Predicting net spend per paying customer at first-payment time. Useful for ranking — TMA's tier distribution is narrow, so this is package-tier prediction (basic vs premium), not whale-finding.
Customers ranked by predicted LTV, bucketed into 10 deciles. Each bar is the average actual net spend in that decile. The wider the spread, the more useful the model is for prioritization.
From all 2,121 paying customers. The top decile predicts $2,726 mean actual spend vs $1,856 in the bottom decile (1.47×).
| Name | Predicted | Actual | Decile | Buyer |
|---|---|---|---|---|
Joins the new Stripe customer data to the HubSpot funnel by email. Where does our $4.79M actually come from — and which segments quietly leak refunds?
Reset buyers (post-divorce / long-relationship) lead at $2,527 ARPC. Ready buyers, surprisingly, have the lowest ARPC AND highest refund rate.
The pre-call question they asked. Process / profile-help themes monetize highest; price-question theme still pays full tier ($2,260 ARPC).
Refund-rate spread across card brands is small but real. Missing-brand rows are likely ACH or alt-payment paths.
Where the money actually comes from. State and city ranking, refund hotspots, and the long tail of countries.
Net revenue per state (top 24). Bar length = revenue, dot = ARPC. Click a state to filter the Lead Explorer.
States ranked by refund rate (min n=15). Where customers regret the most.
Mostly US, but with a long international tail.
From Stripe card metadata at customer creation.
When booked-call activity happens, year-over-year traffic, cumulative revenue.
Each cell = number of strategy-session bookings in that hour-of-week slot. Darker red = more booking activity.
Plausible monthly visitors overlaid by year. The 2024 → 2025 collapse is the headline.
Net revenue accumulating over the dataset window.
Refunded $ as a % of gross monthly revenue. Spikes flag bad-shoot months.
Booking rate around US holidays (top 12 windows by lift).
Stage-by-stage drop-off, monthly cohort close rates, time-from-lead-to-customer.
From the visitors who hit the site to the customers who paid. Bars scaled to the largest stage.
Monthly cohorts × close rate. Rows are creation months, color intensity is the % that became customers.
Days from lead creation → booked sales call, for customers only. Faster = stronger intent.
Days-since-creation for non-customers still in the funnel. Long tail = stale queue.
Honest AUC of the show / show_q / close / close_q models.
How spend, payment patterns, and customer ranks distribute.
Net spend per paying customer, 24 bins from $0 to $4,500. The bimodal shape is real (basic vs premium tier).
Each dot is a held-out validation customer. Diagonal = perfect prediction.
Live ranking from Stripe + LTV scores.
Of paying customers, how many made one payment vs spread across installments?
Where the $281K of refunds is concentrated. Use to prioritize delivery quality fixes.
From the refund-risk model. AUC is low, but the extreme tail of scores is still a useful directional flag.
Which segment costs the most in refunds (absolute dollars).
Refund rate per Stripe card brand. Missing-brand bucket likely ACH or alt-pay.
Credit / debit / prepaid split among paying customers.
Buyer-type bubble, app preferences, goal types, self-description distribution.
Bubble size = n customers, x = ARPC, y = refund rate. Top-left = ideal segment (high ARPC, low refund).
From the pre-call questionnaire.
Hinge / Bumble / Tinder / multi-app split.
Marriage / serious / casual / unspecified.
Which pre-call question themes monetize best after the call.
From the Plausible export — what pages drive entrances, what events convert, where traffic comes from.
Most-entranced URLs across the full window. The Tinder/Hinge/city pages dominate top of funnel.
Tracked events. Submitted-Contact-Form is the dominant lead capture, "Booked A Call" the bottom-of-funnel.
Aggregated visitor source. Google + Direct dominate.
Predicting who will refund using only what's knowable at first payment.
LightGBM gain. The model still ranks customers — just don't trust the absolute probability when AUC is near random.
Customers who haven't refunded yet, ranked by model score. Useful as a triage queue, not as a calibrated probability.
| Name | Risk | Spend | Stage | Buyer |
|---|---|---|---|---|
Use this to move from the big picture into actual people. The default modes are intentionally plain-English.
The core model metrics and top feature summaries are still here, just moved out of the main operating view.
Full generated text reports behind the dashboard.
Per-photographer lifetime revenue, refund rate, and average net per customer. Filter by individual photographer + time range — all-time, presets, custom dates, or canonical year/quarter/month buckets.
Cities ranked by lifetime revenue and refund rate. Austin and San Francisco dominate volume; smaller markets (Minneapolis, Miami) show highest mean LTV.
Every strategy session, follow-up, and pre-shoot call pulled from each rep's Google Calendar back to 2022. Close attribution joins prospect emails captured on the calendar invite to the customer outcome in live HubSpot.
Per-rep customer count, revenue, and refund rate using the HubSpot Sales Rep field at the customer level. Nate Campbell handles ~70% of historical volume; Dani and Marcos appear once they've sourced customers in the selected timeframe.
Sales calls = strategy + follow-up only. Pre-shoot column is shown for context but it's operational (photographer prep with existing customers), not sales — Marcos's 107 pre-shoot calls are from his photographer role, not him selling. Close rate denominator = sales-attempt calls where the prospect email was on the invite (~15% of older calls; ~100% of newer Calendly-routed calls).
Three side-by-side rankings: total calls, attributed closes, and close rate. Honors the Rep + Range filters above. Bars scale to the leader in each column.
All reps (or filtered rep). Strategy calls vs total. Closed-to-customer overlaid where attribution exists.
One line per rep showing monthly strategy-call volume. Click a legend item to isolate a rep.
Heatmap of call activity by day-of-week × hour (America/Chicago local time). Darker = more calls. Honors the Rep filter. Useful for capacity planning + spotting underused hours.
For the selected rep (or all reps combined), the path from total calls → strategy/follow-up sales attempts → unique prospects → attributed closes.
Running total of closes per rep over time. Steeper slope = faster closing. Nate / Dani / Marcos only (Nick & Shane excluded — their old Calendly events didn't capture prospect emails so attribution is unreliable).
Zoomed view of the two newer reps, scaled to their volumes. Lets you see Dani vs Marcos head-to-head without Nate's volume flattening the chart.
Donut of strategy / follow-up / pre-shoot / other for the selected rep. Tells you whether a rep is mostly running discovery (strategy) or mostly nurture (follow-up).
Read before drawing conclusions.
Photographer roster, upcoming shoots, on-the-books sales calls, and travel availability — pulled from the photographer-booking Firestore (sales.thematchartist.co) on the last pipeline run. Updates nightly.
Currently bookable. Lifetime stats joined from the Stripe + Airtable + HubSpot data.
Per-photographer busy/free calendar pulled from their Google Calendars every 30min (via sales.thematchartist.co Firestore). Red = busy, green = open. Use this to book shoots without double-booking. Click a column header to filter.
Last 30 days + next 120 days. Filter by photographer; toggle calendar / map / table.
Current week schedule per rep, pulled live from sales.thematchartist.co Firestore.
Calendar of recent + upcoming sales calls. Host = sales rep running the call. Kind: strategy = initial discovery, follow-up = post-call decision.