The Project: TGE Logistics Real-Time Dispatch
- The Tech Stack: React (Frontend), Python FastAPI (Backend), MySQL (Persistence), WebSockets (Aqchatter Event Bus), Google Maps API (Visualization).
- The Problem: Tracking 250+ concurrent drivers with high-frequency (5s) telemetry from TED IoT devices.
- The Solution: A hybrid "Hydration + Stream" model where the UI is initialized from MySQL and updated via WebSockets.
Core Performance Hooks (Mention these!)
- Reconciliation: Updating state via Key-Value objects ($O(1)$ lookup) instead of arrays ($O(N)$ lookup).
- State Batching: Using a `useRef` buffer to group WebSocket updates. This reduces CPU usage by 70% compared to direct state updates.
- Component Memoization: Using `React.memo` to ensure only moving pins re-render, keeping map interactions smooth at 60 FPS.
- Stale Data Handling: Implementing a client-side heartbeat check to flag "ghost" drivers who stop sending data.
Scaling Strategy (The "FAANG" Answer)
- Ingestion: If scaling to 50k drivers, introduce Kafka as a buffer between the TED devices and the database to prevent write contention.
- Live Cache: Use Redis Geospatial indexes to serve "Live Dispatch" queries, keeping the heavy MySQL instance for historical Audit Reports and Manifest Generation.
- Edge Computing: Use regional WebSocket clusters to reduce latency for global fleets (Global Expansion).
TGE Experience
- Explain the real-time system that shows live driver locations.
- I implemented the Google Maps API to show the 250+ drivers for a given state, with their locations updated in real time over a WebSocket connection following the Hybrid Ingestion Pattern (the reconciliation code below). The lifecycle contains three phases:
- Initial State Hydration: On component mount, the React app performs a REST API fetch to retrieve the 'Last Known Location' and current status (e.g., active, on break) for 250+ online drivers from the MySQL database.
- WebSocket Handshake: Once the initial state is rendered, the app establishes a persistent WebSocket connection to the Aqchatter event bus.
- Real-Time State Reconciliation: As TED devices transmit telemetry every 5 seconds, Aqchatter broadcasts these events to the frontend. I designed the React state management to perform targeted updates: instead of re-rendering the entire map, I update only the specific driver entry in the state object, which triggers a smooth position update of that specific pin on the Google Maps layer.
[!IMP] Mention that you "batch" or "throttle" UI updates so the map only refreshes its pins once every 500ms, even if the data comes in faster. This saves CPU and battery for the operator.
- How do you prevent performance lag when updating 250 pins every 5 seconds?
- To maintain performance, I used `React.memo` for individual Marker components to avoid re-rendering the entire map. Additionally, I ensured that I only update the specific coordinate data, keeping the map tiles static.
- What happens if a driver’s GPS sends 'noisy' data and the pin jumps 5km away suddenly?
- That is a common telemetry issue. If I were to expand this, I would implement a client-side validation layer. I would check the distance between the 'Last Known' and the 'New' location. If the implied speed is physically impossible (e.g., 500km/h), I would discard that packet as noise or smooth the transition using interpolation (like a sliding window average) to prevent the pin from jumping.
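The validation described above can be sketched as a small standalone check. This is a hypothetical helper (the names and the 160 km/h cutoff are my own, not from the project): it computes the great-circle distance between the last accepted fix and the new one, derives the implied speed, and rejects physically impossible jumps.

```javascript
// Hypothetical client-side telemetry sanity check. Rejects a packet when
// the implied speed between the last accepted fix and the new one is
// physically impossible for a delivery vehicle.

const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two { lat, lng } points, in km (haversine).
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Accept the packet only if the implied speed is at most maxKmh.
// Packets have shape { location: { lat, lng }, timestamp } (timestamp in ms).
function isPlausible(prev, next, maxKmh = 160) {
  const hours = (next.timestamp - prev.timestamp) / 3_600_000;
  if (hours <= 0) return false; // out-of-order or duplicate timestamp
  const impliedKmh = haversineKm(prev.location, next.location) / hours;
  return impliedKmh <= maxKmh;
}
```

A rejected packet could either be dropped outright or fed into an interpolation/smoothing step, as mentioned above.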
- How do you handle the 'Reconnection Gap' if the user's internet flickers?
- WebSockets are stateful, so a disconnect means missed updates. I implemented a Reconnection Strategy: when the socket reconnects, the app triggers a partial re-fetch of the 'last known' locations from the database to reconcile any changes that occurred while the user was offline.
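A minimal sketch of that reconnection strategy, assuming capped exponential backoff with jitter (my own policy choice, not stated in the project): the jitter spreads out reconnect attempts so hundreds of dashboards don't hammer the server at the same instant after an outage, and a re-hydration fetch closes the gap before streaming resumes.

```javascript
// Capped exponential backoff with "equal jitter": attempt 0 waits
// 250-500 ms, attempt 1 waits 500-1000 ms, and so on, capped at 30 s.
function backoffMs(attempt, baseMs = 500, capMs = 30_000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2);
}

// On reconnect, re-fetch last-known state to close the gap, then resume
// streaming. rehydrate/stream stand in for the app's REST and socket calls.
async function reconnect(rehydrate, stream, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await rehydrate(); // reconcile anything missed while offline
      return stream();   // re-establish the WebSocket subscription
    } catch {
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
  throw new Error("Unable to reconnect");
}
```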
- Code for the Reconciliation Pattern to handle the 250+ drivers.
import React, { useState, useEffect, useRef, memo } from 'react';
import { GoogleMap, Marker } from '@react-google-maps/api';
// --- Optimized Marker Component ---
// React.memo ensures this pin ONLY re-renders if its position or status changes.
const OptimizedMarker = memo(({ driver }) => {
return (
<Marker
position={driver.location}
icon={driver.isStale ? "gray-pin.png" : "active-pin.png"}
title={`Driver: ${driver.id}`}
/>
);
}, (prev, next) => {
return prev.driver.location.lat === next.driver.location.lat &&
prev.driver.location.lng === next.driver.location.lng &&
prev.driver.isStale === next.driver.isStale;
});
// High-level Logic for Real-Time Map Updates
const LiveMap = () => {
const [drivers, setDrivers] = useState({}); // Key-value for O(1) lookups
const bufferRef = useRef({}); // Temporary storage for "batched" updates
useEffect(() => {
// 1. Initial Hydration from MySQL
fetchLastKnownLocations().then(data => {
const driverMap = Object.fromEntries(data.map(d => [d.id, d])); // O(N) build, O(1) lookups
setDrivers(driverMap);
});
// 2. State Batching Interval (Flush updates every 500ms)
// This prevents 250 individual re-renders per second.
const batchInterval = setInterval(() => {
if (Object.keys(bufferRef.current).length === 0) return;
setDrivers(prev => ({
...prev,
...bufferRef.current
}));
bufferRef.current = {}; // Clear buffer after flush
}, 500);
// 3. Heartbeat/Stale Data Check (Run every 10s)
const staleCheckInterval = setInterval(() => {
const now = Date.now();
setDrivers(prev => {
const updated = { ...prev };
Object.keys(updated).forEach(id => {
// If no update for 30s, mark as stale (copy the entry rather than
// mutating the shared object, so React.memo sees a new reference)
if (now - updated[id].lastSeen > 30000) {
updated[id] = { ...updated[id], isStale: true };
}
});
return updated;
});
}, 10000);
return () => {
clearInterval(batchInterval);
clearInterval(staleCheckInterval);
}
}, []);
// 4. WebSocket Stream Ingestion
useEffect(() => {
const socket = connectToAqchatter();
socket.on("driver_update", (data) => {
// Race condition check: Only buffer if it's the newest data
const currentData = bufferRef.current[data.id];
if (!currentData || data.timestamp > currentData.timestamp) {
bufferRef.current[data.id] = {
...data,
lastSeen: Date.now(),
isStale: false
};
}
});
return () => socket.close();
}, []);
return (
<GoogleMap>
{Object.values(drivers).map(driver => (
<OptimizedMarker key={driver.id} driver={driver} />
))}
</GoogleMap>
);
};
- When you share this, emphasize the "Efficiency of State".
- Initial Hydration: Explain that you first populate the UI with the "Last Known Location" from the database to ensure the operator isn't looking at a blank map while waiting for the first 5-second pulse.
- Targeted State Updates: Mention that you use a Key-Value Pair (Object/Map) for the driver state instead of an Array. This allows you to find and update a specific driver in $O(1)$ time rather than $O(N)$, which is critical when you have hundreds of drivers.
- Component Optimization: Tell them you used `React.memo` or a similar technique for the Map Pins so that when Driver A moves, Driver B's pin component does not re-render.
- How do you prevent "Race Conditions" in the UI?
- The Problem: Multiple updates for the same driver arriving out of order.
- The Solution: Each telemetry packet includes a server-side timestamp. In the reconciliation logic, I check whether the incoming `timestamp` is newer than the one already held for that driver. If it's older (a delayed packet), I discard it to prevent the pin from "jumping backward."
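Stripped of the React plumbing, that guard reduces to a tiny reducer over the keyed driver state (the packet shape here is an assumption for illustration):

```javascript
// Out-of-order guard: a packet only wins if its server-side timestamp is
// strictly newer than what we already hold for that driver. State is a
// keyed object ({ [driverId]: packet }) for O(1) lookup.
function applyUpdate(state, packet) {
  const current = state[packet.id];
  if (current && packet.timestamp <= current.timestamp) {
    return state; // delayed duplicate: drop it, keep the pin where it is
  }
  return { ...state, [packet.id]: packet };
}
```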
- What happens if a dispatcher has a slow internet connection?
- The Problem: A slow client can't process messages as fast as the server sends them, causing a massive backlog.
- The Solution: This is why I used a Ref-based Buffer. Even if the WebSocket fires 100 times, my `setInterval` only flushes to the UI every 500ms. The buffer also coalesces: I keep only the most recent location for each driver (last write wins), dropping intermediate stale updates, so the backlog can never grow past the fleet size.
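That coalescing behavior can be sketched in isolation (function names are mine; the real logic lives inside the `useRef`/`setInterval` pair shown earlier):

```javascript
// Coalescing ("last write wins") buffer: between flushes, only the newest
// packet per driver survives, so memory is bounded by fleet size (~250
// entries) regardless of how fast messages arrive.
function bufferPacket(buffer, packet) {
  const existing = buffer[packet.id];
  if (!existing || packet.timestamp > existing.timestamp) {
    buffer[packet.id] = packet; // overwrite: intermediate positions are dropped
  }
}

// Snapshot the buffer and clear it; the snapshot is then merged into
// React state in a single setDrivers call.
function flushBuffer(buffer) {
  const snapshot = { ...buffer };
  for (const id of Object.keys(buffer)) delete buffer[id];
  return snapshot;
}
```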
- How do you test a real-time WebSocket system?
- The Logic: I used Jest with `jest.useFakeTimers()` to verify the batching logic. I would mock 50 incoming socket events and then fast-forward time to ensure `setDrivers` was called exactly once with the consolidated data.
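The same idea can be demonstrated framework-free with a hand-rolled fake clock (a sketch, not the actual Jest suite): push many events, advance the clock past one 500 ms interval, and assert the flush callback fired exactly once with coalesced data.

```javascript
// Minimal batcher with a controllable clock, mirroring what the
// jest.useFakeTimers() test verifies against the real setInterval code.
function createBatcher(onFlush, intervalMs = 500) {
  const buffer = {};
  let elapsed = 0;
  return {
    push(packet) {
      buffer[packet.id] = packet; // coalesce per driver
    },
    advance(ms) {
      elapsed += ms;
      while (elapsed >= intervalMs) {
        elapsed -= intervalMs;
        if (Object.keys(buffer).length > 0) {
          onFlush({ ...buffer }); // one UI update per interval
          for (const id of Object.keys(buffer)) delete buffer[id];
        }
      }
    },
  };
}
```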
The Project: ML-Assisted Thresholding (Splunk ITSI)
- The Tech Stack: React (Frontend), Python/Django (Backend), Splunk Enterprise SDK, Scikit-learn (ML Models).
- The Problem: Traditional IT thresholding was manual and error-prone. Users needed to configure thresholds for multiple KPIs simultaneously based on historical patterns without freezing the UI or overwhelming the backend.
- The Solution: An asynchronous recommendation engine that integrated ML models with UI workflows, allowing for bulk threshold adjustments for up to 10 KPIs at a time.
Core Performance Hooks (Mention these!)
- Asynchronous Inference: Separating the heavy ML calculation from the UI thread to prevent browser "freezing" during complex recommendations.
- Bulk Processing: Designing the API to handle concurrent requests for 10+ KPIs, optimizing the data payload to reduce round-trip latency.
- Data Recalculation Optimization: Achieving a 90% reduction in search time by refactoring database calls and optimizing object processing patterns.
- Feature Leadership: Managing the end-to-end lifecycle of a feature that was recognized as a top ML-driven innovation at Splunk.conf.
ML Integration Deep-Dive (Interview Q&A)
- How did you handle ML model latency in a user-facing application?
- The Problem: ML model inference can take several seconds, which is too slow for a standard synchronous REST API call.
- The Solution: I implemented an Async Job Pattern. When the user clicks "Recommend," the frontend triggers a background task and receives a "Job ID." The UI then shows a loading state while the backend processes the ML logic. Once complete, the results are fetched and applied to the KPI configuration screen.
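The client side of that Async Job Pattern can be sketched as a generic poller. This is illustrative only: the status shape (`state`, `result`) and the polling cadence are assumptions, not Splunk's actual API.

```javascript
// Polls a job-status callback (e.g. a fetch of GET /jobs/{id}) until the
// background ML task finishes, fails, or the attempt budget runs out.
async function pollJob(fetchStatus, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.state === "done") return status.result;   // apply to KPI config screen
    if (status.state === "failed") throw new Error(status.error);
    await new Promise((r) => setTimeout(r, intervalMs)); // still running: wait
  }
  throw new Error("ML job timed out");
}
```

The UI shows its loading state for the duration of the poll, so a multi-second inference never blocks a synchronous request/response cycle.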
- How did you optimize the "90% reduction" in search time?
- The Problem: The existing recalculation search was inefficient, making too many individual database calls and processing objects one by one.
- The Solution: I performed a bottleneck analysis and refactored the search to use Bulk Processing. I optimized the SQL-like queries to fetch only the necessary fields and implemented efficient object serialization, which significantly lowered CPU and I/O pressure on the indexers.
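The shape of that refactor is the classic N+1 → bulk swap. The actual work was in Splunk/Python, but the pattern is language-agnostic; this JS sketch just makes the round-trip difference concrete:

```javascript
// Before: one round trip per KPI (N+1 pattern).
async function fetchOneByOne(ids, fetchOne) {
  const results = [];
  for (const id of ids) results.push(await fetchOne(id)); // N round trips
  return results;
}

// After: a single bulk request carrying only the fields the UI needs.
async function fetchBulk(ids, fetchMany) {
  return fetchMany(ids); // 1 round trip
}
```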
- How did you handle "Data Drift" or inaccurate recommendations?
- The Problem: ML models based on old historical data can give "stale" or irrelevant recommendations if the environment changes.
- The Solution: I built the Entity Adaptive Thresholding feature, which included a nightly recalculation mechanism. This ensured that the recommendation policies were dynamic and adjusted automatically based on the most recent historical data trends.
- How did you manage the "User Experience" for complex ML logic?
- The Problem: Users often don't trust "black box" ML recommendations.
- The Solution: I architected the frontend to show "ML Confidence Levels" and allowed users to preview the recommendation against historical charts before applying it. This "Human-in-the-loop" design increased user trust and adoption.
- How did you coordinate with Data Science teams as a Software Engineer?
- The Logic: I acted as the bridge. The Data Science team focused on model accuracy in Python, while I focused on the API contracts, data serialization, and UI responsiveness. We established strict JSON schemas for the model inputs and outputs to ensure that model updates wouldn't break the frontend.
Comparison: TGE vs. Splunk Projects
| Aspect | TGE Logistics (HCL) | Splunk ML (Crest) |
|---|---|---|
| Primary Challenge | High-frequency data (WebSockets) | High-latency logic (ML Inference) |
| Key Pattern | State Batching / Reconciliation | Asynchronous Job / Bulk Processing |
| Metric | Sub-second real-time latency | 90% search-time reduction |
| Scale | 250+ concurrent device streams | 700+ global enterprise customers |