Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models

dc.authorwosid Tokuç, A. Aylin/Ixn-5337-2023
dc.authorwosid Dag, Tamer/K-7830-2014
dc.contributor.author Dağ, Tamer
dc.contributor.author Dag, Tamer
dc.contributor.other Computer Engineering
dc.date.accessioned 2025-04-15T23:42:53Z
dc.date.available 2025-04-15T23:42:53Z
dc.date.issued 2025
dc.department Kadir Has University en_US
dc.department-temp [Tokuc, A. Aylin] Kadir Has Univ, Dept Comp Engn, TR-34083 Fatih, Istanbul, Turkiye; [Tokuc, A. Aylin] Valinor AI, London SE9 4HA, England; [Dag, Tamer] Amer Univ Middle East, Coll Engn & Technol, Egaila 54200, Kuwait en_US
dc.description.abstract Predicting purchase events from e-commerce clickstream data is a critical challenge with significant implications for optimizing marketing strategies and enhancing customer experience. This study addresses the challenge by systematically evaluating and comparing multiple data representations: aggregated session attributes, recent user actions, and hybrid combinations of the two. Unlike prior research, which typically focuses on a single representation, our approach combines aggregated session-level summaries with granular, sequential user actions to capture both long-term and short-term behavioral patterns, bridging a gap in the existing literature. Through comprehensive experimentation, we compared multiple machine learning models, including LightGBM, decision trees, gradient boosting, SVC, and logistic regression, on real-world e-commerce clickstream data. The hybrid representation with LightGBM achieved the best predictive performance, significantly outperforming the alternative methods. Feature importance analysis revealed key factors influencing purchase likelihood, such as time since the last event, session duration, and product interactions. By demonstrating the practical utility of hybrid data representations and efficient tree-based models, this study provides actionable insights for real-time marketing interventions. Our findings offer a scalable and interpretable framework for e-commerce platforms to improve purchase prediction and optimize marketing strategies. en_US
dc.description.woscitationindex Science Citation Index Expanded
dc.identifier.doi 10.1109/ACCESS.2025.3548267
dc.identifier.endpage 43817 en_US
dc.identifier.issn 2169-3536
dc.identifier.scopus 2-s2.0-105001064548
dc.identifier.scopusquality Q1
dc.identifier.startpage 43796 en_US
dc.identifier.uri https://doi.org/10.1109/ACCESS.2025.3548267
dc.identifier.volume 13 en_US
dc.identifier.wos WOS:001445086900045
dc.identifier.wosquality Q2
dc.language.iso en en_US
dc.publisher IEEE-Inst Electrical Electronics Engineers Inc en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.scopus.citedbyCount 0
dc.subject Predictive Models en_US
dc.subject Data Models en_US
dc.subject Hidden Markov Models en_US
dc.subject Electronic Commerce en_US
dc.subject Computational Modeling en_US
dc.subject Data Visualization en_US
dc.subject Analytical Models en_US
dc.subject Real-Time Systems en_US
dc.subject Random Forests en_US
dc.subject Machine Learning Algorithms en_US
dc.subject Clickstream Data en_US
dc.subject Customer Behavior Modeling en_US
dc.subject Data Representations en_US
dc.subject Feature Importance en_US
dc.subject Gradient Boosting en_US
dc.subject E-Commerce en_US
dc.subject LightGBM en_US
dc.subject Machine Learning en_US
dc.subject Model Selection en_US
dc.subject Purchase Prediction en_US
dc.title Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models en_US
dc.type Article en_US
dc.wos.citedbyCount 0
dspace.entity.type Publication
relation.isAuthorOfPublication 6e6ae480-b76e-48a0-a543-13ef44f9d802
relation.isAuthorOfPublication.latestForDiscovery 6e6ae480-b76e-48a0-a543-13ef44f9d802
relation.isOrgUnitOfPublication fd8e65fe-c3b3-4435-9682-6cccb638779c
relation.isOrgUnitOfPublication.latestForDiscovery fd8e65fe-c3b3-4435-9682-6cccb638779c
