Website category classification using fine-tuned BERT language model

dc.contributor.authorDağ, Hasan
dc.contributor.authorDemirkıran, Ferhat
dc.contributor.authorÜnal, Uğur
dc.date.accessioned2020-12-17T18:36:21Z
dc.date.available2020-12-17T18:36:21Z
dc.date.issued2020
dc.departmentFakülteler, İşletme Fakültesi, Yönetim Bilişim Sistemleri Bölümüen_US
dc.description.abstractThe content on the World Wide Web expands every second, giving web users a rich supply of material. However, some of it is harmful or misleading and may do users more harm than good. Harmful content can take the form of text, audio, video, or images and can involve violence, adult content, or other harmful information. Young people in particular may readily be affected psychologically by such content. To protect young people from it, various web filtering techniques are used, such as keyword filtering, Uniform Resource Locator (URL) based filtering, intelligent analysis, and semantic analysis. We propose an algorithm that classifies websites, including those that may contain adult content, into 32 unique categories with 67.81% accuracy using BERT. We also show that a BERT model gives higher accuracy than both the Sequential and Functional API models when used for text classification.en_US
dc.identifier.citation3
dc.identifier.doi10.1109/UBMK50275.2020.9219384en_US
dc.identifier.endpage336en_US
dc.identifier.isbn978-172817565-2en_US
dc.identifier.scopus2-s2.0-85095717414en_US
dc.identifier.scopusqualityN/A
dc.identifier.startpage333en_US
dc.identifier.urihttps://hdl.handle.net/20.500.12469/3562
dc.identifier.urihttps://doi.org/10.1109/UBMK50275.2020.9219384
dc.identifier.wosWOS:000629055500065en_US
dc.identifier.wosqualityN/A
dc.institutionauthorDemirkıran, Ferhaten_US
dc.institutionauthorÇayır, Aykuten_US
dc.institutionauthorÜnal, Uğuren_US
dc.institutionauthorDağ, Hasanen_US
dc.language.isoenen_US
dc.publisherInstitute of Electrical and Electronics Engineers Inc.en_US
dc.relation.journal5th International Conference on Computer Science and Engineering, UBMK 2020en_US
dc.relation.publicationcategoryKonferans Öğesi - Uluslararası - Kurum Öğretim Elemanıen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectBERTen_US
dc.subjectFunctional APIen_US
dc.subjectSequential APIen_US
dc.subjectText classificationen_US
dc.subjectWeb filteringen_US
dc.titleWebsite category classification using fine-tuned BERT language modelen_US
dc.typeConference Objecten_US
dspace.entity.typePublication
relation.isAuthorOfPublicatione02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isAuthorOfPublication695a8adc-2330-4d32-ab37-8b781716d609
relation.isAuthorOfPublication.latestForDiscoverye02bc683-b72e-4da4-a5db-ddebeb21e8e7
