A recent study by cybersecurity company Surfshark has spotlighted Meta’s new chatbot app, Meta AI, as a leader in user data collection, raising significant concerns for small and medium-sized enterprises (SMEs) that rely on AI tools for business operations.
The study reveals that Meta AI collects an unprecedented 32 of the 35 possible data types, more than 90% of the total, making it the most data-intensive chatbot among those analyzed.
Unprecedented Data Collection
According to Surfshark, Meta AI stands out not only for the volume of data it collects but also for its breadth. It gathers information across categories such as financial details and health and fitness, as well as sensitive personal data: racial or ethnic data, sexual orientation, pregnancy or childbirth information, disability, religious or philosophical beliefs, trade union membership, political opinions, genetic information, and biometric data. No other chatbot analyzed in the study collects such a wide range of sensitive information.
“Meta is an ecosystem that collects user data across platforms like Facebook, Instagram, and Audience Network for displaying third-party ads, and now it’s doing the same through Meta AI. This chatbot learns from public posts, photos, and texts, as well as new data shared by users, which is an example of gross misconduct and mishandling of user data. Generative AI should not be trained on user data, and this highlights why regulations for AI are an urgent necessity,” said Karolis Kaciulis, Leading System Engineer at Surfshark.
Meta AI and Microsoft’s Copilot are the only chatbots in the study that collect data linked to user identity for third-party advertising or for sharing with third parties for ad purposes. While Copilot uses only two data types for this, Device ID and Advertising Data, Meta AI may leverage up to 24 different data types, amplifying concerns for SMEs using the platform for customer engagement or marketing.
Implications for SMEs
For SMEs, the implications of Meta AI’s data practices are significant. Many small businesses lack the robust cybersecurity infrastructure of larger corporations, making them particularly vulnerable to risks associated with excessive data collection. Sharing customer or employee data with Meta AI could lead to unintended exposure, especially given its use of data for third-party advertising across Meta’s ecosystem. This raises compliance challenges, particularly for SMEs operating in regions with strict data protection laws, such as Europe’s GDPR.
Moreover, the study highlights the potential for inaccurate AI outputs due to flawed training data. Meta AI, which learns from diverse sources including public posts on Facebook and Instagram, may produce unreliable results. Kaciulis noted, “People should keep in mind that even though these chatbots may provide you with a quick answer, the results they get are mediocre. Why is that? AI chatbots are being fed with all kinds of information and the majority of it can be inaccurate. Every person is responsible for the results they provide at their job, but generative AI is not; it is unaccountable and is not legally subject to the same scrutiny as a human.”
The Surfshark study also points to broader trends in AI data collection. On average, analyzed chatbot apps collect 13 out of 35 possible data types, with 45% gathering user location data and nearly 30% engaging in tracking for targeted advertising or data broker sharing. For SMEs, this underscores the hidden cost of using “free” AI tools, where business and customer data become a currency.
Comparative Data Practices
Other AI chatbots present varying levels of data collection. Google Gemini collects 22 unique data types, including precise location data, contact information, user content, and browsing history. ChatGPT, by contrast, collects 10 data types, such as contact information and usage data, but avoids tracking or third-party advertising. ChatGPT also offers features like temporary chats, which auto-delete data after 30 days, and options to remove personal data from training sets, making it a potentially safer choice for SMEs.
Copilot, Poe, and Jasper collect data for tracking purposes, with Jasper gathering device IDs, product interaction data, and other usage data that could be sold to data brokers or used for targeted ads. Kaciulis emphasized the broader privacy implications: “As a human being, especially in Europe, where GDPR protects user rights, personal data belongs to you, not to corporations or AI systems. Sharing it with generative AI can lead to it being stored, analyzed, and used without your full control, risking targeted manipulation, identity theft, or misuse. Also, people should be aware that things AI learns from your personal data cannot be unlearned.”
To mitigate risks, SMEs should limit sharing sensitive data, verify AI outputs for accuracy, and ensure compliance with data protection regulations. As AI tools become integral to business operations, understanding their data practices is critical to safeguarding customer trust and avoiding legal pitfalls.
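For illustration, here is a minimal Python sketch of the first of those steps: scrubbing obvious identifiers from text before it is ever sent to a chatbot. The regex patterns and the sample prompt are hypothetical, and a production setup would use a vetted PII-detection tool plus human review rather than hand-rolled rules:

```python
import re

# Hypothetical redaction rules for common identifiers. These simple
# patterns are illustrative only; real PII detection needs a dedicated,
# audited library and a review step.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a labelled placeholder before the text
    leaves the business's own systems."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

# Example: a customer-service prompt an SME might otherwise paste
# into a chatbot verbatim.
prompt = "Draft a refund reply to Jane Doe (jane.doe@example.com, +44 20 7946 0958)."
print(redact(prompt))
# -> "Draft a refund reply to Jane Doe ([EMAIL REDACTED], [PHONE REDACTED])."
```

A filter like this does not make a chatbot safe on its own, but it narrows what a data-hungry service can collect, log, or train on, which is the practical concern the Surfshark findings raise.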
Link to the study here.