A recent study by computer scientists at ETH Zurich found that a chatbot can infer personal information about a user, such as where they live, their race, their gender, and more, based on the content of their conversations. Although the study has not been peer-reviewed, it raises new concerns about privacy on the Internet.
The research team used text from Reddit posts to test whether LLMs could accurately infer where users lived or where they came from. The team, led by Martin Vechev at ETH Zurich, found that these models have an unnerving ability to guess accurate information about users from contextual or linguistic cues alone. GPT-4, the model at the heart of the paid version of OpenAI's ChatGPT, was surprisingly accurate, correctly predicting a user's private information 85 to 95 percent of the time.
First, developers should prioritize privacy protection and fully consider user privacy rights when designing and developing chatbots. For example, they can limit the scope of the collection and use of user data, adopt encryption and anonymization techniques to protect that data, and introduce privacy-protecting algorithms that limit chatbots' ability to infer user information.
