Some of the top 100,000 websites collect everything you type, before you hit send


When you sign up for a newsletter, make a hotel reservation, or pay online, you probably assume that if you mistype your email address three times or change your mind and close the page, it doesn’t matter. Nothing actually happens until you hit the Submit button, right? Well, maybe not. As with so many assumptions about the web, this isn’t always the case, according to new research: A surprising number of websites collect some or all of your data as you type it into a digital form.

Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, looking at two scenarios: a user visiting a site from within the European Union and a user visiting from the United States. They found that 1,844 websites collected an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites seemingly do not intend to log the data themselves, but they embed third-party marketing and analytics services that cause the behavior.

After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites where third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed its findings to these sites, and all 52 cases have since been resolved.

“If there is a submit button on a form, the reasonable expectation is that it will do something, that it will submit your data when you click on it,” says Güneş Acar, a professor and researcher at Radboud University’s digital security group and one of the study leaders. “We were very surprised by these results. We thought maybe we were going to find a few hundred websites where your email is collected before it is sent, but this far exceeded our expectations.”

The researchers, who will present their findings at the Usenix security conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties that collect form data regardless of submission status. They note that the behavior is essentially similar to that of so-called keyloggers, which are typically malicious programs that record everything a target types. But on a mainstream top-1,000 site, users probably won’t expect their information to be keylogged. And in practice, the researchers observed some variation in the behavior. Some sites logged data keystroke by keystroke, but many grabbed the complete contents of one field when users clicked on the next.

“In some cases, when you click on the next field, they collect the previous one, like if you click on the password field and they collect the email, or you just click anywhere and they collect all the information right away,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study’s co-authors. “We didn’t expect to find thousands of websites, and in the US, the numbers are really high, which is interesting.”
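The difference between these collection behaviors can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the study: the `Event` type, the `sent_before_submit` function, and the three mode names are hypothetical, and the session models a user who types an email address, clicks away from the field, and never presses Submit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user action on a form (a hypothetical model, not a browser API)."""
    kind: str        # "input", "blur", or "submit"
    field: str = ""  # form field the action applies to
    value: str = ""  # full field contents after an "input" action


def sent_before_submit(events, mode):
    """Return the (field, value) pairs a script would transmit before Submit.

    mode is one of:
      "submit"    - send nothing until the form is submitted
      "blur"      - send a field's contents when the user leaves the field
      "keystroke" - send the field's contents after every change
    """
    latest = {}  # most recent contents of each field
    sent = []
    for ev in events:
        if ev.kind == "submit":
            break  # anything sent from here on is what users expect
        if ev.kind == "input":
            latest[ev.field] = ev.value
            if mode == "keystroke":
                sent.append((ev.field, ev.value))
        elif ev.kind == "blur" and mode == "blur" and ev.field in latest:
            sent.append((ev.field, latest[ev.field]))
    return sent


# The user types an address, clicks elsewhere, and abandons the form.
session = [
    Event("input", "email", "j"),
    Event("input", "email", "jo@example.com"),
    Event("blur", "email"),
]

for mode in ("submit", "blur", "keystroke"):
    print(mode, sent_before_submit(session, mode))
```

In this abandoned session, the "submit" mode transmits nothing, the "blur" mode transmits the finished address once, and the "keystroke" mode transmits every intermediate value as well.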

The researchers say the regional differences may be related to companies being more cautious about tracking users, and even potentially integrating with fewer third parties, due to the EU’s General Data Protection Regulation. But they stress that this is just one possibility, and the study did not examine explanations for the disparity.

Through a substantial effort to notify websites and third parties that collect data in this way, the researchers discovered that one explanation for some of the unexpected data collection may have to do with the challenge of differentiating a “submit” action from other user actions on certain web pages. But the researchers emphasize that, from a privacy perspective, this is not an adequate justification.

Since completing the paper, the group has also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users across the web and show them ads. Both companies stated in their documentation that customers can turn on “automatic advanced matching,” which triggers data collection when a user submits a form. In practice, however, the researchers found that these tracking pixels were capturing hashed email addresses, an obscured version of email addresses used to identify web users across platforms, before submission. For US users, 8,438 sites may have been leaking data to Meta, Facebook’s parent company, via pixels, and 7,379 sites may have been affected for EU users. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
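An obscured email address of this kind still works as an identifier, because the same address always maps to the same string. The following is a minimal sketch of that property, assuming SHA-256 over a trimmed, lowercased address; the article does not say which algorithm or normalization the pixels use, and the `hashed_email` function is a hypothetical name.

```python
import hashlib


def hashed_email(email: str) -> str:
    """Return a hex digest of a normalized email address.

    Assumption for illustration: SHA-256 after trimming whitespace and
    lowercasing. The article does not specify the trackers' scheme.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same address typed on two unrelated sites yields the same digest,
# so whoever receives both can link the two visits to one person.
on_site_a = hashed_email("Jo@Example.com ")
on_site_b = hashed_email("jo@example.com")
print(on_site_a == on_site_b)
```

The digest hides the address from a casual reader, but it is deterministic, so it does not prevent two sites' records from being joined on it.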

The researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group hasn’t received any updates since. The researchers notified TikTok on April 21, having discovered TikTok’s behavior more recently, and have received no response. Meta and TikTok did not immediately respond to WIRED’s request for comment on the findings.

“The privacy risks for users are that they will be tracked even more efficiently; they can be tracked across different websites, in different sessions, on mobile and desktop,” says Acar. “An email address is such a useful identifier for tracking, because it’s global, it’s unique, it’s constant. You cannot delete it like you delete your cookies. It is a very powerful identifier.”

Acar also notes that as tech companies look to phase out cookie-based tracking in a nod to privacy concerns, marketers and other analysts will increasingly rely on static IDs like phone numbers and email addresses.

Since the findings indicate that deleting data in a form before submitting it may not be enough to protect against all collection, the researchers created a Firefox extension called LeakInspector to detect unauthorized form collection. And they say they hope their findings will raise awareness of the problem, not just for regular web users, but also for website developers and administrators, who can proactively check whether their own systems, or the third-party services they use, are collecting form data without consent.
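For developers who want to run such a check on their own pages, one simple approach is to type a test address into a form without submitting it, record the outgoing requests, and search them for the address in a few common encodings. The sketch below assumes the requests have already been captured as (URL, body) pairs. It is not how LeakInspector works internally, and `encodings_of`, `find_leaks`, and the example hosts are hypothetical.

```python
import base64
import hashlib
from urllib.parse import quote


def encodings_of(value: str) -> dict:
    """Return a few common encodings of a string to search for."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "url-encoded": quote(value, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(value: str, requests) -> list:
    """Return (url, encoding) pairs for requests that contain the value.

    `requests` is an iterable of (url, body) string pairs captured
    before the form was submitted.
    """
    needles = encodings_of(value)
    hits = []
    for url, body in requests:
        haystack = url + "\n" + body
        for name, needle in needles.items():
            if needle in haystack:
                hits.append((url, name))
    return hits


# Hypothetical capture from a page where the test address was typed
# but the form was never submitted.
test_email = "jo@example.com"
captured = [
    ("https://tracker.example/collect?e=jo%40example.com", ""),
    ("https://analytics.example/p",
     "h=" + hashlib.sha256(test_email.encode("utf-8")).hexdigest()),
    ("https://cdn.example/style.css", ""),
]

hits = find_leaks(test_email, captured)
for url, encoding in hits:
    print(encoding, url)
```

A check like this only catches the encodings it knows about, so an empty result is weaker evidence than a hit.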

Leaky forms are just one more type of data collection to watch out for in an already extremely crowded online field.

This story originally appeared on
