Onion Service Statistics

The goal of this research project is to learn how Tor onion services are currently being used across the world: how many onion services there are, how many users they have, and what they are being used for. Research like this has already been conducted for V2 onion services by harvesting information from the hidden service directory. Tor Metrics only shows how many V2 onion services exist and how much bandwidth onion services consume.

With the introduction of V3 onion services, collecting information via hidden service directories became much harder: a blinded key is derived for every time period and used to identify descriptors. This prevents hidden service directories from collecting onion service addresses, so it is no longer possible to connect to all published addresses to find out what they are doing. What is still possible is running an HSDir node to count how many blinded V3 descriptors are uploaded and how often they are downloaded. Additionally, it is possible to extract old shared random values from past consensuses (archived by the Tor Project), which allows linking blinded keys derived from well-known onion services. This makes it possible to track some onion services over multiple time periods, which reveals statistically significant information about how much they are being used.

Research Questions

The main goal of our research is to gain more insight into how onion services are being used in practice.

How many V3 onion services are in operation?
How much are onion services being used by clients?
How many users do the most popular onion services have?
How many users do average onion services have?
What is the correlation between highly used onion services and onion services with publicly announced onion addresses?
- Are there any onion services which are used a lot without ever making their onion address public?

Implementation

We deploy about 50 Tor relays, which fulfill the minimum requirements for obtaining the HSDir flag (stable + fast + uptime >96 hours). The Tor relays are slightly modified to log the blinded public keys of all uploaded and downloaded descriptors. A log listener attached via the Tor control protocol stores, in a central database, the blinded public keys of all uploaded descriptors and how often each of them was downloaded.

At the start of our research, the Tor hidden service directory was made up of roughly 4000 relays, so our 50 nodes are expected to make up about 1% of the entire directory. The chance of an onion service being seen by us is therefore about 8% (HSDIR_Spread=4 and replicas=2, i.e. 8 directories per service). The chance of our experiment seeing part of the access calls to an onion service is even lower, at only about 6% (HSDIR_FETCH_Spread=3 instead of 4). This means we only extract a subset of information every day and can only link information for onion addresses we know.
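The estimate above can be reproduced with a short calculation. The following is a minimal sketch, assuming that every directory position is filled independently and uniformly at random, which simplifies the real hash ring; the function and constant names are illustrative and not taken from our setup. With the share rounded to 1%, the formula gives values close to the 8% and 6% quoted above; with the unrounded share of 50/4000 it comes out slightly higher.

```python
def chance_of_overlap(our_relays: int, total_hsdirs: int, slots: int) -> float:
    """Probability that at least one of `slots` directory positions is one of our relays.

    Assumption: positions are independent and uniformly distributed,
    which is a simplification of the real HSDir hash ring.
    """
    share = our_relays / total_hsdirs
    return 1.0 - (1.0 - share) ** slots


OUR_RELAYS = 50
TOTAL_HSDIRS = 4000
REPLICAS = 2
STORE_SPREAD = 4  # HSDIR_Spread from the text
FETCH_SPREAD = 3  # HSDIR_FETCH_Spread from the text

# Uploads go to spread * replicas directories; fetches pick from a smaller set.
store_chance = chance_of_overlap(OUR_RELAYS, TOTAL_HSDIRS, STORE_SPREAD * REPLICAS)
fetch_chance = chance_of_overlap(OUR_RELAYS, TOTAL_HSDIRS, FETCH_SPREAD * REPLICAS)

print(f"descriptor upload seen by us: {store_chance:.1%}")
print(f"fetch set includes one of our relays: {fetch_chance:.1%}")
```

Note that the second number is the chance that one of our relays is among the directories a client may fetch from, not the share of individual fetches we observe.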

Privacy/Transparency/Legal considerations

This research project was discussed with the Tor Research Safety Board during the planning stage and tries to implement all the feedback they provided. We do not intend to cause any harm to Tor users; we only want to learn more about how the Tor network is being used. For this purpose we have taken a series of steps to ensure that our research does not negatively impact the Tor network or any of its users.

Privacy considerations

The main information we extract from the Tor network is a list of blinded V3 onion keys and a count of how often these keys have been requested by clients. Individual requests are not stored any longer than it takes to extract the blinded public key. In order to ensure that the relative order (and thus timing) of requests cannot be inferred from our data, requests are collected at the relay on an hourly basis and sorted randomly before being inserted into the database.
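The hourly batching described above could look roughly like the following. This is a minimal sketch, not our actual collector: the class name HourlyBatch and its methods are hypothetical, and the hourly scheduling and the database insert are left to the caller.

```python
import random
from collections import Counter
from typing import List, Tuple


class HourlyBatch:
    """Buffers blinded keys in memory and releases them as shuffled counts."""

    def __init__(self) -> None:
        self._downloads = Counter()

    def record_download(self, blinded_key: str) -> None:
        # Only the blinded key is counted; no timestamp or request
        # metadata is kept, so the individual request is gone afterwards.
        self._downloads[blinded_key] += 1

    def flush(self) -> List[Tuple[str, int]]:
        """Return (blinded_key, count) rows in random order and clear the buffer."""
        rows = list(self._downloads.items())
        random.shuffle(rows)  # insertion order must not reveal request order
        self._downloads.clear()
        return rows


batch = HourlyBatch()
for key in ["keyA", "keyB", "keyA"]:
    batch.record_download(key)

# Called once per hour; the rows would then be written to the database.
rows = batch.flush()
print(rows)
```

Because only per-key counts leave the relay, the database never receives the order or timing of individual requests within the hour.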

We plan to make our data available to other researchers, but have to apply some limitations to protect Tor users. For blinded keys we could not link together (because we do not know the underlying onion address), we will only publish how many of them there are and how often they are accessed (per service, per day, etc.). For blinded descriptors we could link together, we will not publish their underlying onion address (as that might enable malicious actors to prioritize which onion services they should attack). Exceptions will be made for specific onion services where we either obtained permission from the operators or know for certain that publishing their usage numbers does not negatively impact them. To ensure that our data does not reveal too much information about rarely used onion services, we will treat onion services which see little usage like services for which we do not know the underlying onion address.
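The release rules above can be illustrated with a small sketch. It assumes a hypothetical usage threshold (MIN_DAILY_DOWNLOADS, a value chosen only for illustration) and opaque service identifiers instead of onion addresses; neither is taken from our actual publication process.

```python
from typing import Dict, Set, Tuple

MIN_DAILY_DOWNLOADS = 100  # hypothetical threshold, for illustration only


def split_for_release(
    daily_counts: Dict[str, int],
    linked: Set[str],
    threshold: int = MIN_DAILY_DOWNLOADS,
) -> Tuple[Dict[str, int], int, int]:
    """Split services into individually reported ones and an aggregate bucket.

    Keys are opaque identifiers, never onion addresses. Services that are not
    linked, or that are linked but rarely used, only appear in the aggregate.
    """
    individual: Dict[str, int] = {}
    aggregate_services = 0
    aggregate_downloads = 0
    for service, count in daily_counts.items():
        if service in linked and count >= threshold:
            individual[service] = count
        else:
            aggregate_services += 1
            aggregate_downloads += count
    return individual, aggregate_services, aggregate_downloads


individual, n_services, n_downloads = split_for_release(
    {"s1": 500, "s2": 5, "s3": 900}, linked={"s1", "s2"}
)
print(individual, n_services, n_downloads)
```

Here s2 is linked but falls below the threshold, so it is counted in the aggregate bucket together with the unlinked s3.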

Transparency considerations

Our entire experimental setup runs on our own hardware in a locked room within the Institute of Networks and Security at Johannes Kepler University Linz. Access to the infrastructure is only granted to the researchers responsible for setting up the experiment. Access to the raw data is also limited to the responsible researchers, who will ensure that data is only published in accordance with the privacy considerations laid out above. To ensure transparency, all our relays specify their families correctly and link to this page in their description.

Legal considerations

This research project is undertaken by Johannes Kepler University Linz and the legal considerations are based on Austrian law. The following paragraphs were originally written in German.

Our Tor nodes act only as “middle nodes”, meaning that they relay encrypted messages from one Tor node to other Tor nodes. Because this data is encrypted, our nodes cannot know whose data they are transporting to which destination, and we therefore assume that operating these Tor nodes does not in itself carry any criminal liability.

By obtaining the status of “Hidden Service Directory”, our nodes additionally store information about currently available onion services. Because these descriptors are blinded and encrypted, we likewise cannot know for which onion services descriptors are stored on our relays. An accusation of knowingly supporting criminal offenses is therefore not tenable here either. Should we learn of a descriptor that points to an illegal site, it will of course be removed. However, since the lifetime of a descriptor is only 24 hours, it is in practice almost impossible to remove descriptors before they expire.

The data we collect is reduced as far as possible (as explained in detail in the section “Privacy considerations”) to prevent our information from being traced directly back to individual users of the Tor network. In addition, even in the best case our experiment monitors only 1% of the entire hidden service directory, so the chance that a particular onion service is captured by us at all is less than 10%. This is so low that our collected data offers no concrete benefit even for targeted investigations in this area.

From a data protection perspective, one could argue that the information about exactly when a service descriptor was downloaded is personal data, since in special cases it is conceivable that third parties could combine this data with other information to uniquely identify individuals. However, our efforts described in the section “Privacy considerations” should be sufficient to reduce data protection concerns to the point where they must yield to our legitimate research interest.

Publications

The data collected during this experiment has been used by the following publications:

Contact

If what you have read so far has not answered your questions regarding our project, please contact Tobias Höller with any suggestions, complaints, or worries you might have.