Datasets

The datasets are accessible from ClickHouse using our HTTP proxy. In the future, we might offer other ways to access the data.

All data produced by nxthdr and as215011 are publicly available and freely usable under Open Data Commons Open Database License (ODbL).

Here is a working example you can try in your terminal:

echo """
     WITH concat(prefix_addr, '/', prefix_len) AS prefix
     SELECT prefix,
            max(length(communities)) AS n_communities
     FROM risotto.updates
     GROUP BY prefix
     ORDER BY n_communities DESC
     LIMIT 5 FORMAT PRETTY
     """ | curl 'https://clickhouse.nxthdr.dev/?user=read&password=read' \
          --data-binary @-
   ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ PREFIX             ┃ n_communities ┃
   ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
1. │ 2804:3018::/32     │           107 │
   ├────────────────────┼───────────────┤
2. │ 2804:c70:2000::/36 │           105 │
   ├────────────────────┼───────────────┤
3. │ 2804:c70:1000::/36 │           105 │
   ├────────────────────┼───────────────┤
4. │ 2804:42e0::/32     │           105 │
   ├────────────────────┼───────────────┤
5. │ 2804:c70:3000::/36 │           105 │
   └────────────────────┴───────────────┘

We use a basic username/password authentication to access the data. This is solely to prevent automated scraping, but the credentials are public and can be freely shared.

user: read password: read

Raw Peering Dataset

The raw peering dataset is available in the risotto.updates table. The schema is described in our infrastrcture repository.

Each router of as215011 sends BMP messages to risotto, which records the updates in a ClickHouse database.

Each raw corresponds to an update or a withdraw, capturing prefixes, AS paths, communities, and other attributes.

Raw Probing Dataset

The raw probing dataset is available in the saimiris.results table. The schema is described in our infrastructure repository.

Active measurements using saimiris, whether scheduled via cron jobs or performed on demand, are stored in a ClickHouse database. This data consists of traceroute-like and ping-like measurement results collected from multiple vantage points.

Each row corresponds to a measurement result, capturing the source and destination IP addresses of the sent packet, the reply, the hop count, and other relevant attributes.