yammdb - just another .mmdb

Neoon OG
edited May 2023 in Technical

Hey,

Today I want to present yammdb to you.
It is yet another geolocation database, this one based on real measurements.

You can find it here: https://github.com/Ne00n/yammdb
The database is built weekly, on Fridays.

As long as the build server doesn't blow up.
My primary use case is comparing existing geo data; maybe some of you will find it useful too.

If you have any ideas and feedback, please lemme know.

Enjoy.


Comments

  • Latency from the closest location is now included.
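Picking "the closest location" from latency can be sketched like this: every probe pings the target, the probe with the lowest round-trip time wins, and its location and RTT go into the record. This is a minimal sketch, not yammdb's actual code; `closest_probe`, the probe names and the (lat, lon, rtt_ms) layout are hypothetical. The ~200 km per millisecond figure is the approximate signal speed in optical fibre, so it only gives an upper bound on how far away the target can be.

```python
from typing import Dict, Optional, Tuple

# Light in optical fibre covers roughly 200 km per millisecond (~2/3 of c).
FIBRE_KM_PER_MS = 200.0

# Hypothetical layout: probe name -> (lat, lon, rtt_ms); rtt_ms is None on timeout.
ProbeResults = Dict[str, Tuple[float, float, Optional[float]]]


def closest_probe(results: ProbeResults) -> Optional[dict]:
    """Return the location and RTT of the probe with the lowest RTT."""
    answered = {name: r for name, r in results.items() if r[2] is not None}
    if not answered:
        return None  # nobody got a reply, so the target stays a miss
    name, (lat, lon, rtt) = min(answered.items(), key=lambda item: item[1][2])
    return {
        "probe": name,
        "latitude": lat,
        "longitude": lon,
        "latency_ms": rtt,
        # one-way time is rtt / 2; the target can't be farther than that allows
        "max_distance_km": rtt / 2 * FIBRE_KM_PER_MS,
    }


print(closest_probe({
    "mad": (40.42, -3.70, 31.5),
    "fra": (50.11, 8.68, 4.0),
    "sin": (1.35, 103.82, None),
}))
# {'probe': 'fra', 'latitude': 50.11, 'longitude': 8.68, 'latency_ms': 4.0, 'max_distance_km': 400.0}
```

The more probes there are, the closer the winning probe tends to be to the target, which is why accuracy depends on the number of locations.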

  • Umm I'm stupid and new to all this. What is this???

  • @schism said:
    Umm I'm stupid and new to all this. What is this???

    Put in an IP, and it'll tell you the physical location of the server/host behind that IP.
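Under the hood, such a lookup is a longest-prefix match: the most specific network that contains the IP wins. A .mmdb reader (for example the `maxminddb` Python package, via `open_database(path).get(ip)`) walks a binary search tree for this; the sketch below uses a plain dict instead, and `TABLE`, `lookup` and the records are made-up illustrations on documentation address ranges.

```python
import ipaddress
from typing import Optional

# Made-up records on documentation ranges; a real .mmdb holds millions of networks.
TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "DE"},
    ipaddress.ip_network("203.0.113.128/25"): {"country": "NL"},
}


def lookup(ip: str) -> Optional[dict]:
    """Longest-prefix match: the most specific network containing `ip` wins."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in TABLE if addr in net]
    if not matches:
        return None  # no covering network: a miss
    return TABLE[max(matches, key=lambda net: net.prefixlen)]


print(lookup("203.0.113.200"))  # {'country': 'NL'}, the /25 beats the /24
print(lookup("203.0.113.5"))    # {'country': 'DE'}
print(lookup("198.51.100.1"))   # None
```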

  • Oh cool!

  • FrankZ Moderator

    I'm trying this out with my gdnsd servers. The db seems much smaller than the ip-to-city-lite db I have been using: 16MB compared to 99MB. If this works as well, it is going to save me a bunch of RAM. :)
    Thank you for this!!

    LES • AboutDonateRulesSupport

  • @FrankZ said:
    I'm trying this out with my gDNSd servers. The db seems much smaller than the ip-to-city-lite db I have been using, 16MB compared to 99MB. If this works as well it is going to save me a bunch of RAM. :)
    Thank you for this !!

    It has way less "useless" data in it, hence it's so smol.
    However, it should work with auto_dc maps using geo coordinates, but I have no idea how accurate it is.
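As I understand gdnsd's `auto_dc_coords` option, datacenters are ranked by their distance from the coordinates the database returns for the client, so only the geo coordinates matter. A sketch of that ranking with the haversine great-circle formula follows; it is not gdnsd's implementation, and `rank_dcs` plus the datacenter names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h creeping above 1.0 through rounding
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def rank_dcs(client: Tuple[float, float],
             dcs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Datacenter names ordered from nearest to farthest from the client."""
    return sorted(dcs, key=lambda name: haversine_km(client, dcs[name]))


# Hypothetical datacenters: name -> (lat, lon)
DCS = {"dc-ams": (52.37, 4.90), "dc-nyc": (40.71, -74.01), "dc-sin": (1.35, 103.82)}
print(rank_dcs((48.86, 2.35), DCS))  # client near Paris: ['dc-ams', 'dc-nyc', 'dc-sin']
```

This is also why a coordinate that is roughly right is often good enough: the ranking only flips when the error moves the client closer to a different datacenter.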

  • Thank you! I hope there will be an ACL version for BIND 9.
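Nothing in this thread says yammdb ships a BIND ACL file, but BIND 9 `acl` statements could be generated from the CSV exports. This sketch assumes hypothetical (prefix, country code) rows, since the actual CSV columns are not documented here; `bind_acls` and the `geo_xx` naming are made up.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bind_acls(rows: Iterable[Tuple[str, str]]) -> str:
    """Render one BIND 9 `acl` statement per country code.

    `rows` are (prefix, country_code) pairs, an assumed layout.
    """
    by_country: Dict[str, List[str]] = defaultdict(list)
    for prefix, country in rows:
        # ip_network() rejects malformed prefixes before they reach named.conf
        net = ipaddress.ip_network(prefix, strict=False)
        by_country[country.lower()].append(str(net))
    blocks = []
    for country in sorted(by_country):
        body = "\n".join(f"    {net};" for net in by_country[country])
        blocks.append(f'acl "geo_{country}" {{\n{body}\n}};')
    return "\n\n".join(blocks)


print(bind_acls([("203.0.113.0/24", "DE"), ("198.51.100.0/24", "NL"),
                 ("2001:db8::/32", "DE")]))
```

The generated ACLs could then be referenced from a view, e.g. `match-clients { geo_de; };`.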

  • @Neoon said:

    @FrankZ said:
    I'm trying this out with my gDNSd servers. The db seems much smaller than the ip-to-city-lite db I have been using, 16MB compared to 99MB. If this works as well it is going to save me a bunch of RAM. :)
    Thank you for this !!

    It has way less "useless" data on it, hence its so smol.
    However, It should work with auto_dc maps using geo cords but I have no idea how accurate it is.

    I wrote a smol, 30-line benchmark script using a dump of the global routing table.
    The hit rate is 65%: 638k out of 975k entries in the routing table, which is good; I expected less.

    The GitHub page said 600k.
    Meanwhile, the other .mmdb databases have a 99% hit rate.

    TL;DR: Yes, use it, but not as your primary database. Build your own, which was my primary purpose.
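A benchmark along those lines could count, for each routing-table prefix, whether the database returns anything. This sketch is not the author's 30-line script: `hit_rate` is hypothetical, looking up each prefix's network address is an assumption, and the result dict only mimics the `{'fail', 'success', 'percentage'}` output posted further down. Any callable works as `lookup`, e.g. `maxminddb.open_database("geo.mmdb").get` from the `maxminddb` package.

```python
import ipaddress
from typing import Callable, Iterable, Optional


def hit_rate(prefixes: Iterable[str],
             lookup: Callable[[str], Optional[dict]]) -> dict:
    """Count how many routing-table prefixes the database has an answer for."""
    stats = {"fail": 0, "success": 0}
    for prefix in prefixes:
        # Assumption: probe each prefix by its network address.
        ip = str(ipaddress.ip_network(prefix, strict=False).network_address)
        if lookup(ip) is not None:
            stats["success"] += 1
        else:
            stats["fail"] += 1
    total = stats["success"] + stats["fail"]
    stats["percentage"] = 100 * stats["success"] / total if total else 0.0
    return stats
```

Note that "hit" here only means a record came back; it says nothing about whether the location is correct.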

  • FrankZ Moderator

    @Neoon said: TLDR: Yes use it, but not as primary database, build your own, as my primary purpose was.

    Fair enough.


  • Neoon OG
    edited May 2023

    @FrankZ said:

    @Neoon said: TLDR: Yes use it, but not as primary database, build your own, as my primary purpose was.

    Fair enough.

    If you build your own, you can probably drop the memory usage even further.
    Throw everything out and turn it into a flying gas can.

  • Speaking of hit rate: the benchmark I ran yesterday gave me a 65% hit rate against a routing table dump, roughly 648k out of 975k, which is more than I expected.
    However, the second benchmark I ran, with 8.5 million IP addresses, hit only 35%.

    This was because bigger subnets are split into smaller ones for more accurate data, but the smaller subnets that didn't respond were not filled in. That has been fixed.

    After fixing these bugs, the hit rate is now 74%.
    The next step is to see where I can further improve the data I use for all of this, so I end up with higher hit rates.
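The split-and-fill fix might look roughly like this: a large prefix is split into /24s, measured /24s keep their own result, and the ones that did not respond inherit a fallback result instead of being left as gaps. Using the parent prefix's result as that fallback is an assumption, the thread does not say how the gaps are actually filled, and `split_and_fill` is hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def split_and_fill(parent: str, measured: Dict[str, dict],
                   fallback: Optional[dict], new_prefix: int = 24) -> Dict[str, dict]:
    """Split `parent` into /new_prefix subnets and fill the unmeasured ones.

    `measured` maps subnet string -> record for subnets that answered.
    Assumption: silent subnets get `fallback` (e.g. the parent's result)
    instead of being left out, which is what created the gaps.
    """
    filled: Dict[str, dict] = {}
    for sub in ipaddress.ip_network(parent, strict=False).subnets(new_prefix=new_prefix):
        key = str(sub)
        if key in measured:
            filled[key] = measured[key]
        elif fallback is not None:
            filled[key] = fallback
    return filled


print(split_and_fill("198.51.100.0/23", {"198.51.100.0/24": {"loc": "A"}}, {"loc": "P"}))
# {'198.51.100.0/24': {'loc': 'A'}, '198.51.101.0/24': {'loc': 'P'}}
```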

  • Thanks to https://virtury.com/ we have a new probe in Pakistan!

  • Neoon OG
    edited May 2023

    It took a bit longer than expected, but the software is now mostly optimized for more probes.
    Expect more probes in the coming weeks.

    Daily test builds (not guaranteed) will be available at https://yammdb.serv.app/test.mmdb
    The weekly build will happen as usual.

    Also, thanks to https://ginernet.com for a new probe in Madrid.

  • @Neoon said:
    I took a bit longer than expected, however the software is now mostly optimized for more probes.
    Expect more probes in the next weeks.

    Daily test builds, not guaranteed, will be available under https://yammdb.serv.app/test.mmdb
    Weekly build will happen as usual.

    Also, thanks to https://ginernet.com for a new probe in Madrid.

    It was supposed to be done already, however I fucked up.
    One function made the build hang for hours on end.

    This is fixed, thanks to GPT-4 once again.
    It should finish shortly. Once it's done, I will make a second test build that includes a bunch of new locations.

  • I noticed it's now smaller than the previous version I tested. Did you reduce the coverage or change the format?
    Anyway, a cool and useful idea for cross-checking other geolocation providers.

  • @someTom said:
    i noticed its now smaller than the previous version i tested. did you reduce the coverage or change the format?
    however, cool and useful idea to crosscheck other geolocators.

    No, but I noticed it too.
    The only explanation I have is the way the writer builds the database.

    Basically, I tried to aggregate the prefixes to make the database even smaller, but it seems the writer already does this.
    So that did not change the size after all. A while ago, the database had a lot of gaps because of the way it pings bigger subnets.

    These gaps have been closed, so I assume the writer can now optimize/aggregate the database even further, hence it's smaller. The code definitely does not remove data, and never has.
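Why closing gaps can shrink the file is easy to see with Python's `ipaddress.collapse_addresses`: adjacent prefixes with identical data merge into one larger prefix, but a single missing /24 blocks the merge. This only illustrates prefix aggregation; it is not what the .mmdb writer does internally, and `aggregate` and the sample data are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(records: Dict[str, str]) -> List[Tuple[str, str]]:
    """Merge adjacent IPv4 prefixes that carry identical data."""
    groups: Dict[str, List[ipaddress.IPv4Network]] = defaultdict(list)
    for prefix, value in records.items():
        groups[value].append(ipaddress.IPv4Network(prefix))
    merged = [(str(net), value)
              for value, nets in groups.items()
              for net in ipaddress.collapse_addresses(nets)]
    # Sort by network so the output is stable and readable.
    return sorted(merged, key=lambda item: ipaddress.IPv4Network(item[0]))


with_gap = {"198.51.100.0/24": "FRA", "198.51.102.0/24": "FRA", "198.51.103.0/24": "FRA"}
print(aggregate(with_gap))
# [('198.51.100.0/24', 'FRA'), ('198.51.102.0/23', 'FRA')]
print(aggregate({**with_gap, "198.51.101.0/24": "FRA"}))
# [('198.51.100.0/22', 'FRA')]
```

With the gap at 198.51.101.0/24, two entries are needed; once it is filled with the same data, all four /24s collapse into a single /22.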

  • I added a few more locations for this test run.

  • Gonna be the biggest Friday run yet.

  • Neoon OG
    edited May 2023

    Thanks to the people who followed my GitHub repo; they actually gave me the idea to make an mtr-only geo database.
    I coded it in less than 24 hours; however, the hit rates were too low, and my brain hadn't yet figured out where the fuck-up was.

    However, today I found the mapping error.
    db/mtr.mmdb {'fail': 126502, 'success': 849072, 'percentage': 87.03306976200678}

    From 64% to 74%, and now an 87% hit rate, not bad.
    I put the .mmdb up as usual at https://yammdb.serv.app/mtr.mmdb

    This database is only 4.2MB in size and, right now, only contains geo coordinates.
    I will add the usual info, such as country, continent, etc., in a later build.

    Plus, I will add a combined build later, merging geo.mmdb and mtr.mmdb, which uses latency first and then mtr, for better accuracy.
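The planned combined build, latency first with mtr as the fallback, could be merged per prefix roughly like this. It is a sketch under assumptions: both inputs are hypothetical dicts of prefix to record, `combine` is made up, and so is the `source` field, which only shows where each record came from.

```python
from typing import Dict


def combine(latency: Dict[str, dict], mtr: Dict[str, dict]) -> Dict[str, dict]:
    """Per prefix: use the latency result if there is one, else the mtr result."""
    combined: Dict[str, dict] = {}
    for prefix in latency.keys() | mtr.keys():
        if prefix in latency:
            combined[prefix] = {**latency[prefix], "source": "latency"}
        else:
            combined[prefix] = {**mtr[prefix], "source": "mtr"}
    return combined


print(combine({"a": {"country": "DE"}},
              {"a": {"country": "NL"}, "b": {"country": "FR"}}))
```

Prefixes that only mtr could reach still get a record, so the hit rate of the combined build is at least that of either input on its own.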

  • Any plans to release a CSV version of the mmdb?

  • I updated mtr.mmdb; it now includes continent, country and latency, same as geo.mmdb.
    @somik Sure, I added the CSV file: https://yammdb.serv.app/mtr.csv

    Currently, they are smaller than geo.mmdb due to fewer measurements per subnet; this will change once I run them again.

  • I also added geo.mmdb as CSV: https://yammdb.serv.app/geo.csv
    There is no compression or anything, hence the file is so big.

    Usually the .mmdb writer does the compression.

  • @Neoon said:
    I also added geo.mmdb as csv: https://yammdb.serv.app/geo.csv
    There is no compression or anything, hence the file is so big.

    Usually the .mmdb writer does the compression.

    Best to have it without compression for maximum compatibility. I'm visiting our neighbouring country for some good food now, so I'll test it out once I'm back in Singapore.

    On that note, it seems like a lot of shops closed down during the pandemic... Sad days.

  • Well, I guess an mtr-only database with more tests per subnet won't be happening.
    It takes too long: roughly 1-2 days to finish a build with 8+ million targets, even with 20 probes running at the same time.

    Instead, I am going to run another test build next week, which runs mtr on subnets that don't respond to ping and combines those results with the latency results, as mentioned before.

  • @Neoon I think your CSV headers (table titles) are missing for both geo.csv and mtr.csv

  • @somik said:
    @Neoon I think your CSV headers (table titles) are missing for both geo.csv and mtr.csv

    I will change that before the next build tomorrow.
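Adding the header row is a one-liner with Python's `csv.DictWriter.writeheader()`, and `csv.DictReader` then picks the column names up automatically. The column names below are hypothetical; the real geo.csv/mtr.csv layout may differ.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical column names; the real geo.csv/mtr.csv layout may differ.
FIELDS = ["network", "continent", "country", "latitude", "longitude", "latency_ms"]


def write_csv(rows: Iterable[Dict[str, object]], fh: TextIO) -> None:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()  # the header row the files were missing
    writer.writerows(rows)


def read_csv(fh: TextIO) -> List[Dict[str, str]]:
    # DictReader takes the column names from the first row
    return list(csv.DictReader(fh))


buf = io.StringIO()
write_csv([{"network": "203.0.113.0/24", "continent": "EU", "country": "DE",
            "latitude": 50.11, "longitude": 8.68, "latency_ms": 4.2}], buf)
buf.seek(0)
print(read_csv(buf)[0]["country"])  # DE
```

For real files, open them with `newline=""`, as the `csv` module documentation recommends.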

  • It seems that a recent masscan is mandatory; I was still using a two-month-old one.
    The build finished 3 hours earlier, with a +7% higher hit rate, so 80% without mtr.

    Lesson learned: masscan will be updated at least once per week, gg.
    As soon as I get the mtr integration working, it should easily reach a 90%+ hit rate.

  • @Neoon said:
    It seems that a recent masscan is mandatory, I still used a 2 months old one.
    The build just finished 3 hours earlier and with +7% higher hitrate, so 80% without mtr.

    Lesson learned, masscan will be updated at least once per week, gg.
    As soon I get the mtr integration working It should easily get a 90%+ hitrate.

    I thought port scanning was frowned upon by most data-centers/hosts?

    Btw, what's MRT?

    90% hit rate as in for IPs or returning correct geo-loc/country?

  • @somik said:

    @Neoon said:
    It seems that a recent masscan is mandatory, I still used a 2 months old one.
    The build just finished 3 hours earlier and with +7% higher hitrate, so 80% without mtr.

    Lesson learned, masscan will be updated at least once per week, gg.
    As soon I get the mtr integration working It should easily get a 90%+ hitrate.

    I thought port scanning was frowned upon by most data-centers/hosts?

    I never said I was port scanning.

    90% hit rate as in for IPs or returning correct geo-loc/country?

    Hit rate means you get a result.
    Accuracy depends on the number of locations.

  • @Neoon said:

    @somik said:

    @Neoon said:
    It seems that a recent masscan is mandatory, I still used a 2 months old one.
    The build just finished 3 hours earlier and with +7% higher hitrate, so 80% without mtr.

    Lesson learned, masscan will be updated at least once per week, gg.
    As soon I get the mtr integration working It should easily get a 90%+ hitrate.

    I thought port scanning was frowned upon by most data-centers/hosts?

    I never said I was port scanning.

    90% hit rate as in for IPs or returning correct geo-loc/country?

    hit rate means, you get a result.
    Accuracy depends on the amount of locations.

    Isn't masscan used for port scanning? Are you using it to scan for something else?
