These are unedited transcripts and may contain errors.
DNS Working Group session
Wednesday, 14 May, 2014, at 11 a.m.:
JAAP AKKERHUIS: Good afternoon -- morning, well, whatever. If you want to know everything about address policy, you are in the wrong room, because this is going to be the DNS Working Group. My name is Jaap Akkerhuis, I am chairing this first session, and the other two chairs are Jim Reid here and Peter Koch there.
First, on logistics: apologies that we have been kind of late in publishing the agenda, and we hope to do better next time. So that is number one.
Number two: I see the room is filling up, and yes, the other logistics problem is that the room is slightly too small. However, downstairs there is an overflow room with a big screen, so if it gets too warm or too crowded here, you can actually go downstairs as well. The added advantage is that there is no audio feedback to this room, so you can actually heckle and talk while you are sitting downstairs. And if you have questions from downstairs, please use the Jabber room.
The agenda is pretty full, like always, and I don't think there are any changes to it. The only other thing we have to do is officially approve the minutes, which were kind of late as well; if there are any remarks about the minutes of the previous meeting, say so now or hold your peace.
So I guess, by this, we have approved the minutes for now.
The review of the action items is fairly short because there are no action items. Now we go to the first speaker, which will be Anand, giving the review report about what has happened at the RIPE NCC on DNS.
Anand.
ANAND BUDDHDEV: Good morning. I am Anand from the RIPE NCC. And I am going to do a quick update on DNS services at the RIPE NCC, I am not going to focus too much on statistics and numbers but just talk about some of the more interesting things that we have been busy with of late.
So just a quick slide, first off: K-root. We haven't had any major changes here. We are still operating this service with 17 instances throughout the world, and they are all up and stable and running, business as usual. However, we have been working on some improvements and changes. We want to add new instances later this year, so what we have been doing in the background is developing a new model for this. We have talked about this at previous RIPE meetings but for various reasons haven't actually managed to do this deployment. But now we do have a model ready, we have our internal documentation set up, and we will be announcing more news of this within the next couple of months. In summary, we have a model using a small, single "DNS in a box" solution. This lowers the requirements for hosts, meaning it's easier for people to host a K-root instance. We have five instances of K-root which we call global instances. These are big installations with lots of servers and lots of transit and capacity. We plan to maintain these as they are, so there will be no changes there. And we currently have 12 instances which we call local instances, and these will be phased out and replaced by the new model that we are developing.
One of the things that we are also going to do is increase some diversity. Currently, K-root runs NSD version 3, and we would like to add a few more applications to this mix; we would like to introduce BIND and Knot DNS, and so that is all in progress.
The other service of the RIPE NCC is authoritative DNS. We run primary servers for ripe.net, e164.arpa and several other forward DNS zones, and we also do reverse DNS for the address space that is delegated to the RIPE NCC.
We also provide secondary DNS service. We carry the in-addr.arpa and ip6.arpa zones. We provide secondary DNS for 77 ccTLDs at the moment. We also carry the forward and reverse zones of the other RIRs. And we are secondary for several miscellaneous zones, such as those of various small non-profit organisations that need service from us.
And of course, we have over 4,000 reverse DNS zones which we carry for our members, the LIRs. These are /16-sized v4 zones and /32-sized v6 reverse zones.
Our authoritative DNS cluster currently consists of two sites, one in Amsterdam and one in London. Between the two of these sites, they reach peaks of 120,000 queries per second, so that is approximately 20,000 per server, because we have six servers in total. Our third site in Stockholm, hosted at Netnod, is actually ready. We are currently arranging for transit there, so if anyone is willing to provide us transit at Netnod in Stockholm, please come and see us. We hope to make this active towards the end of May.
This site in Stockholm is also going to act as a DNS provisioning backup site. We currently have only one provisioning site which is in Amsterdam. And as part of our resiliency plans we want to bring up a second provisioning site there. This will also act as a second distribution master for our authoritative DNS cluster.
We also provide secondary service for ccTLDs, as I mentioned. Our aim in doing this is to provide reliable DNS for small and developing ccTLDs. At the moment we don't really have any agreements or SLAs with any of them, but this is something that is going to be addressed as part of action item 67.1, and Peter Koch will talk more about this later.
Our authoritative DNS cluster until last year ran only on BIND -- BIND 9, I should mention -- which is mature. It has a very small memory footprint: for all our zones together it uses only 11 gigabytes of RAM, which is by far the smallest we have seen. We can add and remove zones without stopping service, and it provides views for separation of service into logical servers. But the main downside, of course, is that our entire cluster is vulnerable to, for example, a serious bug in BIND; if there is something that can remotely crash it, then the entire cluster is vulnerable to it. So we thought about increasing diversity, and our main motivation for that was of course resilience: we don't want to rely on just one type of software.
The other motivation we had was to try and improve new and upcoming software; I mean, without users, software doesn't get tested and bugs don't get discovered and we, of course, would like to help some of our community members who are developing some of this software, as well, so that was the other motivation to try out some of the new ones.
And we have an interesting mix of zones: several large zones, several small ones, forward, reverse, some signed, some not signed; the signed ones are a mix of NSEC and NSEC3, so all of these various features of our zones would, we hope, tickle lots of bugs in software.
We have some requirements. It has to run on CentOS Linux, that is our platform of choice; it should be easy to package into an RPM; it should implement DNS and DNSSEC properly, that is kind of a given; it should run under a supervisor, because we like to make sure that our services stay up; it should be reconfigurable without stopping service; and we should be able to add and remove zones on the fly.
So we looked at a few candidates: Knot DNS, NSD4, Nominum ANS, BIND 10 and Yadifa. These were on the horizon, and we selected a couple of these and discarded the others, and I will talk about those now.
NSD4, this is relatively new, but it is built on top of NSD3's mature DNS code, but the architecture has changed significantly. This NSD version can add and remove zones on the fly so this checks one of our requirements. It has a stable master process, so it allows us to run it under a supervisor. It supports all the current DNS standards and there is a highly responsive team of developers that we can talk to.
Knot DNS. This is an authoritative-only server, as you have heard. It's also quite small and lightweight. It has a stable master process, allows supervised execution, supports all DNS standards, and again I would like to thank the team at cz.nic for being highly responsive to us, listening to our requirements and needs and bug reports, fixing them and providing great support in general. Thank you, guys.
Just a little story. On Christmas Day, the 25th, in the morning, I woke up and I had a little tarball in my e-mail from Marek with another bug fix, and that is how I spent my Christmas Day. And even more interestingly, I provided feedback by e-mail and I got a response. And I thought, oh, this is wonderful.
AUDIENCE SPEAKER: Get a life.
ANAND BUDDHDEV: That is how much dedication there is. We looked at Nominum ANS; this is a commercial, authoritative name server and supports all DNS standards. We would like to use this in our provisioning set-up, but not in our public-facing DNS cluster yet. So this is going to be in the next phase of our deployment in Stockholm.
We looked at BIND 10, which was still in heavy development and not really ready for production use, so we discarded that. And we briefly looked at Yadifa, but it didn't have the features that we required, so we also discarded that for the time being. I have been told that there is ongoing work there, so we are happy to look at it again in the future.
Some quick numbers. Memory usage: we find that BIND 9 uses the least amount of memory on our servers, 11 gigabytes, and NSD4 was using up to 24 gigabytes, but there is now a new mode of NSD4 that uses much less memory, and Knot uses about the same amount of RAM as NSD4 now but its memory usage is also going down.
Start-up time: BIND 9.9 starts up quite quickly, 45 seconds with our 5,000 zones or so. BIND 9.10 has got even better and starts in just 15 seconds. That is amazing. Knot takes about 90 seconds, and NSD4, because of its single-threaded reader, takes about three minutes to start up, but fortunately we don't need to restart the server that often.
Shutdown time: similarly, BIND 9.9 shuts down quickly, and NSD4 has no clean-up to do so it shuts down almost immediately.
BIND provides us with views, and this allows serving zones on their own IP addresses. Knot and NSD4 don't have this feature, but we run separate instances, each with their own config file and IP addresses, and these are all managed with Upstart.
During our testing we found there were several bugs and issues, as I had expected; we tickled various things. NSD4, for example -- it wasn't really a bug, but it was how it behaved -- returned ServFail for unconfigured as well as expired zones, so there was no way to distinguish whether a zone was not configured or had expired; this has changed in 4.0.3. Our testing also revealed several NSEC-related bugs in both Knot and NSD4. There were all kinds of memory corruptions and stuff like that which were revealed. We found that the Knot DNS zone parser was very strict: when it was running as a slave, if it got a zone with some bad data in it, whereas BIND would happily load the zone and discard those records, Knot plainly refused to load it. So we talked to the guys at cz.nic and said it should maybe be more forgiving, and there was some back and forth and some things have been accepted and made more liberal, so there is some improvement there. And we found some types of zone transfers would also crash Knot and in some cases corrupt the NSD database, so these have been identified and fixed as well.
Diversity is great, but you have to be able to manage it. The way we do it is we package everything up into RPMs and keep all the packages in our private repository so we can deploy them easily. We use Ansible for configuration, and we have an inventory which tells each server what kind of role it is supposed to run, and the roles are mutually exclusive: so if I tell a server that it is a Knot server, then Ansible automatically disables NSD and BIND, loads up the appropriate configuration and starts up Knot, for example.
All the TSIG keys are in one master list; everything is kept in YAML, so it is independent of any particular implementation, and we use a Jinja template for each type of server. This allows us to switch easily between the servers. So if there is a bug or some serious issue with one type of name server, we can switch one physical server from one type to another within a matter of minutes. You just push one button and everything goes away and reconfigures itself. That is how we manage diversity.
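As a rough illustration of the idea being described -- implementation-independent data in YAML, rendered through a per-implementation Jinja template -- here is a minimal Python sketch. The file layout, variable names and template text are invented for this example and are not the RIPE NCC's actual code.

    # Minimal sketch (not the RIPE NCC's actual code) of rendering a per-server
    # name server config from one implementation-independent YAML inventory.
    import yaml                      # pip install pyyaml
    from jinja2 import Template      # pip install jinja2

    # One master list, kept in YAML, independent of any DNS implementation.
    inventory = yaml.safe_load("""
    server: ns1.example.net
    role: knot                 # could be 'bind' or 'nsd' instead
    zones:
      - name: example.org
        master: 192.0.2.1
        tsig_key: example-key
    """)

    # One template per server type; switching a box between implementations
    # just means rendering a different template from the same data.
    KNOT_TEMPLATE = Template("""\
    zone:
    {% for z in zones %}
      - domain: {{ z.name }}
        master: {{ z.master }}
    {% endfor %}
    """)

    print(KNOT_TEMPLATE.render(zones=inventory["zones"]))

Switching a server from, say, Knot to NSD would then only mean pointing the same inventory data at a different template, which is the flexibility Anand describes.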
Just a little plug for Ansible. Lots of you are familiar with CFEngine and Puppet; Ansible is the new kid on the block. We really like it, and if you are using it yourself and would like to talk to us about it, please come and see me. I like the tool and I will tell you more about how we use it. We have a slightly unusual set-up which I may blog about later as well.
That was my presentation. And questions?
NIALL O'REILLY: From University College Dublin. I am one of the co-chairs of a dormant Working Group whose topic you didn't mention. I guess because there is nothing to say about ENUM?
ANAND BUDDHDEV: That's right. The zone just is there and is delegating to various entities, but there is no updates really.
NIALL O'REILLY: Meanwhile, in another part of the forest, I notice from the people at TERENA that another country has just joined the NRENum project, and it's Malaysia. So things are happening there.
ANAND BUDDHDEV: Thanks.
JIM REID: Just some random guy off the street. Anand, you were talking about making changes to the way in which the root server is going to be operated and the anycasting is done. This is maybe not a question to answer right here, but I wonder if you have given any thought to how others might be in a position to help the NCC with hosting, or by contributing bandwidth or servers, if you wanted those services in particular regions of the world. I wonder if there are any plans to do anything in that area and, if so, when we might hear about them.
ANAND BUDDHDEV: We obviously will be trying to identify hosts and potential hosts for this service. We haven't started that process yet. We hope to start in the next couple of months, and we will definitely be reaching out to the community for this and inviting people to come and talk to us. We do have certain minimum requirements of our own that we require in order to operate such an instance, and we will make these public. Basically, anybody who meets these requirements will be welcome to talk to us about hosting an instance of K-root.
JIM REID: Super, thank you very much, Anand.
AUDIENCE SPEAKER: Patrik Falstrom. I would like to know if you plan for diversity in operating systems and hardware?
ANAND BUDDHDEV: We have talked internally about operating system diversity. That is a little bit more complicated to manage than name server software. We haven't made any decisions yet; this is still something we are thinking of. I mean, the usual candidates, OpenBSD and FreeBSD, are there, but we also have requirements such as monitoring and hardware support, which we have with CentOS Linux and which are not so good with FreeBSD, NetBSD and OpenBSD, for example. But we have not discounted or discarded any of them, so we are still quite open to more diversity; we think that is good.
Hardware is probably the most difficult because we ?? you know, our hosts need to be able to buy hardware easily and it's hardware that we need to be able to support as well. So I think that we may be more limited in our choices there, but again, we have not discounted anything at this stage. So, I am happy to talk to individual people about hardware diversity as well.
AUDIENCE SPEAKER: Cz.nic. I just want to thank Anand for his patience with us.
ANAND BUDDHDEV: Thank you.
JAAP AKKERHUIS: Thank you, Anand.
(Applause)
For the people standing in the back, there is an overflow room downstairs where you can probably sit slightly more comfortably; there are one or two seats here as well, and another one here. But otherwise we are really standing room only.
The next agenda item: everybody knows DNS can be really popular for use not in looking up names but in pestering other people, and Curon Davies is going to tell us how to find the bad guys, or the pests.
CURON DAVIES: Over the last six months or so, a particular college that we work with has been subject to denial of service attacks, not far off a daily basis. Quite early on, back in October, an attack was launched against the college, and the local metropolitan area network that is part of JANET partially cut them off to cope with it: they null-routed one particular IP address. The college subsequently just changed that IP address to another IP address as a way around that, and the attack moved on to the new address straight away; that happened about three times.
So, first of all, I will start off by saying what I actually do. I am part of a JISC regional support centre and support post-16 education in Wales on the technical infrastructure side; some of my colleagues deal with the learning side. We come from the same funding stream as JANET but act separately, providing support for the local area. There are 12 regions altogether: Northern Ireland, Scotland, Wales, and nine in England.
So, the actual college under attack is a further education college, probably the only college I am aware of in Britain that can actually justify an IDN. They do a little bit of HE, higher education; they have about 10,000 students and 850 staff, and two gigabit Internet connections at different sites with a link between the two.
Although the gigabits are there, they are only using an average of about 50 megabits per second, but they have capacity for a gig.
So, back to the idea of DNS. If you allocate loads of different IP addresses as different destinations, then if an attack is against one of those IP addresses, you are pretty certain that someone must have done a lookup that returned that IP address. It's a bit like forensics with a revolver: a bullet can normally be traced back to which barrel or cartridge chamber within the gun it was fired from.
So, we have only been working with -- sorry, we haven't been working with IPv6. Legacy IP only, IPv4. But despite that, most attacks are against IPv4 only anyway. Looking at the source code of, for example, Low Orbit Ion Cannon, which is quite well-known attack software that runs on Windows, strongly suggests it only works with IPv4. It probably does work with IPv6, but it's not intended for that.
But despite that, the other aspect that you have to consider is the TTL, the time to live, of the DNS records that are returned. With an attack, it is possible for someone to look up the IP address today and a week later decide to attack it, after writing it down in a file or something. Although that is possible, the attacks we have seen are against a new IP address that wasn't being used previously, or one of the attacks -- so, just to clarify: the controller of the denial of service attack will make the DNS request, and we will provide a dynamic response. We are using PowerDNS with a pipe back end, with custom-written code in Perl, and it works fine, relatively. The DNS traffic is quite minimal, but I will come back to that.
Then, we did have a proprietary firewall at the site, but the college replaced that with PFSense after about a week of the first attack. The firewall that they did have, which I won't name, dropped all packets from one source IP address if there were more than 300 SYNs per second. If there were more than 1,000 SYN packets per second, it would drop all the traffic from the MAC address, which is quite useful because it drops everything.
So there is pfflowd, which is quite nice, providing NetFlow packets and NetFlow data which is sent to JANET's CSIRT. The only problem is that pfflowd only sends the stateful traffic; softflowd sends the stuff that is dropped as well. So, knowing which address is being attacked, particularly when it's blocked at the firewall, is mainly down to the softflowd NetFlow data.
Most of the attacks were SYN flood attacks. So, an example here: not a massive amount of traffic, but we suspect that some of it was dropped, because there were a lot of dropped packets that weren't part of the attack during that period. And there are two, one starts at about 1 a.m. and the other at about 8; a slightly closer look there shows more packets per second than the other graph over an average period. During that attack, the syslog on the PFSense had details of the TTL of each packet that was dropped. That was a lot of data, it was a few gigabytes, so we did lose a fair bit of packets there -- sorry, log lines. But of the ones that were logged, most had three different TTLs. The maximum is 255, so the attacker was only about 15 hops away.
Looking at the DNS logs -- this was on a different day; we had so much logging going on that we lost some due to disks filling up or various other reasons -- the attacks were, looking at the logs, within seconds of these DNS requests. If you look on the right-hand side, most of those DNS requests were for A records only. The unfortunate bit was we weren't 100 percent sure, because we were allocating different IP addresses from the same block depending on country, and it's quite biased towards certain countries.
Most of the attacks came from either the US or Germany, but there are two IP blocks there that are reported in the MaxMind GeoIP database as being Germany and the US; if you actually do some further research, they are both in Brussels and they are both Google Public DNS. So what we decided to do, instead of using the country code, was just to use the last octet of the IP address and allocate based on that, which gives a far more uniform distribution across all IP addresses.
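The talk mentions a custom Perl pipe back end behind PowerDNS; below is a rough Python sketch of the same idea -- handing each resolver an A record derived from the last octet of its own address. The lines follow the PowerDNS pipe back end protocol (ABI version 1), but the domain handling is simplified, the 198.51.100.0/24 answer block is documentation space, and this is not the college's actual code.

    #!/usr/bin/env python3
    # Rough sketch of a PowerDNS "pipe" backend that hands each resolver a
    # different A record, derived from the last octet of the resolver's
    # source address.  Placeholder logic only.
    import sys

    def respond(line):
        sys.stdout.write(line + "\n")
        sys.stdout.flush()

    def main():
        # PowerDNS starts by sending "HELO\t<abi-version>"; we answer with OK.
        sys.stdin.readline()
        respond("OK\tlast-octet demo backend")

        for line in sys.stdin:
            parts = line.rstrip("\n").split("\t")
            # ABI v1 query line: Q  qname  qclass  qtype  id  remote-ip
            if parts[0] != "Q" or len(parts) < 6:
                respond("END")
                continue
            qname, qclass, qtype, qid, remote = parts[1:6]
            if qtype in ("A", "ANY"):
                last_octet = remote.split(".")[-1] if "." in remote else "0"
                answer = "198.51.100." + last_octet
                respond("\t".join(["DATA", qname, qclass, "A", "60", qid, answer]))
            respond("END")

    if __name__ == "__main__":
        main()

If an attack then hits 198.51.100.53, the DNS log tells you it was a resolver whose address ends in .53 that looked the name up, which is the "which chamber fired the bullet" idea from earlier.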
And in addition to that, PowerDNS supports EDNS Client Subnet, so we enabled that for Google Public DNS. We did have an attack where the request was made through Google Public DNS. We did ask Google if they would help, but they declined. So we enabled EDNS Client Subnet instead, which provides almost all, but not all, of the information they could have provided us with anyway. I am not going to bore you with the details of that.
Another type of attack was an amplification attack: plenty of traffic in one direction, hardly any in the other. The maximum we could see in PFSense was 36 meg per second, but the metropolitan area network insisted it was a 3 gigabits per second attack. Interestingly, it insisted it was dead on 36; I am not entirely sure why that was. But the number of packets was again approximately the same as what we had seen previously.
Quite small packets. But with this request, as I mentioned previously, everything came from either Germany or the US, so probably through Google Public DNS. The attack came within seven seconds of the request, and those in front of you there are the only requests within a 24-hour period that were allocated that IP address. I have replaced the addresses with documentation space to hide the probably guilty -- they might be innocent, because we are not 100 percent sure; there are ways around it -- but within seven seconds it's quite close to the source of the attack.
Then another attack that we had was a stateful attack. We didn't notice it much, because the only way we noticed it was the state utilisation on PFSense; if you actually look at the bandwidth, it's hardly noticeable, and that is only about 10 meg. Packets per second, again relatively small, but there is a bit of a block there between 23 minutes past two and 24 minutes past four. But the states within PFSense were extremely high. Towards the end -- we only noticed this at about 3 o'clock -- we realised that we could actually dump the entire state table within PFSense. We also had the NetFlow data from pfflowd, and yes, there were a significant number of queries per hour during that period to a specific host name.
Some 36,000 compromised or infected hosts could be identified, mostly at hosting providers. Interestingly, a lot of the requests this time also included AAAA records, so a lot of these hosts were IPv6-enabled.
So, with IPv6 there are so many addresses that we could actually do something quite neat with this. With EDNS-Client-Subnet, Google and, I believe, OpenDNS provide the /24 with IPv4 and provide the /64 with an IPv6 request. So it's possible to just take the network prefix of the source request address and put it in the interface identifier. Therefore, you have a unique IP address for every single request source. Of course, we could take this a step further and encrypt it or hash it, or even have a lookup table, so we have a unique interface identifier for each requester.
We haven't actually tried this, but we have got the code for it. The main reason we haven't tried it is some routing issues with IPv6, and the other issue is that it's quite difficult to receive a request on any single interface identifier within that /64, although I believe HAProxy does support it; we haven't tried it yet. The attacks have stopped within the last few weeks, so we have little incentive to do it right now, but if they do restart we probably will implement it.
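A rough illustration of the embedding described above, not the speaker's code: take the /64 that EDNS-Client-Subnet reports for an IPv6 requester and copy it into the interface-identifier half of an address from your own service prefix. The 2001:db8:1:2::/64 prefix here is a documentation placeholder.

    # Illustration only: embed a requester's /64 into the interface identifier
    # of an address from our own (made-up) service prefix.
    import ipaddress

    SERVICE_PREFIX = ipaddress.IPv6Network("2001:db8:1:2::/64")  # placeholder

    def tag_address(client_subnet: str) -> ipaddress.IPv6Address:
        client = ipaddress.IPv6Network(client_subnet, strict=False)
        # Top 64 bits of the client prefix become our bottom 64 bits.
        iid = int(client.network_address) >> 64
        return ipaddress.IPv6Address(int(SERVICE_PREFIX.network_address) | iid)

    # A query arriving with ECS scope 2001:db8:feed:beef::/64 would be answered
    # with 2001:db8:1:2:2001:db8:feed:beef -- unique per source /64.
    print(tag_address("2001:db8:feed:beef::/64"))

When an attack later arrives at one of these addresses, the lower 64 bits read back directly as the prefix of whoever resolved the name.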
The other aspect that we have not quite tested, but we have got the config for, is that because all these attacks were against one host name, we could allocate a different host name to every single requester. Because the host is used for single sign-on, search engine optimisation is not required, and there is a robots.txt file, so search optimisation is not a problem. There are several hosts that redirect to this one, so it's quite possible that the attacker was actually intending to attack a different host name. We have seen some attacks against other hosts, but nearly everything redirects to this particular host name. The only problem is wildcard records, which are a bit of a problem, but PowerDNS is quite flexible with the pipe back end, and a regular expression can be used quite easily to match everything.
That is the end of my presentation. So if there is any questions, fire away.
JAAP AKKERHUIS: Any idea why you are being attacked?
CURON DAVIES: Yes, I forgot to mention that. We strongly suspect that it's a student that is initiating all the attacks. There has been some suspicious VPN traffic noticed -- to hosts in some countries that I will not mention, but they are not countries that are normally associated with spam, etc. -- but we are not 100 percent sure of that. The other aspect that I haven't mentioned is that these techniques can probably be used for spam filtering and a few other things, in addition to just denial of service attacks.
JAAP AKKERHUIS: OK. Thank you.
(Applause)
And now, well, we are making up some time here, which is fine. Next we have Nicolas Canceill looking, using RIPE Atlas, at how DNSSEC validation is being used in practice, we hope.
NICOLAS CANCEILL: Thank you. So, good morning; I am from the University of Amsterdam and I have some DNSSEC news for you. I will start with a very brief introduction, because I suppose you all know more about DNSSEC than me: DNS is insecure, and DNSSEC is the solution.
A bit of history -- of course you also know this better than me -- it took a long time from the first DNS specification to get to deployment of DNSSEC, at least at the DNS root level. The scope of this research was to use the Atlas network, developed and managed by the RIPE NCC, to get a better understanding of the user experience of DNSSEC. So there were some challenges: how can we assess DNSSEC support, and then, of course, how does this support influence user experience?
The Atlas network, for those who don't know what it is, is a global network of probes developed by the RIPE NCC. It's based on volunteering, so you can volunteer to host a probe on your network; they are just small USB-powered devices that you plug into your network and that will then use all the properties of your local network. Now there are about 8,000 probes in the network, about 5,000 of them active at a given time, and despite it being a worldwide network, most of the probes are located in Europe.
The methodology was simple. We had the Atlas probes, we had a name server that we controlled, and the big unknown was the resolvers in the middle. Since it's a very complex system, with a lot of different configurations, it's kind of a black box, and by looking at the two sides of this black box we tried to understand how it influences user experience. The best advantage this Atlas network gave us over previous research is true presence in the client network. Most research on DNSSEC deployment that focused on user experience was using tricks, for instance advertising networks, that allowed them to force users to make requests to their name servers. But here, with the Atlas probes, we can actually get the real response that clients get, instead of trying to guess what the response was at the end.
The brilliant thing is that the Atlas probes get all the properties of the local network, including the default name servers that are distributed by DHCP. And by using packet capture at the name server, we tried to match the two sides of the experiment.
So there were some challenges. First of all, there is the black box of the resolvers, and usually the two sides do not match: you have the IP of the resolver that the probe is seeing and the IP that the name server is seeing. The solution we used for that is to prepend the probe ID as an extra DNS label in the request. Moreover, there were challenges due to the fact that some probes or some networks have weird resolving set-ups -- multiple resolvers, forwarders, etc. -- and misconfigured resolvers. The Atlas network also has limitations: most of the probes are in Europe and, as you can see here, it's not really an accurate reflection of the distribution of Internet users.
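A minimal sketch of the probe-ID trick just described; the zone name and label layout here are made up, not the thesis' actual scheme. The probe queries "<probe_id>.<zone>", so the name server's packet capture can be matched back to the probe even when the resolver address seen at the server differs from the one the probe sees.

    # Minimal illustration of matching probe-side queries to server-side capture
    # via a prepended probe-ID label (placeholder zone name).
    ZONE = "secure.dnssec-test.example."   # placeholder measurement zone

    def query_name(probe_id):
        # Name handed to the Atlas measurement for this probe.
        return f"{probe_id}.{ZONE}"

    def probe_from_qname(qname):
        # Server side: recover the probe ID from a captured query name.
        label, _, rest = qname.partition(".")
        return int(label) if rest == ZONE and label.isdigit() else None

    assert probe_from_qname(query_name(12345)) == 12345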
About the process of the measurements: there is another limitation of the Atlas network, in that you cannot query all 5,000 probes at once, so we had to use a repetitive process in order to get the biggest sample possible. We used five different zones: one secure zone that completely complies with DNSSEC; one insecure zone with no signatures at all and no secure delegation; and three corrupted zones. One of them was using an incorrect label count in the records, just incremented by one; the bad RRSIGs zone was using incorrect signatures, just flipping a bit in the middle of the signature; and the no RRSIGs zone was a signed zone with all RRSIG records removed. The measurements were done using Python scripts, an Atlas library, and a library from Google in order to parse packets.
The name server itself was NSD, and the zones were created and managed with LDNS, which also helped in doing this corruption. And the capture was done with Wireshark.
The results -- that is what you are here for. First, about the resolvers: we wanted to know how many resolvers are DNSSEC-aware, and one of the best indications is simply to send requests and see if the resolvers or forwarders set the DO bit themselves to request secure answers. As we can see, it's not all of them. We also looked at the resolvers that were able to forward the RRSIGs back to the client. This is very interesting in the light of Willem's presentation from earlier this morning about application-level validation, because if you get the RRSIGs at the client, you can do validation there, of course.
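A sketch of the kind of check being described, using dnspython rather than the thesis code: send a query with the DO bit set and see whether the resolver returns RRSIGs and/or sets the AD bit. The resolver address and query name are placeholders.

    # Sketch: does this resolver hand back RRSIGs, and does it set the AD bit?
    import dns.flags
    import dns.message
    import dns.query
    import dns.rdatatype

    RESOLVER = "192.0.2.53"          # placeholder: your local resolver
    QNAME = "ripe.net"               # any signed zone will do

    query = dns.message.make_query(QNAME, dns.rdatatype.A, want_dnssec=True)
    reply = dns.query.udp(query, RESOLVER, timeout=5)

    has_rrsig = any(rr.rdtype == dns.rdatatype.RRSIG
                    for section in (reply.answer, reply.authority)
                    for rr in section)
    validated = bool(reply.flags & dns.flags.AD)

    print(f"RRSIGs returned: {has_rrsig}, AD bit (validated): {validated}")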
The second thing we looked at is support for the DS type. As you all know, it is an exception in DNS, because the authoritative zone for a DS record is the parent level, unlike the rest of the records, and technically a resolver that is completely unaware of DNSSEC should not return those answers. We actually got a lot of answers with DS records; we suspect this may be a bias, because the DS record might get cached when you first walk the DNS tree. The biggest thing we found is the number of authenticated answers. This is tremendously high, and compared to previous research it suggests that there may also be a bias in the distribution of Atlas probes: most of them may be hosted by network engineers that want to have DNSSEC, and this could explain the high rate of authenticated answers we get. So we looked at the resolver distribution. As we can see, most resolvers have fewer than 10 probes, and we looked at the most common resolvers and most of them were Google -- interestingly enough, not necessarily the Google Public DNS.
These are the biggest results. The first thing we looked at is the protection provided by DNSSEC, which means that if a zone is not correctly signed, then your resolver should refuse to return an answer for it. That is what you see here for those three zones: we counted ServFail answers, and the first thing you can remark is that we got 29% ServFail instead of the 30% of validating resolvers. This seems to indicate that some networks are configured with multiple DNS resolvers, some of them able to do DNSSEC and some not. As a result, some clients, when the DNSSEC answer is a ServFail, will fall back to a normal DNS resolver and get the bad answer.
The second thing we noticed is the difference between the results for bad label, bad RRSIGs and no RRSIGs. In the case where RRSIG records are completely absent from the zone, and so from the response from the name server, it seems there is an additional fallback case of about 1% of the sample that will consider that if the RRSIG is absent there is no problem -- they just accept the answer.
The other thing we found that is even more troublesome was an unexpected result, due to the fact that we used wildcard records to match the extra label added by the probes. What you can see here is an extraordinary amount of NOERROR return codes but with the answer section empty, and this seems to be a bug with wildcard records. We compared with the insecure zone, which doesn't show any of them, and compared with the secure zone without using the wildcards, and we also don't see this problem. It seems to be related to the fact that the wildcard itself in the zone doesn't count as a label, but when something matches the wildcard it counts as a label, and apparently there is a big problem with that.
We looked at the distribution of validating resolvers and we see that most of them are resolvers that are used by few probes only, and we looked also at the protecting resolvers, the ones that blocked the corrupted answers, and similarly, they were also among the less-used resolvers.
So, in order to sum up the findings, a good, good rate of DNSSEC awareness that seems to be agreeing with previous research, a very, very high rate of validation and indeed, the fact that some networks have a configuration that makes them fall back to insecure DNS when the secure answer returns a ServFail.
Finally, it is very encouraging to see that almost two-thirds of the clients were getting the RRSIGs in the answer, and that means a lot of them would be eligible for doing application-level validation.
Finally, the big issues we found: the fallback when the RRSIG is missing, and the bad validation of wildcards, which is extremely troublesome. We tried to track this down and the only reference we could find was a bug in some BIND 9 versions in some packages, so it's still unclear why we have such a high rate of failure in that case.
So, that was my presentation. I want to thank NLnet Labs and especially Willem Toorop; this is my final project for the University of Amsterdam. I want to thank you all for listening to me, and especially the RIPE meeting for inviting me here. Thank you very much.
(Applause)
AUDIENCE SPEAKER: Andrei, speaking as the holder of one of the home routers from cz.nic, the Turris -- maybe you heard about it on Monday. This problem with wildcards failing to validate is actually showing up in the wild, with DNSSEC validation in broadband routers at users' homes, and it's quite a common problem; it is what users complain about the most: some of the domains they just cannot see, and it works with other routers, so what is the problem? So it definitely is a problem, especially with any current stable version of BIND -- I think not only BIND, but all other stable versions of any distribution, according to my tests. Only this remark.
NICOLAS CANCEILL: Thank you.
JIM REID: Two questions. First, what is that strange thing you have got around your neck?
NICOLAS CANCEILL: I am sorry about that.
AUDIENCE SPEAKER: First timer. Educated.
JIM REID: OK. Anyway. We are not the fashion police, don't worry about it. I have got a more substantial question, actually. You were talking about using the presence of the DO bit to indicate DNSSEC awareness, and that gives me a little bit of cause for concern, because, as I am sure you realise, the BIND 9 implementation always sets the DO bit on its outbound queries even if it doesn't have a trust anchor or the OpenSSL libraries. So I wonder if you tried to counter that, say, for example, by doing some fingerprinting on the resolver addresses to find out: is this an instance of BIND, which might not necessarily be DNSSEC-aware, or some other DNS implementation which really is fully DNSSEC-aware?
NICOLAS CANCEILL: So we do indeed have this distinction between DNSSEC-aware and validating, able to do validation, and of course there is a big difference, because we see about 90% of resolvers able to set the DO bit when requested but still only 30% returning authenticated answers. So yes, there is a big difference between those two categories, and there was unfortunately not enough time to look more into it and try to categorise it.
JIM REID: OK, thank you.
NICOLAS CANCEILL: Thank you.
JAAP AKKERHUIS: If there are no more questions, thanks, Nicholas.
(Applause)
There is also a report with a lot of details -- I won't explain it here -- available from the NLnet Labs website. So I guess you can ask questions then.
Since we are doing measurements, here is Geoff, also looking at DNSSEC, but this time from a user's perspective.
GEOFF HUSTON: Thank you and good morning, everyone. I don't know about you, but I have sat through a lot of DNSSEC talks, and a lot of them have been "let's measure the number of zones that are signed". You are looking at the supply side: look at all the zones that are signed, obviously DNSSEC is a failure, or a success, or whatever. The real issue is: is that really relevant? Because you might sign, but if no one validates you are wasting your time. And then we just heard a talk about trying to look at resolvers. But resolvers are really the middle entity, and the number of resolvers that validate or not, I contend, isn't that useful as a metric.
What you are really interested in is what users are doing. If you are a server, an authoritative name server, and you are serving an unsigned zone and you are thinking about changing it to be a signed zone, and you are thinking how many more queries will I get, how much more traffic, how much more load, the number of resolvers that do this doesn't really matter as much as the number of users that are going to query and do the full SIG chain. So let's look at the DNS again, and this talk will do this, but from the perspective of each and every one of you as users. Because the real questions are more about: how many users can retrieve a URL using v6? Will v6 DNS transport work? How many use DNSSEC validation? How many users, when you send the truncate bit all the time, will successfully resolve a name by going back to TCP -- what is the failure rate? Because if we are talking about using DNS over TCP, it might be useful to understand metrics of how many users you are going to damage. If you are thinking about using DNAMEs to fix up all the glue, how many folk follow it, how many users? So looking at the infrastructure, I contend, isn't useful. It's bloody easy, because, you know, here is this authoritative name server right under your nose, and this router, these traffic measurements, and it's piss easy to look at this and extract some numbers. But are they useful numbers? Because sometimes when you measure stuff, it's seductively easy to talk about what you see, as distinct from understanding measurements that talk about the issue.
And I contend, in a lot of this, that looking at the result and trying to guess the cause is kind of stupid; if you can look at the cause you are a lot further down the track. So how do end users see things? How do we measure that? Counting zone files and what's signed doesn't make much sense. Analysing log files, trying to infer who does DNSSEC validation from looking at the log files of an authoritative name server for a signed zone, doesn't really help. What we are really interested in is what is going on out there with end users and the resolvers that they use. How many end users are actually using resolvers that perform validation? That is the sort of underlying magic question. So you get down to this really interesting question.
If my minimum benchmark is a million end users all over the globe, how do you measure it? Well, there is an obvious answer: get the world's largest website to do your measurement for you. You know, any popular massive web page can be armed up with JavaScript that works in the background and can measure gazillions of users. They might do so, for all I am aware. The problem with most of this stuff is that they know the answers, but I don't and you don't. It's not public data, it's their data. So option A is: you know the answers if you are big enough. But that doesn't help you, and it doesn't help me.
So is there an option B? Aha. Option B: try to deliver your code to a million users' machines. So what thing is everywhere? Ads. Botnets aren't everywhere, but ads are everywhere. Ads are absolutely ubiquitous. Every news site is smothered with this shit. Ads are just everywhere. Everyone on the planet has ads delivered to them. So that is totally ubiquitous: no matter what you do, everyone runs YouTube, ads are absolutely everywhere.
Now, most advertisers think that text is crap. Most advertisers think that flat images are crap. Real ads animate. Real ads have Flash, because they want ActionScript, so that if you hover over the damn thing all of a sudden lights flash; so advertisers demand Flash. Which is code. I don't need to click. I don't even need to hover. As soon as the ad is delivered into your browser, the Flash code is with it and it executes on load. No clicks required. So all of a sudden, I can load stuff just by having the ad delivered. And if the advertising company is silly enough to charge me by the click, then as long as none of you click, it's free. Make the world's most boring ad that is unclickable and you will get gazillions of impressions.
So there are a few slight constraints. It's got to be on port 80; I have been mucking around with others and it's a bit of a challenge, but that is not really bad. The ordering of things that happens in Flash is bizarre -- it's this weird code engine, asynchronous, with sprite animation -- but as long as you craft your experiment right, you can get over that, so we did. It's easy.
What we do is we sit inside an ad and fetch a whole bunch of one-by-one non-displayed PNGs, yeah? How do you get a one-by-one PNG? You have to do a DNS resolution, interesting, and then you have to do a GET. Let's think about that for a second. Because if I make the DNS name point to me, and I am the authoritative name server, then I will see both the DNS and the web. And if I am very careful to make sure that the name hasn't been cached, then I get to see two things which are really quite fundamentally interesting: I get to see the end user, and I get to see the address at the end of the forwarding chain of DNS resolvers. I get to see the end user and the DNS resolver they ultimately use. And I can unite them together and go: that person uses this resolver. This starts to get really fascinating. And then, of course, I can get the Flash ActionScript to report back what it thinks happened, so I can even get results sent back. So it's a two-way channel sitting inside an ad that you'd better not click, because I'd bloody well have to pay. This becomes fascinating.
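A sketch of the core trick being described: generate a never-repeating DNS label per impression so that caching cannot hide the user, then join the DNS log and the web log on that label to pair the end-user address with the resolver address. The label format and domain here are illustrative, not the actual encoding used in the experiment.

    # Illustration: one never-seen-before label per ad impression, then a join
    # of DNS and web logs on that label (placeholder domain and format).
    import time
    import uuid

    BASE = "experiment.example."

    def unique_name(test):
        # e.g. "u-1a2b3c...-1400060400.sig.experiment.example."
        return f"u-{uuid.uuid4().hex}-{int(time.time())}.{test}.{BASE}"

    def join_logs(dns_log, web_log):
        # dns_log: label -> resolver address seen at the authoritative server
        # web_log: label -> end-user address seen at the web server
        return [(label, web_log[label], resolver)
                for label, resolver in dns_log.items() if label in web_log]

    # A user behind resolver 203.0.113.53 fetching from 198.51.100.77:
    label = unique_name("sig")
    print(join_logs({label: "203.0.113.53"}, {label: "198.51.100.77"}))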
The other thing about ads is they really work hard for me. They try really hard to never give you the same ad twice, so I get to see fresh IP addresses day after day after day, million after million after million. So the advertising channel seems to encompass most of those two?and?a?half billion unique IP addresses. This is amazing.
As I said, don't click, because I have to pay. So the impression is just that. The other thing I can do, because the advertising agencies are really good: I can do time of day, I can do demographics; with a little bit of mucking around I can even do regions, and if I really try, countries. But for the kind of experiment I am talking about here, I don't. I just simply say: give me bulk.
So for 100 bucks a day, and that is what we spend, 24 hours of work, we get around 300, 400, 500,000 impressions per day all over the planet. So, 100 dollars a day at one dollar CPM -- which is not dollars per click; we pay per 1,000 impressions -- 350,000 placements a day. How big is Atlas? This is kind of a different way of looking at things. You don't have as much flexibility, but you have astonishing volume. So how does the ad network work?
These guys have spent a lot of time on this. We use Google, and we started reverse-analysing Google, and we noticed that Google runs a 24-hour cycle based in my time zone. So at 10 a.m. UTC, which happens to be midnight my time, on the first day it says: he is going to spend 100 dollars, let's saturate the ad system and see how many impressions we get. That was a lot. By about two hours later it had used up most of my money, target achieved, and it dropped off.
Day two it goes: oh, I remember that. So the next day, normalising it out, it starts more gradually and starts to learn how to spend 100 dollars, no more and no less.
Day three, getting even better. Day four, it tests the other end and doesn't like what it sees. Days 5, 6, 7: pretty amazing. The advertising network itself self-tunes to give me ad placement across the first 20 hours and holds the last few in reserve, just in case it hasn't taken all of my money. If it has spent only 90 bucks after 20 hours, it has another four hours to spend the rest.
So what do we use it for? The other thing you will find with an ad network is that each ad needs to be approved. So every time you want to do a different experiment, you sort of spin the wheel and wait for the advertising network, in this case Google, to come back and say yes. We found this a bit irritating, so what we then did was one generic ad, where the first thing the impression does is come back to us asking: what ad am I meant to be? And we customise the ad, so we never need to go back to Google for a new advertising campaign; it's all one massive campaign and you can turn on different experiments any time you want, and Google doesn't know. This is cool. So we can measure a whole bunch of things without changing any of the advertising parameters. This just works.
How do we set it up? We found RTT is pretty important, so we actually target the ads against three servers, one in each region: we have a rack, rented machines in the US and rented machines in Australia. And depending on where you are, approximately, by your source address, we send you to one of those three. What is on our servers? We run BIND, we run Apache because we need the web fetch, and we log everything by packet -- a full packet capture running all the time. And we send you to the closest server, and we keep, per day, the DNS logs and all the packets.
For us, caching is shit. We want to see you. So we use unique DNS labels all the time, which for DNSSEC is really a pain, because for DNSSEC to work those unique labels need to be signed. And so far, what we have done is create a domain of around a million zones, and then we recycle across the million signed zones with short TTLs. It's not totally satisfactory, but it's about the best we could do inside that system, and it's a unique name: you need to talk to me, because I am the only authoritative name server, so I get to see what you are doing all the time. Caching won't help, because every name is unique.
So, what can I see? Well, I have done a whole heap of work looking at v4 versus v6: the performance, RTTs, your fetch performance, data mode, connection failure rates -- all of that stuff becomes obvious when you start doing this kind of work. And for DNS, well, how many folk actually do DNSSEC validation? How long does it take? How many packets? How long do you actually spend on average when validation occurs? How much slower is validation? All of that behaviour is exposed and is sitting inside the data.
So, let's sort of generalise a bit and get back to the DNSSEC answers.
The advertising system is really quite amazing because we can direct a whole bunch of users, arbitrary large numbers of users, depending on how much you want to spend on advertising, to effectively do DNS tasks and web fetch tasks of our choosing. We can create the cause and look at the effect. So instead of looking at what happened going, I wonder what the users were doing, we are going: Users are doing this, here is what we see back at the authoritative name servers, the authoritative web servers, whatever. So the user actually doesn't know all this is happening, they don't contribute all their own measurements, they can't lie. I measure back at the server of what they are doing. All the data collection sits on the servers.
What have we done? How much of the world does v6? Well, if I give you these tests, I can figure it out. If I give you a DNS name that is dual-stacked, so there is an A and a AAAA, I want to know which one you are going to choose, which you prefer. The second one I give you only has an A record, so you can only fetch it using v4; if you are v6-only you won't fetch it. The third one only has a AAAA; even if you prefer v4, happy eyeballs, whatever, that third URL will expose the fact that you have got an active v6 stack. Except if you are Microsoft and you are running some crap old thing that has Teredo on it, in which case I have got to hit you a bit harder and offer a v6 literal address, no DNS, and on a machine that has Teredo, at that point Teredo will fire. So those four URLs are enough, and I wait for ten seconds to make sure you have had enough time to do it, and you send back a result. I am also looking for the absence of fetches; sometimes users have a short attention span and they move away, but as long as I get the result URL I know you have tried for ten seconds and those results are good. That is one way of measuring v6 deployment -- separate talk and data.
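A sketch of how results from those four test URLs might be classified; the keys and the decision rules here are illustrative assumptions, not the experiment's real test names or logic.

    # Illustration: classify a client from which of the four one-by-one objects
    # it retrieved (placeholder test names).
    def classify_v6(fetched):
        if "v6-literal" in fetched and "aaaa-only" not in fetched:
            return "Teredo-only: v6 literal worked, DNS AAAA did not"
        if "aaaa-only" in fetched:
            if "dual-stack-over-v6" in fetched:
                return "v6-capable, prefers v6 on dual stack"
            return "v6-capable, prefers v4 on dual stack"
        if "a-only" in fetched:
            return "v4-only"
        return "no result"

    print(classify_v6({"a-only", "aaaa-only", "dual-stack-over-v6"}))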
Does everyone see my route? Who's doing filters? Well, here is a DNS name that resolves into a commonly accepted, pretty old prefix that we think everyone routes. Here is a second DNS name that maps into a prefix that I am looking at, and a result. Does everyone fetch both of those? Are there systemic blocks on the second? We were looking at one point at net 103 or something, to find out -- because it was a relatively recent prefix at the time, about two years ago -- whether there was persistent blocking for that prefix, and yes, in the Czech Republic five ISPs were blocking. So again, really easy to do.
So DNSSEC, I am interested in realistically three things: Here is a name that is signed. I want to see if you do signature chasing, do the whole validation thing. But just to make sure, the second one is signed badly. I want to understand if you have got a second resolver, if the first one says ServFail, I want to see if you try again. And you go to a second resolver that says, well, I will do it and I won't validate. Surprisingly large number of folk do that.
And the third one is the control, there is no DNSSEC whatsoever, it's just a standard name. Wait for ten seconds and report the result.
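A sketch of the three-way classification Geoff describes, driven purely by what the authoritative server sees for each user; the field names are illustrative, not his actual analysis code.

    # Illustration: classify a user from server-side observations of the good
    # and bad test names plus whether any DS/DNSKEY queries were seen.
    def classify_dnssec(fetched_good, fetched_bad, saw_dnskey_or_ds):
        if saw_dnskey_or_ds and fetched_good and not fetched_bad:
            # Did the full signature chase and refused the badly signed name.
            return "validating"
        if saw_dnskey_or_ds and fetched_bad:
            # Chased signatures, got ServFail, then fell back to a
            # non-validating resolver and fetched it anyway.
            return "mixed (validates, then falls back)"
        return "DNSSEC-ignorant"

    print(classify_dnssec(True, False, True))    # -> validating
    print(classify_dnssec(True, True, True))     # -> mixed
    print(classify_dnssec(True, True, False))    # -> DNSSEC-ignorant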
So three URLs: the good, the bad and the control. And because I pay Google 100 bucks per day, they are grateful for the money and I am grateful for all the work they do for me. This is brilliant.
Here are some results, and this is from December, but I will show you the URL that gives you the up-to-date data. In December we did 5.6 million experiments across the globe, and 4.9 million of them ran all the way to the 10 seconds. And just looking at those logs and doing the analysis, it's pretty clear from that data that, at that time, 6.8% of the world's users did full validation. They fetched the one that was validly signed, they did not fetch the one that was badly signed, and I saw the full SIG chasing every single time.
Oddly enough, another 5% of the world did the SIG chasing -- they appeared to go, this is cool -- and then you saw a second query for the A record. In other words, they had flicked over: ServFail, oh, that means the service failed, I will try the next resolver; bang, this doesn't validate; 5% go, I will fetch it anyway. And for the rest, I don't see any DS or DNSKEY fetches, so 88.5% of the world are completely DNSSEC-ignorant. That was the view from December. That is what I just said -- why should I say it again?
Interestingly, you kind of think: is the Internet homogeneous, is all the world coloured the same way? It's not, is it? How much not? So I started breaking this down, because I know the end address of the user -- I know who it was, I know where it was, I know which originating AS it came from -- so I can do this kind of table: which countries did they come from?
70% of the folk in Yemen validate. 70. Sweden: 62%. Slovenia... the United States of America, 20th. Poland isn't there. And down the bottom is the world figure: 6.7% validate, 4.8% first give it a try and fall back anyway, and the other 88% don't. So an interesting kind of map, and I am surprised to see Vietnam, I am kind of surprised to see Thailand, Indonesia, the occupied Palestinian territories -- these are not normally folk you would put in a top 20 list of countries. That is the top, and that is the bottom 20. China, interesting, at the time -- this is December. Those well-known economic powerhouses: France. Why is France in the bottom 20? Portugal, I would have thought better. Korea, which was top of the list for broadband penetration about five years ago, is bottom of the list for DNSSEC validation: 99.3% of Koreans did absolutely nothing. I filtered this down to the 118 countries that gave me more than a thousand data points.
So there is a lot of variation, from 67% in Sweden to fractions of a percent in Korea. The world is not coloured uniformly. But I was doing this, and I was looking at the eastern territories, as they like to call themselves -- New Zealand. We win. We have now started to get a bit fancy with Google charts and maps, and I like this; it's actually really good to set it up on a database and whip it out. That URL down there you might like to play with, because the map is interactive: it drills down and goes up and goes sideways, it does all kinds of cute shit. That is the current map of today's world, and again, you know, what is coloured way over in the green? Greenland. Sweden. Ireland. But not the UK. Bits of the Middle East, bits of Africa, lots of bits of Africa. This is kind of curious.
Now, there are a number of questions that I come straight up with that I am stunned about. How long have we been bashing the v6 drum? How many, to the closest thousand, v6 presentations have you sat through? Come on, I might get numbers at least -- 10,000 or 20,000 -- all of you. But why is it that DNSSEC has at least three times the penetration of v6? And we haven't bashed the DNSSEC drum very hard at all, have we?
What is the accidental victory here that has made DNSSEC a runaway success? This, to my mind, is the obvious answer, and this is 19 March, so it's a little over a year ago: Google's Public DNS turned on DNSSEC validation. Interesting. Don't forget, I can match the user to the resolver they use. So I did, for December.
Of the world's users, 10% used Google -- one in ten -- in December. Half of those, 5%, only used Google, so even when they got ServFail, whatever, they couldn't go anywhere else: Google-only trusting. The other half had a bet each way and said: I will use Google, but if it gives me ServFail I will go and use something else. So I can do the same country table again, but I can add some more columns. And these are: of those who validated, did they use Google?
So Yemen did not use Google, and neither did Sweden, or Slovenia, or Estonia; but in Vietnam, if you validate, you are using Google ?? 98% of those who validated were using Google's Public DNS to do so. Tanzania 94%, Algeria 71%, the occupied Palestinian territories...
And so on and so forth. Interesting.
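As a rough illustration of how that user-to-resolver matching could be done (not the actual APNIC pipeline), here is a minimal Python sketch. It assumes you have web-server hits keyed by a unique per-impression label, authoritative DNS query logs keyed by the same label, and some list of Google Public DNS egress prefixes; the prefixes and log formats below are made up.

```python
# Sketch: join web-server hits against authoritative DNS query logs via the
# unique per-impression label, so each user can be associated with the
# resolver(s) that acted for them, and flag the ones that went via Google.

import ipaddress
from collections import defaultdict

# Hypothetical list of Google Public DNS egress prefixes (placeholder values).
GOOGLE_PREFIXES = [ipaddress.ip_network(p) for p in ("74.125.0.0/16", "2404:6800::/32")]

def is_google(resolver_ip: str) -> bool:
    addr = ipaddress.ip_address(resolver_ip)
    return any(addr in net for net in GOOGLE_PREFIXES)

def join_logs(web_log, dns_log):
    """web_log: iterable of (unique_label, user_ip) from the ad's HTTP fetches.
       dns_log: iterable of (unique_label, resolver_ip) seen at the auth server."""
    resolvers = defaultdict(set)
    for label, resolver_ip in dns_log:
        resolvers[label].add(resolver_ip)

    for label, user_ip in web_log:
        seen = resolvers.get(label, set())
        used_google = any(is_google(r) for r in seen)
        only_google = bool(seen) and all(is_google(r) for r in seen)
        yield user_ip, used_google, only_google

# Example: one user whose unique label was resolved only via a Google address.
web = [("u1a2b3c", "192.0.2.10")]
dns = [("u1a2b3c", "74.125.47.1")]
for row in join_logs(web, dns):
    print(row)   # ('192.0.2.10', True, True)
```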
So, I can also do this by origin AS, because you can ?? the data is there ?? and you get this kind of table. And there are some interesting things going on. If I can find it again: Linkem SpA in Italy is doing amazingly well, because it hands all of its users' requests to Google. As do the occupied Palestinian territories.
So you can actually see which ASes are doing it and which resolvers they are using, in much the same way. But we are in Poland, aren't we? And interestingly, in Poland, in February, someone turned on DNSSEC validation. Because there are two lines here: the red line, which no one can see apart from those in the front row, is pretty steady at around 10%, and says that 10% of users' queries are handed to Google in Poland. But interestingly, since February, one-third of users in Poland have their queries validated with DNSSEC, if DNSSEC is there. Brilliant. And if you play around with this same URL ?? because, you know, that is Grong Grong, a small town somewhere in the outback of Australia ?? you can click on these ASes and it will show you exactly what is happening in each AS. I have forgotten which one turned it on, but one did in February. Well done them.
Look up your own AS and see what is going on. Because all the data is there.
So, I have got a bit of time. And I have got some things to sort of throw back at you:
This is no longer a 512-octet game, is it? The signatures are enormous. So all of a sudden it's small query, massive response ?? turn on DNSSEC, this is fantastic. The more signed zones there are, the worse this gets.
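For a sense of the amplification arithmetic being described, here is a small sketch using the dnspython library (an assumption; the zone name and resolver address are just placeholders) that compares the wire size of a DNSSEC-OK query with the wire size of the response it pulls back.

```python
# Sketch: compare the size of a small DNSSEC-OK query with the size of the
# (possibly signed) response it elicits; the ratio is the amplification factor.
# Requires dnspython. Zone name and server address are placeholders.

import dns.message
import dns.query

def amplification(qname: str, server: str, rdtype: str = "A") -> float:
    query = dns.message.make_query(qname, rdtype, use_edns=0,
                                   want_dnssec=True, payload=4096)
    response = dns.query.udp(query, server, timeout=3)
    q_len = len(query.to_wire())
    r_len = len(response.to_wire())
    print(f"{qname}: query {q_len} bytes, response {r_len} bytes, "
          f"factor {r_len / q_len:.1f}x")
    return r_len / q_len

# e.g. amplification("example.com", "8.8.8.8")
```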
BCP 38 is deployed? Oh, God. Nowhere near enough. Some, but nowhere near enough.
So what is the answer? Do we keep on bashing the BCP 38 drum? Do we say this is a TCP problem? If you really want to play the DNS game, answer a packet. This is no longer stateless query and response; it really needs a handshake. How many folk do ?? when we measure users, and I can and I have: how many users will fail resolution if the only way the zone can be served is over TCP? Because that is the corollary to that question. If you start serving exclusively over TCP, will your servers melt? Or not? We should think about that, because that is today's question.
Because once you start doing massive responses in UDP, you have got all the machinery in place to mow down everybody through DDoS. That is not the right answer.
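A minimal sketch of the TCP point raised above, again assuming dnspython and placeholder names: a resolver that gets a truncated UDP answer retries the same query over TCP, and serving a zone only over TCP would force that extra round trip and per-connection state onto every lookup.

```python
# Sketch: the standard UDP-then-TCP fallback a resolver performs when the
# answer comes back truncated (TC bit set). Names and server are placeholders.

import dns.flags
import dns.message
import dns.query

def resolve_with_tcp_fallback(qname: str, server: str, rdtype: str = "A"):
    query = dns.message.make_query(qname, rdtype, use_edns=0, payload=512,
                                   want_dnssec=True)
    response = dns.query.udp(query, server, timeout=3)
    if response.flags & dns.flags.TC:
        # Truncated over UDP: redo the same query over TCP, as a resolver would.
        response = dns.query.tcp(query, server, timeout=3)
    return response

# e.g. resolve_with_tcp_fallback("example.com", "8.8.8.8")
```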
So, the next thing: the standards guys, in their search for backward compatibility, really, really did us a massive disservice. Because when I look at the badly signed domain, and I look at the number of queries for the badly signed domain versus the number of queries for the good domain that is signed, I get up to 30 times the query rate. Because most DNS resolvers are obsessive compulsive. And when they get back ServFail, they go, oh shit, let me try somewhere else, somewhere else, somewhere else. Early versions of BIND, when they got ServFail, tried every single NS. If you were five levels down, every single NS on every single level is a lot of queries. So failure in this mode is really, really bad, because the signalling is really, really bad. So we are unleashing DNSSEC on an unsuspecting world with a failure mode which is crap.
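To make the retry blow-up concrete, here is a deliberately crude, purely illustrative model; the numbers and the two retry strategies are assumptions, not measurements of any particular resolver.

```python
# Crude model of the retry blow-up: if a ServFail makes the resolver re-ask
# every NS at the failing level, extra load scales roughly with the sum of
# NS counts; if it re-walks the delegation independently through every
# combination of parents, it scales with the product. Real resolvers sit
# somewhere in between. All numbers are made up.

from math import prod

ns_per_level = [3, 3, 3, 3, 3]           # five levels deep, three NSes each

modest_bound = sum(ns_per_level)         # retry every NS once per level: 15
worst_bound = prod(ns_per_level)         # retry every parent/child path: 243

print(modest_bound, worst_bound)
```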
Now, some folks have said it's a cost, it's a moral argument: if you put up badly signed DNS, you melt, and that is just deserts. I am not sure that is really true. I think richer signalling would help. At some point you have got to do more than ServFail as a response; you should do something.
Something that came up in the last talk: 84% of queries have EDNS0 and only 6% of clients follow through. And this is because, as far as I can see, it's all BIND; for some time now, BIND has been setting EDNS0 and DNSSEC OK, so if you are serving a signed zone, you are pumping back the big answers for 84% of all of the clients out there on the entire Internet. But the next one is really weird. Is the DNS being vaguely prescient? Because it doesn't know that I am signed until it gets an answer, and nothing is cached. But very subtly, I see relatively more queries with the DNSSEC OK bit set when the zone is signed. Spooky. Really weird. Shouldn't have happened. I don't know what is going on there. Maybe someone else has some clues.
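As a sketch of how those two figures could be counted from captured query traffic, assuming dnspython and that you already have the raw wire-format DNS payloads from somewhere (the capture step is left out):

```python
# Sketch: from raw DNS query payloads, count how many queries carry an EDNS0
# OPT record at all, and how many of those set the DNSSEC OK (DO) bit.

import dns.flags
import dns.message

def edns_do_stats(wire_queries):
    """wire_queries: iterable of bytes objects, each one DNS query message."""
    total = edns = do = 0
    for wire in wire_queries:
        try:
            msg = dns.message.from_wire(wire)
        except Exception:
            continue                      # skip malformed packets
        total += 1
        if msg.edns >= 0:                 # OPT record present, i.e. EDNS0
            edns += 1
            if msg.ednsflags & dns.flags.DO:
                do += 1
    return total, edns, do

# total, edns, do = edns_do_stats(payloads)
# print(f"EDNS0: {edns/total:.0%}, DO set: {do/total:.0%}")
```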
I told you DNS and Google was 10% in December, and if you go and look at that map, that URL I gave you, and you actually look at the number of users today, you will find that 16% of the Earth ?? of the user population of the Internet, which is evidently going to be 3 billion by the end of this year ?? uses Google. One in six of all users have everything they do in the DNS forwarded up through Google. And we are worried about state-based surveillance. If you think about the rich vein of information about everything you do, every site you go to, everything that requires a DNS step before you go somewhere ?? online anonymity and privacy ?? how do you feel about one in six users?
Two of my kids are actuaries, and they sort of extrapolate out from small samples to populations, and they reckon that if you get about one in 1,000 ?? you know, of the world, you only need 0.1 of a percent ?? you can do a really good statistical average of the planet. 16% is a godsend. There is nothing you don't know.
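The actuarial point can be put in numbers with a standard margin-of-error calculation for a sampled proportion. The figures are illustrative only, and it assumes the sample is unbiased, which is the real catch with ad-delivered measurement.

```python
# Sketch: 95% margin of error for an estimated proportion from a simple random
# sample. Even 0.1% of ~3 billion users pins a proportion down very tightly.

from math import sqrt

population = 3_000_000_000
sample = int(population * 0.001)         # 0.1% of the user population
p = 0.5                                  # worst case for the error term

margin = 1.96 * sqrt(p * (1 - p) / sample)
print(f"n = {sample:,}, 95% margin of error = ±{margin:.4%}")   # about ±0.06%
```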
So this is the map of the world, day by day, through these ads, since October. And the red line is the line using Google and, unobjectionably, despite Snowden and everything else, it's OK to send all your data to Google, and more do so every day. Cool.
Google, 16%. So just as I was doing this slide pack, this came out, and I am thinking, it knows even more than I know, but it's true. And when you do a dig from Google Public DNS back into google.com, you find a reference to the same master plan, and there it is. If you have got that much information, you don't even need a website any more, or search. You know what everyone does, all the time, every time.
However, back from that to, I suppose, the meta thing here. Watching users makes sense. That is what understanding and measuring the Internet is all about, because if you can measure what users do, you really understand the Internet. Looking at infrastructure and working backwards is tough; you are guessing. If you really want to measure what users do, make them do something and measure what happens. So that technique of putting code behind ads is brilliant, as far as I can see. It triggers particular fetches all over the planet which you can direct back to you, and you can understand behaviours. What is the time penalty of signing a zone, how much longer does it take, what is the failure rate? Etc. All of that stuff sits in the data. So we know an awful lot about the DNS and DNSSEC, even today, and we will continue this work and continue reporting, and hopefully there will be some more interesting stuff to talk about next time.
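A minimal sketch of the "code behind an ad" technique as described, not the actual measurement code: each impression fetches URLs whose hostnames carry a unique, single-use label under a validly signed, a deliberately badly signed, and an unsigned zone, so nothing is cached and every fetch forces the user's resolvers to show up at the authoritative servers. The zone names below are placeholders.

```python
# Sketch: generate the three single-use URLs one ad impression should fetch.
# The per-impression token makes every hostname unique, defeating caching and
# letting the web logs and authoritative DNS logs be joined later.

import time
import uuid

ZONES = {
    "signed":       "valid.example.net",    # correctly DNSSEC-signed zone
    "badly_signed": "broken.example.net",   # deliberately fails validation
    "unsigned":     "plain.example.net",    # control, no DNSSEC at all
}

def urls_for_impression():
    """Return the three single-use URLs for one ad impression."""
    token = f"u{uuid.uuid4().hex[:12]}-{int(time.time())}"
    return {kind: f"http://{token}.{zone}/1x1.png" for kind, zone in ZONES.items()}

print(urls_for_impression())
```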
I have overrun my time, but if there is some time for questions I will happily try and take them. Thank you.
(Applause)
WARREN: Warren from Google.
GEOFF HUSTON: You know my answer already.
WARREN: Yes, I do, but I am going to do it anyway. So, Google might collect all sorts of information from users ?? and this is the one place where I would encourage people to go and read the Google Public DNS privacy stuff ?? but the one place where we don't, or one of the places where we don't, is from this. I realise that proving that is impossible. But I just wanted to mention it again.
GEOFF HUSTON: It's timely and relevant to add that Google have made undertakings for that, and they are undertakings such that they don't analyse or leverage this data that comes in from your data logs.
WARREN: We do many query things, just not there.
GEOFF HUSTON: He wasn't near the microphone when he said that.
SHANE KERR: So I don't think it's as much of a mystery that queries for signed domains come in with the DO bit set more, because, well, the users aren't randomly distributed, right? Certain users are more likely to search for certain domains and sit behind certain ISPs; if you are in Sweden you might have a greater chance of having the DO bit set and going to an authoritative server that happens to ??
GEOFF HUSTON: We need to talk about the experiment. It's not quite like that, because every user across the planet is getting the same three URLs, but what I am finding is that the URL for the signed zone tends to get more queries in with the DO bit set, and I am going: that is so creepy, that is not funny.
SHANE KERR: Yeah you are right. That is the UFOs ??
GEOFF HUSTON: There is a scientific answer that doesn't involve mysticism, but I have yet to understand what that answer is.
AUDIENCE SPEAKER: A bit of a detail, perhaps. This is about the ServFails. The way you describe it makes me feel like you said there is this ServFail traffic between the authoritative and the recursive. There is no ServFail traffic between authoritative and recursive; it's between recursive and stub. Now, I think you can measure something there, that is: is there a failover if the stub gets the wrong answer and queries again? I would think that most ?? some implementations at least ?? have something that is called a bad cache, which basically says: I just gave you ServFail because I didn't validate, don't expect me to go out now and do the validation again. There might be other implementations that do so. There is something to learn there.
GEOFF HUSTON: I have certainly observed, in response to this, that the behaviour under failure is absolutely fascinating. Google's Public DNS, when the answer from any of the NSes that get queried doesn't validate, will immediately back off and go: that is it, the user is going to get an answer, I am not going to bash away any more. Other forms of DNS resolvers go: well, there are three NSes, I will ask the same question of the second and the third. And what we are finding is that the badly signed zone attracts a lot more query traffic at the authoritative name server than the well-signed zone.
AUDIENCE SPEAKER: But that is not more traffic at that one particular server; it's just banging the whole set of authoritative servers.
GEOFF HUSTON: Well, even if I have got multiple NSes at my parents but not down low, I will get more queries, weirdly. It's trying the full chain. It gets a failure; it doesn't diagnose what part of the validation chain was wrong. Most resolvers that I have observed seem to go: failure, throw.
AUDIENCE SPEAKER: Clear. And that is the Postel principle at work, I think. So it's really trying to figure it out, trying to get a good answer if there is one.
GEOFF HUSTON: Well, trying to understand what failed would be even better, because then you know what to throw away. This is a cheap answer: I will throw the lot. But it's an expensive answer at the authoritative name server.
AUDIENCE SPEAKER: Nicolas. I am also very worried about this thing you found, more DO bits on the secure zone. Can you just say how much more it was?
GEOFF HUSTON: I am guessing, because I need to go back and do the data and I did the slides a while ago, but around 5% ?? enough to stick out consistently, going: this is spooky prescience. But I suspect what is going on there is more repeat queries. I have yet to go through the full analysis. As I said, I found it silly and strange: this name you have never seen before, yet if it's signed you get more DO bits.
NICOLAS CANCEILL: I will be waiting for your detailed analysis.
AUDIENCE SPEAKER: Thanks very much for the talk, it was amazing. The thing that got me really worried is the thing that you mentioned about the information that you can leak out over DNS via ads. Maybe I have to say here, I am not a JavaScript programmer, but did you also look into what data you could get out of the browser ?? history, cache, different settings ?? over DNS?
GEOFF HUSTON: No, no.
AUDIENCE SPEAKER: Theoretically, did you think about what is possible there because that is a huge global privacy issue if it's possible?
GEOFF HUSTON: The JavaScript and Flash libraries that are available to you do limit you an awful lot, and typically what you can do is a GET.
Now, I have exploited a simple port 80 GET, and you load up a whole bunch of DNS and some URL behaviour with protocols, v4 and v6. I can't do anything more than that, because I can't reach in and twiddle with you. I can make you do GETs. And that is enough for me.
AUDIENCE SPEAKER: Sure I understand that.
GEOFF HUSTON: If you are worried more about what JavaScript can do to you, there is the JavaScript standards forum, and you can talk to them about that.
AUDIENCE SPEAKER: I wanted to raise that point, since you showed really how to do research on a massive scale, that might be an issue, actually.
GEOFF HUSTON: As I said, we have found that the ad folk, when they go and check our ads, are basically looking for this limitation of behaviour, and the libraries that are accessible to us through the ad system typically do very simple stuff; it's a GET. But as I am saying, that is enough. It does everything I need. Thank you.
AUDIENCE SPEAKER: Lars, from Netnod. I thought about what you said about the problem where you receive DNS queries that seem to indicate a full validation, and still, obviously, the user gets a ServFail. Could it be the case that the web browser talks to a forwarder that talks to a DNS resolver that talks to you, and that the resolver does its work but the forwarder doesn't understand the AD bit and ??
GEOFF HUSTON: We used to do maps of the Internet with ASes connected to others, and we'd draw these fluffy clouds on the board and put lines on them. Do the same for DNS forwarding. I challenge you. Between the ServFails and the crap that is in the DNS, the issue is, there is no history: the user asks their resolver, magic happens, and the authoritative name server gets a bunch of queries from somewhere else. Now, in the world's simplest model ?? never happens ?? I have a resolver that uses no forwarders and it asks the authoritative name server. That never happens. What always happens is that there is this indescribable shit out there, and one or two queries pop out, and they may pop out over time, because once you ask a DNS resolver to do something, even if the user goes, I am bored, I am leaving, the resolver goes, no, no, I must solve this question, and you find sometimes the GET happens because it's gone to other resolvers. The first one is going: if I ask a different question, do you think I might get a different, better answer? You can trace this. So the DNS ?? I am always amazed it works in real time. Right. It's just fraught with shit. And it's just so overprovisioned that queries ever see daylight, because the number of queries that comes in versus real traffic is really quite high. I find that trying to ask the question, how many resolvers validate, is an unanswerable question unless you are God. Because no matter where you sit, you actually don't understand whether I am a validating resolver, or whether behind me is a validating resolver that is sending me questions and I am just forwarding them on. You can't tell. And so I don't think the question makes any sense at all. The only question ?? I followed that path and got nowhere, and I thought, God, I am only interested in users, stuff the DNS; I am interested in whether users get DNSSEC validation or not, and I want to know where they are and which ASes are doing the good stuff and who is being lazy. And I thought that was useful. And it's the best I can do.
AUDIENCE SPEAKER: So I am ?? I was also kind of curious about the additional number of DO bit queries for signed zones. Could that be in the data because there are more queries you need to ask for DNSSEC, for the keys and stuff like that, or maybe ??
GEOFF HUSTON: I was looking at the basic A queries; it wasn't the DS or DNSKEY query, it wasn't the SIG chasing; it was the A query, and I was actually looking at zones that were validly signed versus unsigned, because I still get 87% on average, whether it's signed or unsigned, with that EDNS0 DO bit. So, as I said, there is still some work to do to analyse the repeats. Because it's all about repeats. I still think it's really crazy: if you are an authoritative name server, 87% of the time you are doing the full SIG stuff and sending back all this data, and most of the time everyone is going, didn't care, not listening. This is stupid. Validate. And if Google can do it, so can you. Thank you.
(Applause)
JAAP AKKERHUIS: Thanks. Well, it's lunchtime, and I hope you are back in time for the second half. See you then. And don't forget there is an overflow room downstairs.