These are unedited transcripts and may contain errors.
Plenary Session
12 May 2014
4 p.m.
CHAIR: Hello. Welcome to the second session of the Plenary. The session is chaired by me, Benno Overeinder, and Andrei Robachevsky.
Our first speaker is Russ White and he will talk about network complexity. Russ, the floor is yours.
RUSS WHITE: It's kind of cool to be this early in the conference schedule because your minds aren't full yet. It means I get to fill them up very well.
We are going to talk about network complexity and some trade-offs in this. If you have ever seen Dave Meyer talk about network complexity, mine is the anti-Dave Meyer talk on complexity: he is full of math, mine only has three or four graphs. And the other thing is, I want to talk a little bit from an operational perspective. What I really do is network design, engineering, architecture type stuff, so, while I do a lot of routing stuff — my Twitter handle is routing geek and things like that — I try to focus on the practical application of this stuff rather than just on the theory.
So, when we talk about network complexity, some of the things we say are things like: it won't scale. How many times have you heard that? It won't scale. It's not elegant. We need to reduce our complexity in order to make this scale or to make it elegant. We talk about Occam's razor and some other things like that, but we don't really define these things well and we don't think through what we mean when we say things like "it won't scale".
First of all, I'd like to point out that network complexity is necessary. Complexity is a fact of life. If you have dealt with much of the theory — the theory proves, and you will see one of the theory charts in here, that essentially you always have this point where you cannot get three out of three things, and we'll talk about this in a minute — complexity is the way you try to drive down that curve and get your third thing. So when we talk about the CAP theorem and other things like that, we'll talk about that in a second. But basically, complexity is necessary to build robust systems. You can't get away from complexity in network design and architecture. You can't get away from complexity in protocol design. You have this curve here where you have a robust section: you drive your complexity up, and then you have a fragile section of your chart. So as you drive your complexity up you are increasing the robustness of your system, to some degree. When you get past that sweet spot, you actually start making the network or the system more fragile. The whole key in the complexity game is figuring out where that sweet spot is.
This is a great quote from Alderson and Doyle: "In our view, however, complexity is most succinctly discussed in terms of functionality and its robustness," and that complexity arises primarily from design strategies intended to create robustness to uncertainty in the environment and in component parts. What we are dealing with here is that we are trying to make a network — in our case, a network architecture — work well within a large set of circumstances or changes.
However, complexity is impossible to solve. So this is the other side of the problem. Here is my theory chart: C is less than or equal to 1 over R. If you have dealt with the theory much you will have seen this quite a lot. You cannot get to the lower left-hand corner of this chart. No matter what you do, you cannot get to the lower left-hand corner of this chart. In this case I have increasing fragility on one axis and increasing cost on the other, and it's impossible for me to get to low cost and very low fragility at the same time. So this little corner of the chart is always impossible. What complexity tries to do is drive the curve towards that corner — it tries to make the bend sharper in this curve.
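One minimal way to write down the boundary being described here — assuming the two axes are cost C and fragility F and that the curve drawn is a hyperbola; the exact expression on the slide may differ — is:

```latex
% A hedged formalisation: feasible designs lie on or above a hyperbolic boundary,
%   F >= k / C   for some constant k > 0,
% so the corner where both cost C and fragility F are small is unreachable.
% Added complexity can, at best, shrink k and sharpen the bend of F = k / C;
% no amount of complexity drives k to zero.
\[
  F \;\ge\; \frac{k}{C}, \qquad k > 0 .
\]
```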
However, you can't get there. There is a theorem that proves you can't do it. So there is a sweet spot where I'm optimal, and that is what I'm trying to get to. Robust yet fragile: if you have heard Dave Meyer do his talk on network complexity, he talks about robust yet fragile. The CAP theorem — everybody should know the CAP theorem; even if you don't, you are probably familiar with it — says you can have a database that is consistent, you can have it available, or you can have it partition tolerant: choose any two of the three. You can't choose all three in database design. What's a routing protocol? It's a distributed database. Hence I have the exact same trade-offs in routing protocol design, particularly BGP. You have probably heard the quick, cheap, high quality one before, where I can choose two of the three. You have heard that before, right? As a coder, I say: all right, you have your choice between fast, cheap, and high quality. Pick two. You can only do two of the three. This is another instance of the same trade-off.
So all of these are symptoms of this same problem in the complexity world.
One of the interesting things about network complexity is that it's organised. When we think of complexity in the modern world we tend to think of disorganised or random complexity. When we start thinking about quantum mechanics and those types of problems, we can take a set of pool balls on a table and we can roughly tell you where every pool ball is going to be through some rather complex statistics. The thing is, the pool balls have to be moving randomly. If I put the pool balls in a specific order on a particular type of pool table, so that the bumpers on the table lose no energy when the ball hits the edge of the table, I can actually make it so that you can't statistically predict the position of any ball on that table ever. However, you can intuitively know where all the balls are because you can see the pattern. So there is a difference between the two edges of complexity. Organised complexity does not lend itself to statistical analysis. There is actually a really, really good paper by Weaver written in 1948 — if you are going to get into network complexity you really need to read this paper, even though it's not about computer networks — where he compares and contrasts disorganised and organised complexity and the difference between the two, and he describes the two really well. And he talks about what the problems are with organised complexity.
So, again, statistical models will be of limited use in the organised complexity realm. Where are we in the networking field? We are in the organised complexity realm, because we design our networks with complexity in them to provide robustness. So when we start looking at complexity models we need to take into account that I'm actually designing my network to be complex, to be robust. The statistics are going to tell you there is information, but they are not going to tell you what the information means.
So we must interact with intent when we look at network complexity. The network intends to solve a set of problems. How do you measure it? We haven't done a very good job of it over the years.
Bottom line: complexity in networks is necessary. It's impossible to solve. It's difficult to impossible to measure or to characterise. So, we could put on our pirate hat and say abandon all hope, ye who enter here. This is a field we are required to deal with, but it's hard to measure, it's hard to cope with, and a lot of it is seat-of-the-pants flying.
So what I propose is that we try to look at complexity as a set of trade-offs rather than as a pure number, or something else. So maybe we look at where we see trade-off points in our network. I'm going to give you an example later with fast reroute. I am going to propose to you that almost every network design and architecture problem you face today is a trade-off. SDN, right: do I centralise my control plane or do I use a distributed control plane? This is a network complexity trade-off problem. IP fast reroute: these are network complexity trade-off problems. I have got to learn to look at what my trade-offs are in my network design, protocol design and all of my architectural pieces, and try to figure out exactly where that complexity curve lies and see if I can start to intuit where the sweet spot is in that chart.
So, here are a few things we can look at. Complexity versus the problem: harder problems tend to require more complex tools. Does anybody build an entire house with screws? No, they use nails in some places and screws in other places, so you always try to look at the problem you are trying to solve and figure out what tool to use for it. This is one of my favourite quotes of all time. It is from a friend of mine who was asked how many neighbours you can get on a single EIGRP router — it was a 7200 at that particular time — and his answer was: how many balloons fit in a bag? It always depends on whether the balloons are blown up or not, how big the bag is, what size the balloon is when you blow it up, those types of things.
Okay, so you have complexity versus your toolset. So you have got complexity against your problem: what's the problem I'm trying to solve? The second is, what's my toolset? What tools do I have available? Do I have a hammer? Do I have a nail gun or do I have a sonic screwdriver? Right? Now, there is a trade-off here as well, because when you use a tool to solve a complexity problem, you have got to maintain the tool. So now you have introduced complexity in order to reduce complexity. There has got to be some insanity stopping point there where you say: enough of the tools, let's go back and readdress the original problem.
Then there is complexity versus the skill set. I don't know, you know, I'm always with this guy. This is...
Give me an afternoon nap any time...
So things that are complex for one person might not be for another. It's not a matter of intelligence; it's a matter of focus and training. How good is your NOC at IS-IS versus BGP versus OSPF, versus writing Perl scripts for an SDN controller? So let's talk about what we do with each one of those things.
So then, complexity versus complexity. Complexity comes in pairs. There are two very good quotes from RFC 1925 — still my favourite RFC of all time — that, even though they probably weren't considered complexity issues when they were written, are actually complexity issues. "It is easier to move a problem around than it is to solve it." Right?
Okay. I have 10,000-line ACLs? I don't care, because my network management application takes care of that. And my network management application is taken care of by somebody on the other side of that cubicle wall, right? If I can throw the complexity to somebody else, it makes it really simple for me. That doesn't get rid of the complexity; it just moves it. Then: "It is always possible to add another level of indirection." How many tunnelling protocols do we have today? And how many will we have in five years? Probably twice as many as we have today, because we keep adding layers of indirection and different styles of indirection and we think we are brilliant for doing it — but, hey, stop the madness at some point, right?
Decreasing complexity in one part of the system will almost always increase complexity in another. Here is my little complexity graph. I have complexity, function, and counter-complexity. What I'm looking for is the sweet spot where these three things meet. I want to drive my complexity so I get the most robustness and the most functionality in my networks, but I don't want to pass into the fragility point. I don't want to make my network fragile.
Okay, 'The Point'. Probably nobody in this room has seen the movie 'The Point' — it's a 1960s movie — anyway, you see, everybody is going to go look it up — it's about a kid who was born in a society where everybody's heads were pointed, and he was the one with a round head, so he was banished to the pointless forest. And it gets worse from there. Anyway: you can never reach any desirable goal without increasing complexity. Whether I want my network to be more robust, to provide more functionality, more services, I have got to increase complexity to do it. Decreasing complexity in one place means increasing it in someplace else. Decreasing it in one place will lead to suboptimal behaviour someplace else. My example is that aggregation is actually a complexity trade-off. I am reducing my control plane state, which reduces my complexity in the network, but how am I doing it? Every time you remove information from your control plane, you end up with suboptimal routing. So I am reducing the optimality of the network pathing and utilisation by reducing my control plane state: I have reduced my complexity, but I have actually pushed the suboptimal behaviour someplace else.
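A minimal sketch of the aggregation trade-off just described, using hypothetical prefixes, next hops and costs and a toy longest-prefix-match table rather than any real router:

```python
import ipaddress

def lpm(table, dst):
    """Longest-prefix match: return the (prefix, next_hop, cost) entry covering dst."""
    matches = [(p, nh, c) for p, nh, c in table
               if dst in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)

dst = ipaddress.ip_address("192.0.2.130")

# Full control plane state: the more specific points at the exit nearest the destination.
full = [("192.0.2.0/24",   "exit-west", 10),
        ("192.0.2.128/25", "exit-east", 3)]

# Aggregated state: the /25 is suppressed and only the covering /24 remains.
aggregated = [("192.0.2.0/24", "exit-west", 10)]

print(lpm(full, dst))        # ('192.0.2.128/25', 'exit-east', 3)  -> optimal exit
print(lpm(aggregated, dst))  # ('192.0.2.0/24',   'exit-west', 10) -> suboptimal exit
```

Less state in the table, but the traffic to 192.0.2.130 now leaves by the worse exit: the complexity has been traded away for path optimality somewhere else.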
So here is the thing. I want us to stop asking bad questions and I want us to start asking good questions. "How complex is this?" is a bad question. Compared to what? Without the comparison it's logically invalid. "Will this scale?" Okay, I don't know. To what size? What are my requirements? Those are good questions.
Will adding this new thing increase complexity? If I really want to deploy MPLS on a non-MPLS network today, will it increase complexity? Not just will it, but where will it increase the complexity? What tools do I have to manage that complexity? What are my trade-offs? If I reduce complexity here, where will suboptimal behaviour show up? It's easy enough to say I'm just going to throw the complexity over the fence. Great. Where is my suboptimal behaviour going to show up if I do this?
Complexity at the system level is about trade?offs, not absolutes.
Let's talk about fast reroute as an example. I want to start by saying something simple here, okay: I have a long history with fast reroute. I don't think it's a bad thing. In fact I helped develop several of the techniques that are discussed on these slides, so I'm not telling you that fast reroute is a bad thing; all I'm saying is that fast reroute introduces complexity in order to get at a set of problems that you are trying to solve. So what I want to do is try to expose some of those complexity trade-offs as a very specific example.
So let's talk about pre-computation — pre-computed LFAs. Here we have a very simple little five-node network. We don't like to think of routing protocols as doing link blocking, but they actually do, on a per-destination basis. Let's say I'm trying to get to 192.0.2.0/24, so I have a route from A; my best path is A-B-E. But what I find is that if traffic at A is attracted to C, it's going to loop along this link here, because C's best path is split between these two paths. What we'd like is to be able to use C as an alternate path.
So what I do is I actually check C and say: okay, what's your metric? How would you get to E? And I make sure that the traffic is not going to actually loop back to myself if I send it there. So A can compute the cost from C to determine if traffic forwarded to 192.0.2.0/24 will, in fact, be looped back to A. If not, then A can pre-compute this path and install the route as a backup path even though it's not ECMP, and I gain faster convergence. My cost is the additional computation at A, which is almost nil. There was a piece of work a long time ago about how many SPF calculations it would take to run the IP fast reroute calculations — to pre-calculate the loop-free alternates — and it turns out that the cost, in processing cycles and memory, is almost nil. It's almost nothing to pre-compute these paths. So the costs are the computation and designing the network with LFAs in mind.
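A minimal sketch of the loop-free alternate check A performs, using the standard RFC 5286 inequality and hypothetical unit metrics on the five nodes (the metrics on the actual slide are not in the transcript):

```python
import heapq

def spf(graph, src):
    """Plain Dijkstra: shortest-path cost from src to every node."""
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist

# Hypothetical unit metrics on the ring A-B-E-D-C-A; A's best path to E is A-B-E.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "E": 1},
    "C": {"A": 1, "D": 1},
    "D": {"C": 1, "E": 1},
    "E": {"B": 1, "D": 1},
}
S, N, DEST = "A", "C", "E"        # source, candidate alternate neighbour, destination

dist_from_N = spf(graph, N)
dist_from_S = spf(graph, S)

# RFC 5286 loop-free condition: traffic handed to N will not come back through S.
is_lfa = dist_from_N[DEST] < dist_from_N[S] + dist_from_S[DEST]
print("C is a loop-free alternate for A towards E:", is_lfa)   # True with these metrics
```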
If I'm installing an LFA from A through C to get to this destination, I need to think about the quality of service on this A-to-C link and I need to think about utilisation on that A-to-C link. I can't just shove the LFAs out there. I have to make sure the network will handle the load if I fall back to that.
What is my next choice? If I get to that point, my next step is to pre-compute tunnelled LFAs. If anybody is familiar with the IP fast reroute space, there is Q space and P space. Let's assume that the metrics are such that traffic sent to C would be looped back to A, because C is actually using A as its best path. This is essentially a split horizon point in distance vector protocols; in OSPF, in link state, I would take this A-to-C link out of the path towards 192.0.2.0/24. So what do I do here? I want the traffic from A to go beyond C, past this split horizon point, past the P/Q boundary, and reach D. To do that I need to tunnel through C, because C is going to loop the traffic back, so I have got to bypass C by tunnelling through C. I can do this with Not-Via — it's a brilliant scheme, it just injects a lot of new state. There is MRT — everybody has read the MRT drafts and understands the formulas and everything. And there is remote LFA. All sorts of ways of doing this, right?
All of these are essentially the same thing. They are all computing a tunnelled LFA or a tunnelled point past the split horizon or past the point where the traffic will loop back to me, and allowing me to tunnel to that point.
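And a sketch of the tunnelled case, using a simplified form of the remote LFA (RFC 7490) P-space and Q-space conditions; the metrics are hypothetical and chosen so that C's best path back to the destination runs through A, which is exactly the situation where A has to tunnel past C:

```python
import heapq

def spf(graph, src):
    """Plain Dijkstra: shortest-path cost from src to every node."""
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist

# Hypothetical metrics: the expensive C-D link means C reaches E via A, so the
# plain LFA check fails and A needs a tunnel endpoint beyond C.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "E": 1},
    "C": {"A": 1, "D": 3},
    "D": {"C": 3, "E": 1},
    "E": {"B": 1, "D": 1},
}
S, NH, DEST = "A", "B", "E"            # protected link S-NH, destination DEST
link = graph[S][NH]
dist = {n: spf(graph, n) for n in graph}

# P-space: nodes A's shortest paths reach without crossing the protected A-B link.
p_space = {n for n in graph if n != S and dist[S][n] < link + dist[NH][n]}
# Extended P-space: anything the other neighbour C reaches without crossing A-B.
ext_p = p_space | {n for n in graph if n not in (S, "C")
                   and dist["C"][n] < dist["C"][S] + link + dist[NH][n]}
# Q-space: nodes whose own shortest path to the destination avoids the A-B link.
q_space = {n for n in graph if n != DEST
           and dist[n][DEST] < dist[n][S] + link + dist[NH][DEST]}

print("PQ nodes (candidate remote tunnel endpoints):", ext_p & q_space)   # {'D'}
```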
Gains: relaxed network design rules — rings are okay; these will converge faster; it eliminates micro-loops; I get faster convergence. My costs: additional computation at A, which is almost nil, and some form of dynamic tunnelling — all right, what type of dynamic tunnelling do I want to stick in my network? If I'm not deploying MPLS, then I have got to start thinking about what the security concerns are, for instance, in having an open-ended tunnel that terminates at D. I have additional control plane state: somehow or another A has to discover this alternate path and has to know about it, so you have got to carry some control plane state there.
These mechanisms don't support every possible topology. There are some non-planar topologies that remote LFAs can't support. I have got to think about my alternate traffic patterns: again, I have got to make sure that if I reroute traffic that was going from A through B to D so that it now goes from A through C and D, then I have the QoS and bandwidth capability there to do it.
These are my trade-offs in fast reroute. So, whither complexity? Will we ever have a single number that tells us how complex something is? The answer is no. There is probably a mathematical proof of that, but I'm not going to put it on the slides. We will, however, have a bunch of numbers that may help us figure out these edges where the trade-offs are: what is the complexity of going all the way to one end of the graph or the other. Will we have something we can point to that says, as a mathematical formula, this is complex or this won't scale? No. There are some drafts out there positing these types of things, and they are very good in particular edge areas.
So one useful result would be a more realistic view of network design and operation. If we can start taking complexity into account when we think about network design and operation before we deploy something — and start thinking about whether we can just toss the complexity over the wall into the other cubicle — it would be very helpful. I'll tell you something else we do in the networking industry. I guess everybody has probably seen 'Finding Nemo' and Dory — shiny things — and in the networking industry we have a very bad squirrel problem. We deploy very, very complex distributed control planes and then we say, wow, wouldn't this be so much simpler if we centralised it all; and then we centralise it and we go — we seem to have forgotten why we used to have this distributed — wouldn't this be so much simpler if we distributed it all. We tend to play this pendulum game where we run after shiny things, or squirrels. Maybe, if we can get to the point where we understand complexity a little better and understand that we are dealing with a complexity trade-off, it would actually help us avoid this.
Okay, one way forward: measurements within a framework; think about how to measure the trade-off points; document the trade-offs we find in real life; and help build a body of knowledge for future and up-and-coming network engineers to understand the complexity problems better. Here are some NCRG pointers if you want to read them. Parallel to this presentation there is a draft on design complexity, and there are some other movements afoot. Dave Meyer has a whole wiki on network complexity up, and some other things as well.
So, questions? Or did I run over so far that...
CHAIR: No, we have time for two or three questions.
RUSS WHITE: I answered all their questions, there is no complexity left in the entire room.
CHAIR: So, to make one of the graphs you showed in your presentation more operational: at what point can you make sure you're at the optimum? You go from robust to fragile. Nowadays we are running, well, into virtualising everything: first we had virtualised servers, now we have network virtualisation. It solves some management problems and probably also brings some robustness, but are people aware that, in the process of virtualising all kinds of network equipment or stacks, they go beyond that point and enter the fragile area?
RUSS WHITE: I think that's a good question. The problem is I don't know how to answer it other than to say fly by the seat of your pants. Experience is the only thing.
I'll tell you, one of my rules of thumb is: when I get to the point where I can't understand the shared risk groups any longer — when I have so buried things in virtualisation that I actually have to spend a lot of time trying to understand what the shared risk groups are in my design or architecture — then I'm probably too complex and I have virtualised too much. So, that's kind of one rule of thumb. There are some others you can get to in that area, but typically it's just experience and looking at it and going... it doesn't look good...
AUDIENCE SPEAKER: This is Shane Kerr. You seem to be despairing that network engineers kind of drift back and forth between centralised and distributed — and I'm not saying what you said is not right — but, on the other hand, these are kind of the knobs that we have to play with, right? So I mean, I have worked with people that would complain because "we tried that in this other system and it didn't work", it's like... but I don't know, it doesn't necessarily seem that network engineers are not understanding complexity, or are just revisiting things that have already been tried. It could just be that they are trying to find the best fit for the problem space they're presented with.
RUSS WHITE: It could be, but I don't know. My experience is a little more of a downer than it sounds.
SHANE KERR: It sounded like a downer.
RUSS WHITE: Sorry — I first started putting in BD100 emulation cards and Z-100s back in the mid-1900s — the 1990s, when I first started networking, the late 1990s — you didn't see the IBM S/390 — so anyway, we had mainframes that were centralised, and everybody bought Z-250s and Z-100s and all these IBM PCs, and I'm sorry if anybody here is from IBM, but we always said that the IBM initials were $100 for each initial, because you were more expensive. And then we thought this distributed stuff is really great, so we started talking about the end of the mainframe; then we got to SQL; then we thought we were going to do minis; and then we did middleware, where we centralised the processing and the database; and then we were about to distribute it; and now we have IBM selling S/390s as cloud. So...
SHANE KERR: I mean, that story is actually exactly what I mean. At any given point, the technology and the economics may have made sense, and yes, now we have big rooms filled with air-conditioned computers, UNIVAC style again, but that may be the best — the most optimal — solution for the day.
RUSS WHITE: Part of it comes down to cost. It used to be that the network was expensive and compute was cheap, or the other way around, and as the relative cost of networking and compute switches back and forth, we tend to go back and forth. And I suspect in the future we're going to see the network become expensive enough that cloud is no longer as viable an option as it is today, as far as processing everything centrally. It's hard to judge. But that seems to be where we are.
CHAIR: Russ, thank you for your presentation.
(Applause)
CHAIR: The next speaker is Geoff, with BGP in 2013. On the presentation, the document mentions BGP in 20-13, so I expect some surprises in this presentation.
GEOFF HUSTON: Good afternoon, I am Geoff Huston, I'm with APNIC.
I play around with stuff and measure stuff. And one of the things I have been looking at for some time is the inter-domain routing protocol. If you go back 25 years, back to 1989, there were actually a whole lot of conversations happening in our community. In 1989 we got this first glimmering that this experiment was not only going to be a success, it was going to be a success disaster, because we first started to project that if this was going to work — and it looked like it was going to — we were going to run out of class B addresses. That was the first thing: we were going to run out of addresses. But it wasn't just addresses, because the other observation was that if this was really going to work, we were going to run out of routing slots on the PC/XTs that were doing inter-domain routing on the Internet at the time. Because routing is an unconstrained thing: more addresses, more routing. And so we were faced with a problem that we used to call the ROAD problem: routing and addressing. Now, you know what we all did about addresses: we solved the addressing problem with NATs — and we thought v6 as well — but we sort of thought that by using classless inter-domain routing the whole thing would just go away. But that's not really what happened. And underneath, the last 25 years has carried this subtext that routing is going to explode. And this is a typical research paper; this came from 2012; it got accepted somewhere. And you sort of see this methodology that pervades our thinking: that routing is out of control, that the routers you have today will be useless tomorrow. And I'm kind of interested in that, because over 25 years we have become good at a few things, but I think one of the things we have become really, really good at is lying to ourselves. We just repeat bullshit, and we repeat it so often it becomes true, just because we repeat it. And you kind of go: can I actually see if that's true or not? And that's what a lot of this talk is about: whether this kind of idea that the big routing explosion is going to happen any day now is actually true.
So, I want to look at this and also look at this issue about is deaggregation horrendous, antisocial and the worst possible thing you could do to the Internet or not? Where is the problem?
So, let's sort of look at the really big picture: 25 years of BGP. And oddly enough, there are a number of things there that sort of map to the business. We thought we had a problem when the routing table grew beyond what a PC/XT could hold — that was problem number one. Oh my God, the sky fell, and a huge amount of effort, and we curbed the problem for a week or two, and then it was all back on again. The next one is this kind of euphoria that, you will remember, informed the year 2000 — when the rest of you were playing with, well, some crap you were playing with, the rest of us were having this whole lot of fun building the big Internet boom of 2000 — and immediately after that we were all out of a job because it all went bust, and that's where it went bust for about a year, when we thought the Internet was doomed, it's all going to die, the grand thing is over. But of course the real figure is that this whole broadband thing took off and that pace of growth was amazing, and there is the global financial crisis just there — what a crisis, hardly noticed. Bizarrely, this is when we ran out of addresses. You guys are amazing. You sort of seem to invent routing even though there were no addresses to do it from. Something really weird is happening out there. I have no idea how you are doing it, but you know, I want more of it too, because these drugs look good.
So let's have a look at the past couple of years. That's the past couple of years. But oddly enough, there were a few folk that I just want to make special mention of. My mate Dave at Windstream is a problem; he was a problem in July. But there is someone here from Orange — come on, 'fess up — because that's you right here, thank you very much. Well done, whoever you are.
So, on the whole, the routing system was pretty good, apart from a few exceptions. This is about the only place where I can see address exhaustion: this is the last few years since IANA ran out, and the amount of address space is tapering off. We have run out, right, so we have to actually make do with less. And the theory was, when we ran out of addresses we would all start trading, yes? And we'd free up all of those unused addresses and they would all hit the transfer market. So, in 2011, there were fifty /8s that weren't being routed. In 2014, there are fifty /8s that aren't being routed. So whatever you guys are transferring and trading between yourselves, this part of the address space isn't touched at all. This is weird. This is just what's going on. Whatever transfers and trading may have happened, you haven't really released more space from that unadvertised pool. I have no idea why, but maybe those last /8 policies in APNIC and the RIPE NCC have actually been really good: the folk who were desperately after addresses — SSL doesn't work for Windows XP unless you have your own address, blah-blah-blah — were always able to get a /22 from the registry, right? Yeah, because so far, that's what we have been doing in APNIC and RIPE. Oddly enough, ARIN and LACNIC haven't run out yet, but they don't have a last /8 policy, so when they run out the piggy bank will be empty. So we are about to experiment with the dark side of running out, just as we have experimented with the last /8 side so far. I wish them luck over in America. It will be fascinating to watch. Car crash.
Anyway, the routed AS count. This is like clockwork. I have never seen anything that is as regular as this. We turn on 11 ASs a day. Why? Is there a quota? You are number 12, sorry, wait until tomorrow. What is going on that this is so amazingly uniform? Out of all that, what you get are these summary pictures. And the summary is actually relatively banal. We used to think ourselves amazing, that the Internet was doubling in size every nine months or every year or so, and Moore's Law was going to kill us all. That is such a pedestrian thing. The growth of the GDP of China was higher than that for a few years; this is only slightly faster than the GDP growth in Europe. The growth rate in v4 is actually relatively pedestrian: 8 to 10%, slowing down a bit. The address shortage really hasn't hit much. Maybe NATs are doing all the heavy lifting, and maybe we have gone and deployed the Internet as far as anywhere that counts. V6, the other solution, the plan B.
I thought it would be interesting to plot the impact of World IPv6 Day on the v6 deployment levels. As is evident.
Enough said...
There it is again in address space, as is evident. There is growth, and I must admit it's interesting growth. V6 comes in these huge building blocks — you bring in a /20 and it's so much bigger than everyone else's /32s, so you get these big steps of address space — but overall, steady growth. The AS count: v6 is accelerating, isn't it? Growth of deployment is just going faster and faster and faster — except that, as you see, the rate of growth is not increasing. It's slowing down. So the numbers are big, but they are not really big. They are not hundreds of percent; they are sort of 20 to 40%. A couple of years ago it was 90%.
So, at this rate, if you look at the AS count, you'll have as many ASs in v6 as you have in v4 by 2030. God, those NATs had better bloody work, because by then you are going to need a huge collection of NATs. So, again, the rate of growth of the v6 Internet is not increasing; it's actually slowing down. Whatever momentum was there, isn't. That's really, really bad.
Is the world just so addicted to v4 that v6 isn't cutting it? Is there some other factor we have all missed? I don't know. It's kind of interesting.
What can we expect out of all of this? How big a router do you need to buy? Do you want it to last for five years? How many routes will you need to route in five years' time? It's all unpredictable. It's hard. There is the math. It's cool. There is some first-order differential — that's a first-order differential; it's the rate of growth expressed every day.
That's the relative first-order differential, and the really interesting thing is that what I was talking about is actually evident: these numbers are actually coming down towards zero. Over the last year and a half, the relative rate of growth of the v4 Internet has slowed down. And so, as far as I can see, if you are going to buy a router today and you want it to work through to 2020, you'll be under 750,000 entries at that point, pretty confidently. Now, there is a lot of uncertainty, but you know, these aren't frightening numbers.
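A minimal sketch of the kind of projection being described, fed with synthetic daily table sizes rather than the Route Views / RIS series behind the slides:

```python
import numpy as np

# Synthetic stand-in for the daily v4 RIB size; a real run would use the measured series.
days = np.arange(3 * 365)
table = 450_000 + 55 * days + np.random.normal(0, 300, days.size)

growth = np.diff(table)                 # first-order differential: routes added per day
rel_growth = growth / table[1:]         # relative growth rate per day

# Fit a low-order polynomial and project five years ahead.
coeffs = np.polyfit(days, table, 2)
projection = np.polyval(coeffs, days[-1] + 5 * 365)

print(f"mean daily growth:        {growth.mean():.0f} routes/day")
print(f"mean relative growth:     {rel_growth.mean() * 365:.1%} per year")
print(f"projected size, +5 years: {projection:,.0f} entries")
```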
What about v6? Well, you know, you can do some curves and more maths. The first-order differential looks brilliant — this is real growth. Hang on a second, let's do this in a relative sense; let's blow out the last couple of years. That's what I'm saying: the rate of growth of v6 is actually really low. Whatever is going on in this industry, v6 is losing momentum, not gaining it. That's kind of sad.
How big a router would you need? Let's be optimistic. Let's say exponential growth. Five years' time, 127,000 entries. What kind of TCAM will I need in five years' time? A million entries will cut it, just, but it will cut it.
So what makes this painful? Moore's Law makes it painful. This is cool. Let's map Moore's Law to v4. Oh my God, heaps of head room, not a problem. Let's map it to v6. Still under. The unit cost of routing isn't changing. Routing is not becoming more expensive; oddly enough, per unit routed it will become less expensive. This is good.
As long as you are prepared to live within the constraints of the current routing system, you don't need to change it. BGP will cope for at least five years — isn't that cool? You don't need to learn another routing protocol. Wonderful. But some people say it's not the size; it's what you do with it. So let's look at what you do with it, because it's all about updates, isn't it? It's all about that constant ebb and flow of updates inside BGP. And what you think is that routing is a lot like drowning in noise: more routes, more noise. Double the size of the routing table, double the number of updates, yes? No.
You guys are weird. You guys are so weird I have never seen anything like it. That's the size of the routing table: more than doubled, yeah? The red line is the number of withdrawals per day: flat. Even the blue line, which is the number of updates, has grown to up to 100,000 a day — whoopee-doo. So how can you get a system that just grows without bound but whose dynamic properties are almost flat? You had better tell me, because you had better know what you're doing right, because as soon as you stop doing that, everything is going to explode. So, I'm kind of interested as to why the network just grows like crazy, but the dynamic properties of the routing system are flat. The same IBM PC/XT could actually handle the updates and keep pace; that's bizarre. So how many unstable prefixes are there? How many bad prefixes per day? The size of the bad room has reached 50,000 a day, tops. It's sort of: this part of the room is unstable, and the rest of you can't flap. What determines that? What's going on that only a fixed amount of the network is unstable on any day? That's weird. Oh, it's a distance vector protocol, the bigger it gets the longer it takes for a route to converge — bullshit. Since a long time ago — it actually goes back to 2000 — it takes on average 70 seconds for a route to converge; it's flat. BGP works as well now on 500,000 entries as it did back then on 150,000 entries. Magic. Spooky. And very, very strange.
That's possibly a clue. None of you want to be a customer of any of the rest of you. Really. All of you want to crowd in as close as you can to a hypothetical middle. So all of you go as far as you can into the middle of the network and get to the largest exchange. You want to get as close as you can to everyone else. So, the average AS path length — this is as seen from Route Views, but it's the same wherever you look — over an extraordinarily long time, since 1998, has been the same. The Internet has had the same diameter, and as it grows, it becomes denser, not longer or bigger. And that means most of the routing properties have stayed relatively static. That's what you're doing right. You are not being a customer of anyone else and, oddly enough, that makes routing work.
So, you know, as far as I can see, this is pretty cool. It's wonderful. What about v6? Again, I can see the same kind of properties happening in 6. For a while v6 was so small you couldn't measure it, but since 2011 the number of routes in v6 has grown while the dynamic properties have stayed quite stable. There are the unstable prefixes in v6, about 500 a day — you know who you are, stop it — but the rest of the network is astonishingly quiet and stable. It takes slightly longer to converge, and I'm not sure if it's Cisco doing something bad or just a fact of where I'm measuring, but it takes 100 seconds to converge and that has been stable for quite some years. So, the average AS path length: before there was money in v6, no one cared how you connected. You used tunnels from here, crap from there; paths in v6 were nonsense. And I look at that, and the graph shows it. But once folk started to say "I'll pay you money for 6", you wanted to be as close to the middle as you could, and that AS path profile is exactly the same profile you see in 4. In other words, people pay money for 6, and money determines topology, and that's cool. So, again, convergence time is much the same. This is cool.
And there is this last thing about CIDR. Because the other part of this is, we're all encouraged to do the right thing in routing and not deaggregate. More specifics are evil; that's the current theology. Are they? So I actually wanted to look at this, because there are a number of folk — is AS 2118 here? Well, if you are looking at this remotely, don't do this, it's stupid — announcing exactly the same sort of set of AS paths, because they have taken the original prefix and decided that /24s are cute. They are not cute. They are annoying. So who is doing this the most? Someone from Brazil, someone from Indonesia and the US. I tried to find someone from Poland, but they are pretty far down the list. We know who you are, and we do lists every day.
And you can track the folk who advertise more specifics. And it's kind of a strange curve, because it goes up. And you think, well, as a proportion, what is it? Spooky. It's 50%, and has been for 12 years. I want to deaggregate. No, no, you can't deaggregate 'til you deaggregate. It's a sort of constant — there is no rule, but there is this constant sort of parameter that 50% of the routing table is more specifics, and has been for 12 years. Weird.
The amount of address space covered by more specifics has grown like crazy. It's now a third of the address space, but, as I said, the number of routing entries: 50%. Who does it? A really small group — just those 10 people at the far end. Because, you know, of the 50,000 ASs we see, 332 of them do 50% of that total set of more specifics. So there is a small number of folk — we know who you are — who are incredibly bad at this, who just deaggregate like crazy, and the rest of you are actually quite good. So 1% announce 54% of the more specifics.
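A minimal sketch of how the more-specific share can be counted and attributed to origin ASs, using a handful of hypothetical RIB entries; a real run would parse a full table dump:

```python
import ipaddress
from collections import Counter

# Hypothetical (prefix, origin AS) pairs standing in for a full RIB dump.
rib = [("192.0.2.0/24",    64500),
       ("192.0.2.128/25",  64500),   # more specific of its own /24
       ("198.51.100.0/24", 64501),
       ("198.51.100.0/25", 64502),   # more specific originated by a different AS
       ("203.0.113.0/24",  64503)]

nets = [(ipaddress.ip_network(p), asn) for p, asn in rib]

def has_covering(net):
    """True if some shorter prefix in the table covers this one."""
    return any(other.prefixlen < net.prefixlen and net.subnet_of(other)
               for other, _ in nets)

more_specifics = [(net, asn) for net, asn in nets if has_covering(net)]

share = len(more_specifics) / len(nets)
print(f"{len(more_specifics)} of {len(nets)} entries are more specifics ({share:.0%})")
print("per-origin counts:", Counter(asn for _, asn in more_specifics))
```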
V6? Well, whatever sins we do in v4 we do in v6 very accurately. This is v6 coming of age, right? Because if we weren't doing the same thing, it would just be a toy. So, you know, the same thing happens. Now 30% of the routes in v6 are more specifics, and we expect to hit 50% in a year because you guys are professionals. The amount of address space is going up, because you know it will.
Are you getting any cleverer? So, I looked at these offenders — that's the list of offenders — and I looked at their more specifics every day for the last three years, and found that while some people show evidence of learning, to an extent — there is one who sort of plummeted downwards — other folk are blindly ignorant of this whole thing about aggregation. Some folk don't know what they're doing, and there is this strange mix of noise in there: some of you are getting smarter, some of you are getting infinitely dumber, and some of you just don't know what you are doing. What a surprise. And the result is, it's just 50%. Now, the real question is, why are you doing this? And I kind of get three answers.
I have got some of my provider's address space and for whatever reason I want to advertise it differently — I want to punch a hole in your aggregate, so you have a /20, I'll take a /24 and advertise it in some other way: hole-punching. This is good. Another one is: I have three incoming paths and I need to balance my incoming traffic, because outgoing is easy but incoming is kind of hard, so I have to selectively advertise routes down each path to make sure I get the right traffic balance, right? So we call this some kind of traffic engineering. And the last excuse, which Bill South keeps on telling me: I advertise the /24s to stop everyone else. Weird.
So, we can kind of categorise these into three. We talked about traffic engineering and we know what that is now, and we talked about hole-punching and we know what that is. And the last I'd categorise as just senseless vandalism. And that's the count over some years. They are all going up, relatively. Senseless routing vandalism is just under half of the more specifics. So there is a bunch of folk who just graffiti the BGP sessions. Crap.
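One plausible way to mechanise those three categories — comparing each more specific's origin AS and AS path with those of its covering aggregate — using hypothetical data; Geoff's actual CIDR-report heuristics may differ in detail:

```python
# Hypothetical prefixes: origin AS and AS path for an aggregate and three more specifics.
rib = {
    "192.0.2.0/22":   (64500, (64510, 64500)),           # the covering aggregate
    "192.0.2.0/24":   (64501, (64510, 64500, 64501)),    # different origin AS
    "192.0.2.64/26":  (64500, (64511, 64500)),           # same origin, different path
    "192.0.2.128/25": (64500, (64510, 64500)),           # same origin, same path
}

def classify(specific, aggregate):
    s_origin, s_path = rib[specific]
    a_origin, a_path = rib[aggregate]
    if s_origin != a_origin:
        return "hole-punching"        # someone else's space punched out of the block
    if s_path != a_path:
        return "traffic engineering"  # same origin steering traffic down another path
    return "senseless vandalism"      # identical origin and path: pure table bloat

for prefix in ("192.0.2.0/24", "192.0.2.64/26", "192.0.2.128/25"):
    print(prefix, "->", classify(prefix, "192.0.2.0/22"))
```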
Hole-punching is going down. Traffic engineering is going up. Are they noisy? That's an interesting point, because do more specifics make more of an impact on updates or not? So, here is the daily BGP update rate. Lots of graphs. This is cool. But let's split out the two. If more specifics are one half of the routing table and they are equally noisy, both of them should contribute one half of the update load. Yes? Yes — but more specifics contribute 85% of the update load and the aggregates contribute between 15 and 20% on any day. Shit... more specifics are four times as noisy. Now that's weird.
And, you know, well, what is it? It's possible to actually do the same sort of analysis, and what you find is that while senseless routing vandalism looks like abhorrent behaviour, oddly enough it's stable. And the one thing that I found really non-intuitive is this: you would think traffic engineering would be unstable — I want to balance that traffic, I will move prefixes around — but no, that's too smart; what they seem to do is actually relatively stable. It's hole-punching that is relatively more unstable. I find that weird.
So, you know, do more specifics contribute more to the load? Yes they do. And, you know, it's the hole-punching.
So, around 1% of ASs do this to the rest of you. And the reason why your routers are churning and using up, you know, compute power on updates, etc., is actually all about more specifics and deaggregation. And it's the hole-punching, the PI space, that actually contributes the most to this. And I really don't know why. So, the next kind of question is: should we actually arm ourselves with the pitchforks and the tar, light the torches and go on a witch-hunt? Witch-hunts are still fashionable in some parts of the world. Probably around here...
It's evident that, you know, we could do something about this. And you could, because a very small part of the BGP routing population actually contributes the bulk of our problems in BGP and scaling, if there are problems. So we could do a whole lot better, and it would be really, really easy.
But should we? And this is one of those things about so many hours in the day, so many things to do: will my routers melt if I do nothing? No. And it's kind of hard to actually justify the effort and the cost. And I know a lot of people are out there sort of scouring through the lists saying: naughty, do better. And the folk sometimes go, yes, I'll do better, but other times it's kind of, nah — and that's what we see, because, quite frankly, no one is interested in putting in the effort. Should we? It's hard to say. From there, I am left with going: in some ways this is a story that says the routing bogeyman doesn't exist, and part of the reason why is that, oddly enough, the network is incredibly stable for most of us, most of the time. Circuits aren't unstable. Routers aren't unstable. This stuff works, and works brilliantly, much better than we ever thought — for most people, most of the time. And the other part is that your business logic, of driving into exchange points and doing long circuits to where you can pick up the most traffic most of the time, actually makes routing work. The way in which we try to connect directly to what we think is the centre of the network actually produces more stable routing, and that's an amazing result. So, you know, long stringy networks are unstable, but you're not building long stringy networks; you are building dense, fat networks, and they work brilliantly. The result of that is, it kind of works, and that is kind of cool, and there is probably not a problem. But could you do better? This industry is full of obsessive compulsives, and I'm one of them. You look at this and go: yes, we should do better, because we could do better — and that's the only reason I can think of.
Thank you.
(Applause)
CHAIR: We have some time for questions.
AUDIENCE SPEAKER: Three quick comments. First of all, I will disagree that v6 is losing momentum. What you have seen is the first and easiest step of deploying v6, yeah? Get a prefix advertised: a few commands on the router. That's what an ISP or any other company would do right after getting a v6 allocation from the RIR. That's probably why you could not see much on World IPv6 Launch, because it had been done already. And the hardest part is just hidden, and you could not see it in this type of measurement.
And the second, I think, is about deaggregation. I think deaggregation in v6 is more difficult for people, because if my peer deaggregates a /48 into /64s for traffic engineering, it would immediately take my BGP session down because of the prefix limit; it just wouldn't be propagated. Similar stuff in v4 could just go unnoticed.
GEOFF HUSTON: Actually I'll make a comment there. China Telecom, who isn't in this room, actually got rid of 800 v6 more specific prefixes; they were kicking around the /32 and are simply announcing a single /24. So it's not about /64s and /48s; it's further in. But what I notice in the v6 routing table is that there is an increasing view that /48s go everywhere. And, oddly enough, I think they're right, and maybe the currency of v6 routing that works, at least today, is a /48. I'm not sure if that's the right or the wrong thing; I just don't know. But, oddly enough, that's where it's sitting. That was my comment.
AUDIENCE SPEAKER: I'm not actually surprised that you are saying that hole-punching can be more unstable than anything else, because what I have seen is just an ISP advertising an aggregate, which is stable because it's just statically advertised in BGP, while the multi-homing customer — probably a more unstable network — is advertising the more specific.
GEOFF HUSTON: Again, my intuition would be, if I am traffic engineering across multiple homes, you are sitting there with the one prefix and you are advertising it down different paths, and I should be moving those prefixes to suit my temporal needs on traffic — and, oddly enough, I don't see that.
AUDIENCE SPEAKER: I think it's not traffic engineering. One of the few cases I have seen is where one uplink is one ISP and you have the hole-punched advertisement showing through the other ISP, so you see, basically, a small prefix advertised through one of the channels and both ISPs advertising the prefixes.
GEOFF HUSTON: Tell you what, let's write to them, it's only a small number, let's ask them. Do you want to help?
AUDIENCE SPEAKER: Yeah.
GEOFF HUSTON: Cool.
CHAIR: First I want to close the mike. There are three people, still. Very brief questions, please.
AUDIENCE SPEAKER: Matt Moyle-Croft from Amazon. There is a lot of stuff in RIR policy about obsessions with routing table size. Now, it strikes me: have you looked at people who come to RIR meetings like this versus people who deaggregate? Is there a connection? Has it made a difference?
GEOFF HUSTON: If you look at the CIDR report and drill down, miles down the page you will see a comparison of what you received from the RIR versus what you advertise. And generally those two lists aren't the same, for most folk. So, what we say in routing policy — sort of our expectations — versus what people actually do are miles apart, but that's kind of okay, because, as I said, the system isn't blowing up. It's sort of working for most folk. But what I'm finding is this small collection of more specifics that are just disproportionately noisy, and I'm not sure the RIRs could actually take the blame for that.
AUDIENCE SPEAKER: I'm just talking about the obsession with routing table size. Considering you have just said it hasn't actually made much of a difference over time — it's been pretty consistent — has RIR policy, with its obsession about routing table size, made any difference?
GEOFF HUSTON: It hasn't made it any worse.
AUDIENCE SPEAKER: Benedikt Stockebrand. Two comments. First of all, we should remember that, as far as BGP especially is concerned, we are pretty much — I'm not going to say through with v6, but we have got quite a head start compared to, let's say, the small to medium company content provider sort of things, with the web shops and whatnot. So I'm not really surprised to see that BGP, or, from your measurements, v6, is kind of stagnating, because some other people have to catch up. That's what I think we are currently seeing. And the second point is why we have a pretty much constant number of people misbehaving: basically, if you misbehave and you are too bad, somebody will tell you, and it will actually boil down to costing you money. And I wouldn't be surprised if some MBA type could actually figure out why this comes out as some sort of constant in the numbers, right.
AUDIENCE SPEAKER: Mike Hughes. I'll make it very brief. Earlier on in the presentation, you had some slide saying here is the effect of World v6 Day, and you said not a lot. I wondered if you have any analysis or any stats on the effect of Martin Levy?
GEOFF HUSTON: That's kind of an interesting thing. I actually looked at that average AS path length, and I have got some more stats around RTTs and tunnelling, and one of the effects of Martin actually was to encourage folk to go native, because tunnelling kind of got you there, but the experience was not quite everything it could be, because tunnelling is hard, and the whole MTU issue and reliability and so on, and it's kind of, wow, you mean if I go native it will be — and it's kind of, yeah, brilliant. And I actually haven't looked at the birth of ASs in v6 and where I saw them before, and maybe I should, because I suspect that, in that data, I can see Martin. Because I think that would be, you know, brilliant.
AUDIENCE SPEAKER: Brilliant, fantastic, look forward to seeing it.
CHAIR: Geoff, thank you.
(Applause)
CHAIR: We are switching roles: another PC member is helping to moderate the session, and this last session is dedicated to so-called lightning talks — very short talks, each ten minutes, and the rules are very strict. What we are going to do is ten minutes, and you partition your talk however you like: you can do a five-minute talk and five minutes of questions, or a ten-minute talk and bye-bye, no questions taken.
So those are the rules. The last talk will be coming from the programme and I will say some more before introducing that last talk, but for now I'd like to invite Chris with his talk on operators and the IETF.
CHRIS GRUNDEMANN: Thank you. I am going to talk about operators in the IETF, and the first thing I want to clarify is that, in this context at least, when I say the word "operators" I don't just mean ISPs; I mean network operators in general — enterprises, campuses, data centre networks, the gamut. Anybody who operates a network. What we're talking about here is including more operational context and content in the IETF process.
Why? So, the dream is that every standard the IETF makes runs great on your network, right, out of the box. And that you know everything is working together perfectly. The operators are involved providing the input that's needed so that that can happen. The IETF is telling the operators when they need that input and everything just kind of works perfectly together and we get protocols that come out every time that you can go implement on your network and you see them coming, right.
The reality is that that's not happening. There are several instances and examples, and the further you go back and talk to IETF members — the folks who have been at the IETF or implementing new technology the longest have the most stories. One good example is IPv6 and source routing. We had source routing in IPv4, and everyone turned it off because it's a horrible idea; it went into the protocol for IPv6, and then they deprecated it. Having that feedback and input and operational context in the process may help avoid things like that, and may help things move a little faster, possibly, and at least happen a little bit more smoothly. That's the hope.
So we have a plan. My team at the Internet Society is trying to facilitate operator input. There are operators who participate already and they are doing a great job, but we think we need more of that information coming in and more of that feedback loop — more of that communication between operators and the IETF.
So, what we're doing now: we're travelling around the world. We are trying to talk to as many operators as we can, trying to find out: why don't you go to the IETF? What are the barriers to entry? Why do you feel that your voice isn't heard? If you feel that it is heard, we want to hear that feedback too. Myself and Jan Zorz are here this week. We have a survey up. We are going to analyse all that information. We are trying to build a problem set: we don't want to solve problems that don't exist. We want to analyse and find out what the problems are, document that, go back and make sure those are the problems that are affecting everyone, and try to solve them. We are hoping that by the end of this year we'll be seeing some of these solutions implemented, whatever they may be.
So, the opportunity and the reason I'm here giving this talk right now is we want to hear from you. This is your opportunity to talk to us to try and inform this process as things move forward.
As I said, find one of us and talk to us — we'd love to talk to you and jot down your thoughts. There is a long-link and a short-link version of the survey. The survey is going to close at the end of June. And at that point, we are going to wrap up the results, dig into the analysis deeply, and come back around and share what we have heard — doing a sanity check, making sure what we heard is what you said, across the operator communities around the globe.
Then, once we have that problem set, we want to start building solutions. Those solutions might fall into three categories: things that operators, or probably more appropriately operator groups, can do to help solve the problem; things that the IETF can do or change to help solve the problem; and probably a third set where the Internet Society may be able to step in and do some activities to facilitate this going forward.
I'd be happy to take any questions now.
CHAIR: Any questions?
AUDIENCE SPEAKER: This is Shane Kerr. The IETF has had a refrain that there is not enough operator input forever, like since before there were computers, they were complaining there weren't enough people. So, I don't want to discourage this effort but I guess my question is: First of all, is that presumption really true? Is it actually important that operators have input? Because, from one point of view, if it was important, it would have happened by now. And the other question is: is it really likely that this effort is going to succeed to change this.
CHRIS GRUNDEMANN: Those are great questions, and that's why we are doing this survey. We want to know: is this just a perception, or is there actually a problem here? From what we have heard so far, there is actually a problem here, I believe.
SHANE KERR: I agree, it would be really good to get detailed data.
CHRIS GRUNDEMANN: The second thing is, over time, there has been this constant thought that there is a missing piece, right. That's kind of ebbed and flowed a little bit. I don't want to go into the history now, but if you look at the operations area within the IETF, it has kind of changed over time. It has sometimes done different things, been more engaged with operator groups and sometimes less engaged, depending on the personalities. It's something that's been constant, but there's been an ebb and flow through time, and I think there are some solutions, so I'm hopeful. I don't want to put the cart before the horse, but I really do think we have something here that we can act on.
SHANE KERR: Okay. Good luck. Thanks.
AUDIENCE SPEAKER: Gert Doering. I'm a little bit active in the IETF, more in the RIPE world. One of the issues I have with IETF participation is that I have a job, and only part of my job is actually keeping track of what the vendors will surprise us with in the next few years; it's a small part, and the bigger part is keeping the network running. So, in the limited set of Working Groups whose mailing lists I read, one of the huge problems is that when you try to influence the process, you're up against people who are paid full-time, or whatever they do for a living, to argue the point. So all the discussions tend to turn into huge mailing list threads that I think most operators, who have something else to do, will not be able to sustain. So we go away and do more interesting things and break our networks.
This is a very personal thing. I'm trying to really follow a few of the IPv6-relevant things, but I cannot find the amount of time that would be necessary to really go into all these discussions.
CHRIS GRUNDEMANN: Absolutely. That's great feedback. I will say that's not uncommon; time is definitely one of the issues that we have heard about.
AUDIENCE SPEAKER: Richard Barnes. I'm speaking here as one of the area directors who make up the Internet Engineering Steering Group, which is sort of the board for the IETF. I think it's great that you are doing this and collecting this feedback. I look forward to working with you guys, and I hope you'll keep it going, because we are actively looking for ways to make the IETF a friendlier place. A lot of the discussion we had last week was about the points that Gert was making: how do we control the discussion and make it more focused on productive things as opposed to endless threads. So we are really looking for input of that character from folks, where you're feeling the pain in this domain and ideas for how we can make that better. So this is very much appreciated.
CHAIR: Thank you very much, and you are around for the whole week.
(Applause)
Our next lightning talk is on training at RIPE, and Rumy asked me to say that if you have any feedback, apart from questions of course, please approach Rumy or members of RIPE with your feedback after the lightning talk.
RUMY SPRATLEY-KANIS: I work at the RIPE NCC, I am the training manager, I manage the training department. As you may know, our department does about, well actually over, 100 training courses a year in the service region, so we travel around a lot. I'm not here to talk about the training courses we do for our members; I am here because I have a question for you, so I'm going to try to keep it short so there is as much time as possible for you guys if you have any input, and not get into a fight with Andrei.
So, actually, as you may know, last year the RIPE NCC did a survey, and one of the questions we asked was how people in the community think we could improve the experience of the RIPE Meeting and how we can encourage people to come to our RIPE Meetings more. Interestingly enough, there were quite a few comments from people asking for more training at RIPE Meetings. So I have posted some of the comments that we received.
So there was one question, question 79: "Do you have suggestions for how the RIPE NCC can improve the RIPE Meeting experience?" And as you can see, there were several comments about more workshops, maybe the Monday morning not being enough, people saying they would be able to convince their bosses to send them to a RIPE Meeting if they could justify it by also having some training or learning something while being here. More technical, more hands-on. So that was one question.
And then there was a question: is there anything that would encourage you to attend a RIPE Meeting? Apart from everybody wanting a RIPE Meeting in their own city, which is kind of difficult, and several people asking for instant teleportation devices, which we are working on but I'm not sure we can have in the coming years, there was again more training, especially on technical topics like IPv6, DNSSEC and security, and more workshops; as some of you know, at APNIC meetings there is more of that.
So, following this survey, the communications managers, Nick Hyrka and myself, were tasked to investigate what we can do and what we can offer at RIPE Meetings. And so, before we run off and say okay, let's do all these interesting things in London or later on, I would actually like to know from you what it is you want. Do you think we should do more training at RIPE Meetings? Do you think it should run parallel to Working Group sessions, or on the Monday, maybe even on the Sunday? We have a lot of options, we have dedicated staff, we have lots of training materials. So there is quite a lot we could do, either by using the RIPE NCC staff or maybe using knowledge that we have in the community. But we would like to hear from you: what is it you are interested in, which topics would you like to see, how big should the sessions be, how should we run them and when should they take place? Would you come earlier, would you be interested in doing that? Basically what I'm asking for is: talk to me, I'm here all week. You can also send an e-mail to training [at] ripe [dot] net or rumy [at] ripe [dot] net, or you can talk to the PC members; I'm sure they would also be happy to hear how you think we can improve or what you think we can offer. Basically, that's it. If you have any comments or questions, please let me know, or come to me directly and let me know what you want. Thank you.
CHAIR: We still have some time for questions, so if you have any questions right now, please...
Okay. No. Then, yeah, the mail address is there and you can also contact pc [at] ripe [dot] net. Thank you.
(Applause)
CHAIR: So, I mentioned RACI, and RACI stands for RIPE Academic Cooperation Initiative. The idea is that we, together with the RIPE NCC, try to facilitate researchers and students coming to RIPE Meetings with research relevant for this community, and to get fresh blood into this community as well. So, every RIPE Meeting there is a competition; this time 17 talks were submitted and seven people, if I'm not mistaken, were granted attendance at the RIPE Meeting, and out of those seven, one was selected by the RIPE Programme Committee to be presented as a lightning talk here. The other winners of this initiative also present during this week, at the Cooperation Working Group and the DNS Working Group, and there is also a RACI BoF on Thursday, so please come if you are interested. And now I would like to invite Sameneh, with her talk about online banking fraud, which has a more complex title.
SAMENEH TAJALIZADEHTHOOB: Hello everyone. Thank you. The idea of this research, which I'm doing now in the first year of my Ph.D., is about extracting more intelligence about online banking fraud. In this case we studied the ZeuS financial malware.
I should mention that this is only a preliminary study, and since I have very little time I will only give you a quick overview of what I have done so far.
As we all know, the statistics on online banking fraud have been rising for a few years, and until now there is no clear pattern of why some targets are attacked more and what the characteristics of those attacked targets are.
So, for those who do not know how the ZeuS financial malware works: it performs a man-in-the-browser attack and hijacks your online banking sessions. In some cases it injects new fields into your banking session in your browser, or modifies your account balance and shows you a fake balance, so attackers can transfer money without you knowing.
What we looked at: we got data from a third party, the security firm Fox-IT in the Netherlands. We got 11,000 records of ZeuS configuration files covering a four-year period. This is what a configuration file looks like. This part is the inject section of the config file, in which the attack targets are mentioned, and below it is the code that attackers inject into the browsers.
So, from these files we extracted the attack instructions and which targets are attacked.
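To sketch roughly what that extraction looks like in practice, here is a minimal Python example. It assumes the inject section follows the set_url / data_inject / data_end layout known from leaked ZeuS webinject files; the file name and the exact field layout are assumptions for illustration, not necessarily the format used in this study.

import re

# Minimal sketch: pull the targeted URL patterns out of a ZeuS-style
# webinject file. Assumes "set_url <pattern> <flags>" lines as in the
# commonly documented leaked ZeuS configs; the real files may differ.
def extract_targets(path):
    targets = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = re.match(r"\s*set_url\s+(\S+)", line)
            if match:
                targets.append(match.group(1))
    return targets

if __name__ == "__main__":
    # "webinjects.txt" is a hypothetical file name.
    for url in extract_targets("webinjects.txt"):
        print(url)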
The questions we tried to answer were: identifying the targets, to know what the characteristics of these targets are and, in general, how attackers have identified new targets over time.
We also looked at how the attack code changed over time, which is also related to the defence measures of these domains, in most cases financial service providers, but since I don't have time, you can always look at the paper we have written in this regard.
So, what we found first was that, when we looked at the targets over the four-year period, we found around 2,500 domains from around 15,000 URLs, located in 92 countries, and from around 2,400 botnets or, in this case, command-and-control servers. What we saw was that, in general, 74% of the overall attacks were on financial service providers and the rest were on security firms, online security products, telecom companies, etc.
When we looked at the popularity of the attacked targets over time, we saw that a small group of targets, located in this part of the graph, was attacked for the whole period for which we have data, so we called them always-attacked domains, whereas, as you see in the leftmost part of the graph, around half of the domains in our data set were attacked for less than four weeks.
The diversity of those short-lived domains was high: only 50% of them were financial institutions, and they were located in 80 countries, whereas in the always-attacked group 97% were financial institutions, located in 13 countries.
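To illustrate how domains can be bucketed by attack persistence, here is a minimal sketch, assuming a hypothetical list of (week, domain) observations derived from dated configuration files; the thresholds mirror the grouping described above.

from collections import defaultdict

# Sketch: classify domains by how many distinct weeks they appear in the
# configuration data. "observations" is a hypothetical list of
# (week_number, domain) pairs.
def attack_persistence(observations, total_weeks):
    weeks_seen = defaultdict(set)
    for week, domain in observations:
        weeks_seen[domain].add(week)
    short_lived = [d for d, w in weeks_seen.items() if len(w) < 4]
    always_attacked = [d for d, w in weeks_seen.items() if len(w) == total_weeks]
    return short_lived, always_attacked

# Toy example over a five-week window.
obs = [(1, "bank-a.example"), (2, "bank-a.example"), (3, "bank-a.example"),
       (4, "bank-a.example"), (5, "bank-a.example"), (2, "shop.example")]
print(attack_persistence(obs, total_weeks=5))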
From that first category, we concluded that maybe they are the result of trial and error by attackers: domains where the attacks either were not successful, because of the defence measures of the defenders of those domains, or simply were not attractive enough to be targeted any more.
Next, we tried to find the external factors that may cause a domain to be attacked. We correlated domain popularity with the size of the domain, which we extracted from Alexa, and we saw a very weak, though significant, relationship between them. So, as we can see, size does matter, but only up to a certain threshold. Above that threshold it doesn't matter; it's not a determinant of attack persistence, let's say.
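A minimal sketch of that kind of correlation, assuming per-domain counts of attacking botnets as the popularity measure and an Alexa-derived size estimate; both lists here are made-up toy data, and the study's actual measures may be defined differently.

from scipy.stats import spearmanr

# Sketch: rank correlation between attack popularity and domain size.
# On real data, a weak but significant relationship would show up as a
# small rho with a p-value below 0.05.
botnets_per_domain = [1, 2, 2, 3, 5, 8, 13, 40, 41, 42]       # toy attack popularity
alexa_size = [10, 40, 20, 80, 90, 300, 500, 520, 510, 530]    # toy size estimate

rho, p_value = spearmanr(botnets_per_domain, alexa_size)
print(f"rho={rho:.2f}, p={p_value:.3f}")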
We also looked at the institutions located in the US: among 6,500 active financial institutions in the US, only around 200 were attacked by ZeuS. This is a very big difference. How we can explain this is that, in the United States, a lot of banks use the same back end for their online banking platform, or they outsource it to the same company, which is either present in our data or not, or the rest of the financial institutions are simply not attractive for attackers because of their small customer base.
As you can see in the graph, these are the top 50 US-based financial institutions, and the other axis is the average number of attacking botnets, so we can see there is high diversity even within the top 50 big domains in the United States. So, again, we can see that, to some extent, size does not matter.
Next, we looked at botnet activity in this research. As you can see in the graph, I marked the time when the ZeuS source code was leaked into the underground economy. What we would expect from that is that the number of active botnets, or criminal groups using this tool to perform online attacks, would increase, although, if you look at the graph, this is not the trend; they decreased. So, what we understand is that the underground economy, or its pool, is limited; it's a shared pool and it has limits. Everyone can enter the market, but not everyone is able to perform the attack easily. Also, after the Microsoft takedown the number of attacks decreased, so maybe it was an effective effort to reduce criminal activity.
If we look at the number of new domains that are attacked in each period over the whole time for which we had data, we see that there is also a certain ceiling. Although there is an almost unlimited pool of financial institutions or domains that could be attacked online, the number does not rise above that ceiling. So, again, when we talked to experts in the field, we realised that finding the right domain is not the only requirement for a successful attack; the bottleneck may be somewhere else in the value chain. For now, we see transferring money as the most important bottleneck in the value chain, so finding the right money mules and transferring money out of the bank is one of the important impediments in online banking fraud today.
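The count of newly attacked domains per period is a simple computation; here is a minimal sketch, again over hypothetical (period, domain) observations rather than the study's actual data.

# Sketch: how many domains appear for the first time in each period.
def new_domains_per_period(observations):
    seen = set()
    counts = {}
    for period, domain in sorted(observations):
        counts.setdefault(period, 0)
        if domain not in seen:
            seen.add(domain)
            counts[period] += 1
    return counts

print(new_domains_per_period([(1, "a.example"), (1, "b.example"),
                              (2, "a.example"), (2, "c.example")]))
# -> {1: 2, 2: 1}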
Also, we looked at the attacked countries and how they changed over time. We see that the diversity increases from 2009 to 2012, though we are also no longer seeing attacks on some specific countries. We are seeing more attacks on small islands; we still don't know why, maybe it's related to tax and such, but these are also really interesting things for us to investigate.
That's all. Thank you.
(Applause)
CHAIR: I think we have time for one question. If not, well thank you very much and again, we invite you to the RACI BoF that will happen on Thursday. Thank you very much.
This concludes our Plenary programme for today. Thank you very much. There will be a task force taking place here at six o'clock, there is also the Meet the RIPE NCC Board session at half past six, and then there is a welcome drinks social at seven o'clock. See you all. Thank you.