
Cryptography for Beginners

digitalscotland July 13, 2021

In this video, the first thing I’d like to do is set some expectations about what’s going to happen here.

What is cryptography?
I had to add this in, because I hadn’t thought about cryptocurrency: we won’t be talking about cryptocurrency. What I hope to give you is a working understanding of the common terms and of the key drivers for choosing cryptographic methodologies, algorithms, and strengths. You won’t leave here an expert, but hopefully you’ll leave with enough information to make intelligent decisions about what you’re doing with cryptography.

So the first thing we should go over is: what is cryptography? If you’re like me, the first thing you do is look at Wikipedia, and Wikipedia has a moderately good definition: it’s the practice and study of techniques to secure communications in the presence of third parties called adversaries. What I like to focus on is that we’re securing communications and that there are adversaries. Unless you’re a cryptographer, you don’t need to study cryptography, but it’s important to understand that what you’re trying to do is secure communication, even if that’s just communication with yourself: storing on a hard drive, keeping in memory, sending across the network to yourself. It’s still communication, and there are adversaries who will try to get it.

How does cryptography work?
The first thing to understand is that cryptography all starts with secrets. Secrets are an essential part of practical cryptography. You can have cryptography without secrets; it’s just not a very good idea. Secrets have been in use for more than cryptography, too. There are very old stories about using passphrases to gain entrance to things; a very famous one is “Open Sesame”, and that was a secret. In the United States during Prohibition there were places called speakeasies where you could go to consume alcohol, because you couldn’t anywhere else, and you had to have the proper passphrase so they knew you weren’t an official trying to arrest them.

In ancient cryptography, the methods themselves were secret. You had to know the method, and that was the secret part: how you would go from the plaintext to the ciphertext, and from the ciphertext back to the plaintext. In modern cryptography the methodologies are well known. They’re documented, they’re peer-reviewed, they go through a long process to be accepted, and they’re published in academic journals. Often, if they succeed and are accepted, they’re turned into distributable libraries. In PHP we have mcrypt, OpenSSL, and Sodium, and there are others.

So in modern cryptography, secrets take the form of keys, and keys are what we add to an algorithm to turn plaintext into ciphertext and back the other way. Generating and exchanging keys, though, can be a tricky business. Before the mid-1970s, passing secrets was a thing people did with briefcases and handcuffs and long trips: you would have someone travel with the secret, with the codes.

In World War 2 it was codebooks. You had to pass around secrets so that both of you knew what the secret was for today, or for this time, or for this message. So in modern cryptography, to make that usable, affordable, and accessible, probably the most critical advance was key exchange. Some of you are familiar with Diffie-Hellman key exchange. Diffie-Hellman key exchange is the ability for two people who each have a secret value to come up with a common value known to both, without ever passing that value across the wire. When you make a connection to a web server that uses TLS, often referred to as SSL, this is what’s going on in the background. I won’t get all into the process; you can read about it. That’s actually a picture from Wikipedia, and there’s really good information on Wikipedia on how it works. I’ve also got references to some books that go through it really well. It’s a way to come up with a common secret from your own secrets without ever passing them around, so you can both communicate in a secure manner. Doing this is very expensive: it uses very large prime numbers and modular arithmetic. It’s kind of a black box of magic, which isn’t so important for us to understand today; just understand that this is what happens when you’re exchanging a key.

Key exchange is a really big, important thing. If you’ve ever had to converse with a third party, like Amazon or GitHub, where you’re communicating over SSH using an SSH key, you have to move that key back and forth, which can be difficult. Key exchange means you don’t have to do that. The next super awesome piece of voodoo mathematics that came about was public/private key pairs, and these are so you don’t have to exchange secrets, much like key exchange.
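To make the Diffie-Hellman idea described above a little more concrete, here is a minimal Python sketch. The prime `p = 23`, the generator `g = 5`, and the names `make_keypair` and `shared_secret` are assumptions chosen for illustration; they do not come from the talk, and real deployments use primes of 2048 bits or more through a vetted library.

```python
import secrets

# Toy public parameters, assumed for illustration only.
# Both sides know these; they are not secret.
p = 23  # small prime modulus
g = 5   # generator


def make_keypair():
    """Pick a private value and compute the public value g^private mod p."""
    private = secrets.randbelow(p - 2) + 1  # secret in the range 1..p-2
    public = pow(g, private, p)
    return private, public


def shared_secret(their_public, my_private):
    """Combine the other side's public value with our own private value."""
    return pow(their_public, my_private, p)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; the private values never do.
alice_shared = shared_secret(bob_public, alice_private)
bob_shared = shared_secret(alice_public, bob_private)
print(alice_shared == bob_shared)
```

Both sides arrive at the same number because (g^a)^b and (g^b)^a are equal modulo p, which is the whole trick: the common secret is computed, never transmitted.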

Diffie-Hellman key exchange says: I have a secret, you have a secret, and we’re going to create a combined secret. Public/private key cryptography says: I have a secret, and I’ll give you a public key with just enough information to do what you have to do on your side, and then you’ll give me your public key, which has just enough information for me to do the same. The most famous example is RSA, named after the three inventors of the process. Before they invented it, a British mathematician had designed basically the same thing, but it was super secret because it was part of British intelligence, so he was not allowed to tell anybody for a very long time. Asymmetric key cryptography, which is what the public/private key pair is, meant you didn’t have to do that key exchange, so it became a lot easier to move data without building in the complicated process of key exchange.

The other part of cryptography is ciphers: keys and ciphers. We see some ancient ciphers here. On the left is basically a replacement cipher; on the top right is ROT3 (I first thought it was ROT13), which is very simple but used to be considered acceptable cryptography 20 years ago; and on the bottom is RSA. Modern ciphers use mathematics, very complicated mathematics, and we’ll get into that a bit more later. Just understand that it’s all math; that’s why you need computers to do it.

There are different ways that cryptography is used, and we’ll go over those as well. The important ones that come up in the daily life of a developer, someone who is writing applications, are encryption, digital signatures, key derivation, and hashing. Key derivation is what a lot of people call password hashing; even in the PHP libraries we call it password hashing.
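The public/private key idea behind RSA, mentioned above, can be sketched with textbook-sized numbers. This is a toy under stated assumptions: the primes 61 and 53, the exponent 17, and the helper names `encrypt` and `decrypt` are illustrative choices, there is no padding, and nothing this small is secure. It needs Python 3.8 or later for the modular inverse form of `pow`.

```python
# Textbook RSA with tiny primes, assumed for illustration only.
p, q = 61, 53
n = p * q                  # public modulus, shared with everyone
phi = (p - 1) * (q - 1)    # kept secret, used only to derive d
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi


def encrypt(message: int) -> int:
    """Anyone holding the public key (n, e) can do this."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent d can undo it."""
    return pow(ciphertext, d, n)


ciphertext = encrypt(65)
print(decrypt(ciphertext) == 65)
```

The public pair (n, e) carries just enough information to encrypt, while recovering d requires knowing the two primes, which is why the size of those primes matters so much.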
What is Password Hashing?
Password hashing is hashing, but it’s a process of doing multiple hashes in different ways to generate keys, so it’s called key derivation.

Encryption is used to place data in a form that can be reconstituted into its original form: what can be encrypted can also be decrypted. It comes in two flavors: symmetric, which uses a shared key, and asymmetric, which is public key encryption. In asymmetric encryption you do not have a shared secret; that’s symmetric encryption. In asymmetric encryption each party has its own set of keys, and they’re different. You use a public key to encrypt data for the holder of the matching private key, and digital signatures go the other way around. I’ll go into more depth on that in a moment. Symmetric encryption is where you share the same secret: the same key that is used to encrypt is used to decrypt.

What is hashing?
Hashing is where you create a representation of a value that cannot be reversed. It’s reproducible, if it’s a proper hash, but it cannot be reversed: when you hash something, the hash doesn’t normally contain enough information to regenerate the input. In this particular example the input is smaller than the hash, but more often, as when you download a file and see an MD5 or SHA-1 value next to it, there’s not enough information to recreate the input. The function gives you a string of the same length every single time.

The difference between a good hash and a bad one is how well it prevents collisions. You don’t want two things having the same hash. It will happen, because of the way the algorithms work: many possible inputs are being condensed into a fixed number of outputs. So the larger your hash and the better the algorithm, the less likely you are to hit one, and we’ll get into that more later. With a modern hashing algorithm you shouldn’t see a collision between inputs whose size is equal to or less than the size of the hash; the risk comes when the input is larger. So if you have a 30-character string and you’re producing a 48-character hash, you effectively never have to be concerned about collisions on a proper algorithm.

In our example the input size is less than the output size, so finding another input with the same hash is practically impossible. But if you have a larger value, like this image, the hash is condensing it down. There’s a popular recent attack called SHAttered. Has anyone heard of SHAttered? All right, we do have crypto geeks. With SHAttered it is possible to adjust the headers in a PDF until you have a different document with the same hash. You’re taking something that’s not necessarily the same size and condensing it down, so you can keep adjusting the headers while changing what the document says, and often that document is an agreement to do something, or the terms of a contract. If you sign a PDF by hashing it, that should be secure, depending on what you’re hashing it with. SHAttered is for SHA-1, which is an older hash. You create the new document that you want to be interpreted as signed, and then you brute-force it: make a hash, does it match? No. Change the headers, does it match now? No. Change the headers, does it match now? It’s possible, albeit expensive. I did the math on AWS: before discounts it would cost about $10,000 in GPU rental and take a week. So when you hear that these things are possible, it is getting easier now that anyone can rent ten thousand GPUs, but it’s still prohibitive. There are also ways around it: you can use SHA-2 and another algorithm as well, such as MD5. If you hash the document twice with two different algorithms and keep both values, it’s nearly impossible, not impossible but nearly, to find a second document that produces both, and it would take a lot more time.

Hashes, although fantastic, aren’t terribly useful by themselves, because all a hash tells you is that the data you have matches the hash you got.
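The hashing properties described above (same input gives the same output, fixed output length, and two algorithms side by side) can be sketched in Python with the standard `hashlib` module. The function names are hypothetical, and pairing SHA-256 with SHA3-256 is an assumption for this sketch; the talk itself mentions SHA-2 together with MD5.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def double_fingerprint(data: bytes):
    """Hash the same data with two different algorithms and keep both values."""
    return (
        hashlib.sha256(data).hexdigest(),
        hashlib.sha3_256(data).hexdigest(),
    )


short_input = b"hello"
large_input = b"x" * 1_000_000

# Reproducible: the same input always produces the same hash.
print(fingerprint(short_input) == fingerprint(short_input))

# Fixed length: a 5-byte input and a 1 MB input both give 64 hex characters.
print(len(fingerprint(short_input)), len(fingerprint(large_input)))

# Two algorithms: a forged document would have to match both values at once.
print(double_fingerprint(short_input))
```

Note what this does not give you: anyone who can replace the file can also recompute the hash, which is the gap that signatures are meant to close.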

So if I download a file and the hash matches, fantastic: that’s the data that was there. But who’s to know that the hash didn’t change along with the data? If an attacker uploaded a new file that has malware in it and then generated a new hash, everything looks perfectly fine: I’ll download it, I’ll verify the hash, and the data looks good. What you want at that point is digital signatures. A digital signature is a hash mixed with a key, using a function that takes the key; it’s not that you just stick the key on the end. It verifies the integrity of the data as well as the fact that the individual you assume created the data did create it.

A digital signature looks very much like a hash. These are HMACs, and there are some rules that an HMAC has to follow. If you have the same data with the same key, you get the same signature every single time; that’s kind of a requirement. If you have the same data with a different key, you get a different signature, which prevents someone from saying “here’s the data you got from person A” when person B signed it. And if you have different data, the signature will be different as well. So it’s very similar to hashing, except that it puts the key in there, the secret value that no one else should know, which tells you that it’s me as an individual signing it.

If you sign your GitHub commits, which I have to, you’re doing this. You’re saying that I as an individual signed this, and you can trust that this commit was made by me, not because it has my email and my name on it, because anybody can put that in, but because I have provided GitHub with my public key and it can verify the signature I made with my private key against the data. Key derivation is another really good use of hashing functions.
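The three HMAC rules above can be checked directly with Python’s standard `hmac` module. This is a minimal sketch: the keys, the message, and the helper names `sign` and `verify` are made up for illustration, and it covers only the shared-key HMAC case, not the public/private key signatures used for signed commits.

```python
import hashlib
import hmac


def sign(key: bytes, data: bytes) -> str:
    """Compute an HMAC-SHA256 signature of the data under the given key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(key, data), signature)


key_a = b"person-a-secret-key"   # hypothetical keys for the sketch
key_b = b"person-b-secret-key"
message = b"I agree to the terms of this document"

signature = sign(key_a, message)

print(sign(key_a, message) == signature)              # same data, same key
print(sign(key_b, message) == signature)              # same data, different key
print(sign(key_a, message + b" (edited)") == signature)  # different data
print(verify(key_a, message, signature))
```

`hmac.compare_digest` is used for the comparison so that the time taken does not reveal how many leading characters of a guessed signature were correct.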

So key derivations are cyclical hashes, where the hash is fed into the algorithm over and over again to generate a brand new hash. It’s done many times, and the idea behind it is to require a lot of computing power. Hashes should be very fast: if I need to verify data, I should be able to do that very quickly, the faster the better. For passwords, however, I don’t want you to be able to try them constantly over and over again; I want it to be very difficult for you to brute-force a password. So key derivation is, by design, very computationally expensive. Good key derivation methods also use random values called salts, and the salt is used over and over again as well to create a different value. I’ll show you a very oversimplified example of why that’s necessary. “ihatepasswords” is actually one of the top passwords in use today. Mixed with a random salt, it gets you a hashed value. You take the hashed value and the salt and you get another value, and you do it again and again and again. You’re going over and over again to make it computationally expensive.

That is important because of the way passwords work: you’re trying to give your users time to change them. One of the nice things about passwords, as opposed to biometrics, is that you can change them, so it’s important to secure them in a way that buys your users time. Say your site gets attacked. You may just have a simple blog site, and many people think, “I just have a simple blog site, what do I care?” Unfortunately, your users probably use their email as the username, and they probably use the same password for their email as they did on your blog. Most people don’t know better, and that’s what they do. If I can hack your site and get their password, and I have their email address, then I can get into their email account.

I can now do password resets on every other thing I want access to, because those are always sent to email. I can get into their bank account. I can get their credit cards. I can get onto their shared drives, where they probably have tax statements with personally identifiable information, enough to impersonate them however I like. So even if you only have a very simple site that requires people to log in, be very careful with what you do with their passwords. You may not care if users are impersonated on your site, but your users don’t know security, and they’re doing terrible things. Some of you may also have just gotten very scared when I said that, as I did when I was first told these types of things. I didn’t know any better; I figured I’d have certain types of passwords and that would keep me safe. So as you’re using these key derivation functions, you want to make them as strong as you possibly can, with as many iterations as you can stand. If you’re using the password functions in PHP with a large number of iterations, then every time I want to attack your passwords I have to do that large number of iterations for every single brute-force attempt.

The Ashley Madison breach:
How many people here remember Ashley Madison? All right, that’s a little more popular. Ashley Madison was a website for extramarital affairs, and most of the men on the site were married. Because of the way the site worked, they also tended to be unusually important: information on a large number of military, government, and high-level C-level executives was found on the site. It was PHP, and it used the password functions, which meant it was very secure. There’s this thing called the top 100 passwords, a list that people keep updated from sites they’ve been able to attack. The attackers tried the first 100, which gives you anywhere from 10 to 30% of the password base, and then they just stopped, because it’s too expensive to brute-force all the other passwords one character at a time. That was until they found a back door: the site had left in a separate hash, not a key derivation, built from the username and the unhashed password, and it was just a SHA-1. Using the known username and the known format, they were able to recover everybody’s passwords. So key derivation is always important, because more often than not attackers will just walk away: they know that by the time they’re able to crack the passwords, most users will have changed them. That’s why key derivation is important. So, on to getting good cryptography.
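The salted, iterated hashing described above can be sketched with PBKDF2 from Python’s standard library. PBKDF2 is a swapped-in technique for illustration, not what PHP’s password functions use internally, and the iteration count and helper names below are assumptions for this sketch rather than recommendations from the talk.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value; raise it as high as your servers can stand


def hash_password(password: str, salt: bytes = None):
    """Derive a key from the password with a random salt and many iterations."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Repeat the same derivation with the stored salt and compare."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("ihatepasswords")
print(verify_password("ihatepasswords", salt, stored))
print(verify_password("ilovepasswords", salt, stored))
```

Every guess an attacker makes has to pay for all of the iterations again, which is what buys users the time to change their passwords after a breach.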
Importance of Good Cryptography:
Now that we have a basic understanding of how cryptography works, the key to your own applications will be using good cryptography. Cryptography has been used for centuries with varying levels of success. Many of us have likely used bad cryptography at some point, and if you’re as old as I am, it may well have been good cryptography when you started using it, like using MD5 or SHA-1 for passwords. Twelve years ago, a salted SHA-1 was considered the best password security available, and it was just fine. Today I can crack that on my phone. As technology advances, so does the need for better cryptography. I think it’s important for you to understand what makes good cryptography, because what counts as good cryptography will change and change and change. PHP 7.2 is taking us to a new level of cryptography with libsodium, but it’s important to understand why it’s important and what makes it good.

What makes cryptography good?
Good cryptography obscures data in such a way that it’s difficult and costly to duplicate or reverse. As we discussed with key derivation, it’s about making it very expensive to get someone’s password. When you’re using algorithms you want them to be very costly and difficult to attack, and there’s a very good reason for that. We’ll talk about how this works and why it’s important in two sections: entropy and computational cost. If you don’t know what entropy means, we’ll get to that as well. They’re independent as well as interdependent: good entropy has nothing to do with computation, but together they create good cryptography.

There are two ways you can attack cryptography: pattern analysis and brute force. Pattern analysis is often referred to as cryptanalysis. That’s taking some information you know about the subject and seeing if you can determine patterns. The other is just trying again and again: Is this it? No. Is this it? No. That’s brute force. Brute force is much like smacking a rock with a hammer. If you hit a rock with a hammer long enough, the rock will break, and the size of the hammer determines how long it will take. You can sit there with a very small hammer and just keep hitting the rock. If you’ve seen very small streams cut through rock over centuries, you know it’s possible; it’s just difficult and costly. In cryptography, it’s just trying over and over again: is this it, is this it, is this it, try the next combination.

But based on Moore’s law, the hammers get more powerful every year: more and more processing power. With networked computing you can get multiple devices working on the problem at the same time, and with cloud platforms you don’t even have to purchase them; you can just rent the processing power. Or you can build new hammers. These are ASICs, application-specific integrated circuits. If you know anything about cryptocurrency, these are miners, and all they are programmed to do is generate hashes. They are extremely fast at generating hashes. There are also chips that are very good at math.

This is an NVIDIA GeForce chip, a GPU. GPUs are very good at mathematical computations.

Trying to break some encryption?
If you’re going to try to break some encryption, this is fantastic. GPUs also do really well at generating hashes. And we won’t even get into the quantum problem. There are some algorithms available in PHP that are considered quantum secure, and it’s not because they can beat quantum computers; it’s that the algorithms were made with a type of complexity that makes them difficult for current quantum computers. Nobody is going to be using quantum computers to break your encryption today; it will be the quantum computers of tomorrow, so we’ll see where that goes.

But there are ways to fight all that power, and algorithm complexity is one of the top tools. Harder math requires more computation. The difference between older algorithms and today’s is that the new algorithms are more computationally complex: it takes more work to generate the encrypted value or to decrypt it back to plain text. History is littered with algorithms that were too easy to brute-force. mcrypt is no longer in PHP, and that explains everything. The library itself was taken away because it was too old, and most of the algorithms it offered weren’t even considered safe anymore.

Evolving algorithms and large keys:
When I started using PHP, about ten years ago, Triple DES was considered safe. It’s now considered completely terrible; don’t use it. AES is the new standard, but that’s changing as well with libsodium. Cryptography is constantly evolving toward more complex algorithms.

Large keys are also very helpful, depending on your algorithm, specifically the very large prime numbers in asymmetric encryption algorithms. Did everyone here have to update their website to something larger than a 1K key a year and a half ago because of Google? I think it was about three years ago that Google said that if your site isn’t secured by SSL, you’re going to drop in the rankings. Everybody said “oh no”, and Let’s Encrypt was born: the easy and free way to do SSL. Shortly after that they said 1K keys are not enough, because there were a lot of sites with 1K keys out there, so 2K keys are required. That’s what we mean by large keys. A 2K key is a 2048-bit integer. It’s a very large integer, and it’s not just any integer: it’s built from very large prime numbers. Just finding prime numbers that size is very difficult, and doing math on them is even more complicated, which makes the key difficult to brute-force. If you’re using symmetric encryption, a larger key means more combinations an attacker has to try when brute-forcing, which again makes it more difficult.
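The point about symmetric key size and brute force is just arithmetic, and a short Python sketch makes the scale visible. The guess rate of one trillion keys per second is an assumed figure for illustration, not a measurement of any real hardware, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GUESSES_PER_SECOND = 10 ** 12  # assumed attacker speed, for illustration only


def years_to_search(key_bits: int, guesses_per_second: int = GUESSES_PER_SECOND) -> float:
    """Years needed to try every possible key of the given size."""
    total_keys = 2 ** key_bits
    return total_keys / guesses_per_second / SECONDS_PER_YEAR


for bits in (56, 64, 128, 256):
    print(f"{bits}-bit key: about {years_to_search(bits):.3g} years to try every key")
```

Each extra bit doubles the number of combinations, so key length raises the cost of brute force exponentially while costing the legitimate user very little. This applies to symmetric keys; asymmetric keys such as RSA are attacked through the underlying math rather than by trying every key, which is why they need to be so much longer.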

So the larger the key size, the more secure.

Next is system resources. Requiring large amounts of system resources can prevent the use of hardware you’re not expecting, something like an ASIC or a GPU. They’re very good at doing little things, very small computations, but if you require that they use two threads, they can’t do that, and if you require that they use a hundred megs of RAM, or even one meg on most of them, they can’t do that either. So you can increase the cost by adding a requirement for memory or for a number of threads, which is what the new password hashing in libsodium does with its Argon2 implementation.

Then there’s iteration. With iteration you’re just doing it more times. Part of what makes encryption algorithms more difficult is that they do more substitution rounds and more mathematical rounds; more binary mathematics makes it more complicated and more expensive. Password hashing is the big one, where you just do it again and again and again, as many times as you can bear.

The other tool for cracking is cryptanalysis. I show the puzzle up here because in World War 2 these were the people they would find to break codes: people who were really good at crossword puzzles. In a crossword you have a question, you might have two or three of the letters, and you have to come up with a word that matches. With the subject matter and a couple of hints, they could find the words, which is basically what cryptanalysis is. Today we use machine learning; we don’t need people, we have machines. It’s mixing knowledge and identifying patterns to fill in the blanks.
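The idea of making each guess cost memory as well as time can be sketched with scrypt, which ships in Python’s standard `hashlib` (when Python is built against a recent OpenSSL). scrypt is a stand-in here: the talk refers to Argon2 in libsodium, which is not in the Python standard library. The parameter values and helper names are assumptions for this sketch.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes, n: int = 2 ** 14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte key with scrypt, a memory-hard key derivation function."""
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, dklen=32)


def approx_memory_bytes(n: int, r: int) -> int:
    """Rough memory needed per derivation: 128 * n * r bytes."""
    return 128 * n * r


salt = os.urandom(16)
key = derive_key(b"ihatepasswords", salt)

print(len(key))
print(approx_memory_bytes(2 ** 14, 8) // (1024 * 1024), "MiB per guess")
```

With these parameters every single guess needs roughly 16 MiB of RAM, which is trivial for a server verifying one login but awkward for hardware built to run thousands of tiny hash computations in parallel.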
Cryptanalysis has been used for years, but today brute force is usually easier. Still, anything that reduces the cost of brute force works against what you’re trying to create; you’re trying to make attacks more expensive and more difficult. If an attacker can reduce the effort of brute force by understanding what certain pieces are most likely to be, that reduces the cost.

So, entropy. We talked about this a little earlier. Cryptanalysis is fought with entropy: removing order and creating the appearance of randomness. In computers there is actually no such thing as random, but we can try to make the appearance of randomness. As individuals we cannot do random, because we have our own biases. If I asked you to come up with random strings in your head and you wrote them all down, then with a large enough list you would see that a certain number of values come up more often than others.

It wouldn’t be random, because your head doesn’t work that way. And computers are programmed to do the same thing the same way every single time. If a computer does something random, you have a problem; we call those bugs. So getting computers to do random can be difficult, and real-world data has very predictable patterns, which is why entropy is such a big deal in computing.

Take an HTTP message, for example. This is a very standard HTTP message, where you make a request for information and it gives you back data as the response. I’m asking for account information and it’s returning account information. JSON responses all look very similar. Just by looking at someone’s API documentation I can very easily guess what these pieces will be. I know I’m going to get a 200 response. I know the content type will be application/json. A very quick query will tell me the server name and version, and I know what the fields are going to be. All of that is going to be the same on every response unless they change the server version or change their API.

So I can very quickly predict most of the data in this response. Other pieces are highly predictable too. If I know when I’m making the request, I know what the response date is going to be, for that one second, and for the next second I will know again. So I have a second to try to do terrible things before it’s no longer valid. But honestly, how many people validate the response time? It doesn’t necessarily matter; it just has to be a time, if you’re even looking at that header at all.

Other data also has predictability. Take dates of birth: people are more likely to have been born in particular years than in others, and birthdays tend to fall in particular months. That’s less predictable now that many people live in climate-controlled environments, and the people who would be in this type of database probably do.

Names: there are a limited number of names, and you can definitely determine that a certain number of names are much more popular than others, based on language and locale. So there’s predictability there. Then there’s some sort of identifier. Most countries are moving their national identifiers to random values, but some of them used to just be counters, so if I know the range of your date of birth, I know the range of your country ID number. Again, that’s changing.

Credential data is also highly predictable. Most services use email for the username. Most people don’t think of their email address as terribly private: it’s passed around on cards, it’s on your Twitter account, it’s everywhere. If you use that as your username, then I already have half of what I need to attack you; I have half the credential. Passwords also have high predictability. 68% of people reuse their passwords.

That means they use a single password on more than one site. I think that’s low, but in 2015 that was the value. Then there are the top 25 passwords. When password dumps show up on these nefarious dark web sites, these paste sites (you yourself can go and find everybody’s passwords on pastebins today), the top 25 constituted over 50% of the passwords. So if I get a password database and try just the top 25 passwords against each entry, I am 50% likely to hit the password within 25 tries. That’s problematic. Nearly 17 percent of users safeguard their accounts with “123456”. Very scary. But most people hate passwords. I hate passwords; I have pledged my life for the last three years to getting rid of passwords. But they’re there, people hate them, and they just want something they can remember, so they will do the minimum necessary to get past the password policy. Most users choose passwords based on ease of recall rather than entropy.

Because of that, the reuse and the predictability create serious problems for people who are trying to secure the web and protect people from themselves. We have to create a lack of uniformity across our data set. If you take a look at this data set, and if you’re like me and notice patterns very quickly, you will see entries that match: these are three users with the same password. As that said, 15% of your users will have the same password, “123456”. What you’re trying to do is make sure an attacker can’t single those out. Without running a single query or a single attack attempt, if I can see in the data that three out of five are the same, I’m going to guess that’s “123456”, just from knowing passwords, or that it’s the same user with multiple usernames. You will have commonalities across the data set. In your password database, twenty-five percent of the people will be “123456”, so all I have to do is go through there and find it.

Good cryptography uses random salts to add entropy to hashes. You add the salt to the password and now they’re all different. That’s why you put salts onto passwords, in case you were ever wondering: so that the known tendencies of passwords cannot be used against your database.

Nearly every type of data has recognizable patterns. English as a language has patterns. In most languages, spaces can be detected very predictably. Single-letter words have a very limited number of candidates, two-letter words are very predictable as well, and so are three-letter words and the most common letter across all the words. If you don’t add some randomness, it’s very easy to predict what they are. This is actually really good encryption used without entropy. These are encrypted birthdates, or just dates. The first two are the same, and we don’t want that, but even if they weren’t the same, they’re all very similar; most of the data in here is the same. This is encryption using [unclear], which is very modern encryption.
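The effect of salts on a password table can be shown with a tiny made-up data set in Python. The usernames, passwords, and helper names are hypothetical, and a single SHA-256 pass is used only to keep the sketch short; real password storage should go through a key derivation function, as discussed earlier.

```python
import hashlib
import os
from collections import Counter

# Hypothetical users; three of five share the same weak password.
users = {
    "alice": "123456",
    "bob": "123456",
    "carol": "123456",
    "dave": "hunter2",
    "erin": "correct horse",
}


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str):
    """Return (salt, digest); the salt is stored next to the digest."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex(), digest


unsalted_table = {name: unsalted_hash(pw) for name, pw in users.items()}
salted_table = {name: salted_hash(pw) for name, pw in users.items()}

# Without salts, the shared password stands out before any cracking starts.
most_common_count = Counter(unsalted_table.values()).most_common(1)[0][1]
print("identical unsalted hashes:", most_common_count)

# With salts, every stored digest is different.
distinct_salted = len({digest for _, digest in salted_table.values()})
print("distinct salted hashes:", distinct_salted)
```

The salt is not a secret. Its job is only to make identical passwords look different in storage, so an attacker cannot spot the popular ones by eye or crack them all with one computation.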

Are you interested?
If you’re interested in getting deep into crypto, Cryptography Engineering is a really good book. If you’re a math geek, it talks about the algorithms and explains the math behind them. So if you’re someone like me, who understands physics better when you can do the math, this is a good book for you. If you’re not one of those people, you can just skip the math and it will still explain things pretty well. Serious Cryptography is a newer book that I’m reading that’s actually pretty good; it does really well with randomness.

There are websites too. The PHP manual actually covers this fairly well. If you want to use Sodium, the PECL documentation is identical to the PHP extension, and you can start using libsodium today through the PECL installation. There’s also OpenSSL and CSPRNG; the information is there as well. A lot of what I learned about cryptography I learned from Wikipedia, because it has really good information: how does RSA work, how do elliptic curves work, what’s an Ed25519 curve, those types of things. So check that out if you want to get really into it.

We have about 30 seconds for questions. If you have other questions, which you probably do, but you’re pressed for time, I’m going to be here all week, so please just come find me. I love talking about this stuff. Check out the books, and I’ll put the slides up somewhere so you can go through them again.

Thank you for coming.