In order to properly do performance testing, you need lots of data. Sometimes you already have plenty of DNS data to use in evaluating a DNS server, but when you don't, generating it can be difficult - until now.
The DLZ performance tools can easily generate data sets for testing DNS servers. A simple configuration file controls the number of zones generated, the number of host records in each zone, and the output format. Currently, the tools can produce files suitable for bulk loading into each of DLZ's supported databases, Bind zone files, and CSV files. Exporting to CSV allows a generated data set to be shared with others so the performance of different servers and configurations can be compared.
The DLZ performance tools also help with testing a DNS server's query latency, update latency and startup time. Query latency is how long a DNS server takes to respond to your query. A DNS server without much load should have a low latency; a server under heavy load may have a higher latency. Update latency is the time between submitting an update to a DNS server and the new data being served in response to DNS queries. Updates to an unpatched version of Bind can be made by editing the zone file and refreshing Bind, or through RFC 2136 updates. Updates to DLZ-managed data are made through your database. Startup time is the total time between starting a DNS server and its answering queries.
- dnsDataGen.pl - PERL
- dnsCSVDataReader.pl - PERL
- Configuration file
- Standard test data
- timeDnsLatency.pl - PERL
- timeDnsRefresh.pl - PERL
- eliminateDups.pl - PERL
- randomizeLines - Bash script
- dictionaryFixer - sed command script
dnsDataGen.pl - PERL
dnsDataGen generates randomized DNS data. The application is controlled through a very simple configuration file: you can set the number of zones to be generated, the number of hosts in each of those zones, and the format of the output file(s). The configuration file is documented below.

The algorithm used to generate data is very simple. An input file provides unique names; it should have a single word on each line, and each word must be unique. It is best to not have many (if any) short (3-4 character) words in the file. The application loops through the input file one line at a time. Each word pulled from the input file has an integer inserted at a random location within it; the integer is a count of how many times the application has looped through the input file. This is how host names are generated. Zone names are generated by the same algorithm, with ".net", ".com", or ".org" appended randomly. During each loop through the input file, some words are used to build host names and others to build zone names, as needed by the application. This further randomizes the data, since the same names are not always used only for host names or only for zone names.

Because the integer is inserted at a random location, any re-run of the dnsDataGen utility generates new random data. It is best to use an input file where each word is at least 3 or 4 characters long, to provide more possible random locations to insert the integer; there should then be a fairly low occurrence of data matches between any two runs of dnsDataGen. This algorithm, while simple, should provide a well-randomized set of DNS data even with an input file that is small relative to the generated output.

For example, if a word in the input file were "grandchildren" and this were the first pass through the file, the generated zone name might be "grandchild0ren.net". On the second pass through the file, the zone name "gran1dchildren.org" might be generated. On the third, "grandchildren3.com", and so on. It is equally likely that during any of the passes the word "grandchildren" is used as the base for a host name rather than a zone name. Inserting the integer at a random location within the input string better simulates real data and forces database indexes to work similarly to the way they would with real data. If the integer were always inserted at the end (or at any fixed location within the string), the indexing (btree) path would be too similar for each of the zone/host names, and test results would be useless.
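The insertion step can be sketched in a few lines of shell and awk. This is an illustration of the algorithm described above, not code taken from the tools; the word and pass number come from the example.

```shell
# Insert the pass counter at a random position in a word, as
# dnsDataGen does when building host/zone names (illustration only).
echo "grandchildren" | awk -v pass=3 'BEGIN { srand() }
{
  # pick a random insertion point, 0..length (ends included)
  pos = int(rand() * (length($0) + 1))
  print substr($0, 1, pos) pass substr($0, pos + 1)
}'
```

Each run picks a different insertion point, so re-runs produce names like "grandchild3ren" or "gran3dchildren".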
dnsCSVDataReader.pl - PERL
One of the output formats supported by dnsDataGen is a CSV file. The dnsCSVDataReader can process the CSV file generated by dnsDataGen and then write the data back out to any of the supported file formats. This allows data sets to be re-used and converted to any of the file formats as needed. I highly recommend that one of the output files of a dnsDataGen run be a CSV file, as re-running dnsDataGen will generate new (and different) randomized data each time it is run. The dnsCSVDataReader is controlled by a configuration file just like dnsDataGen is. The configuration file is documented below.
Configuration file
The configuration file is very simple. A '#' anywhere on a line indicates the rest of the line is a comment. Comments and blank lines are skipped while processing the configuration file. Everything else in the configuration file must be a key : value pair: the key is the leftmost string, followed by a colon, followed by a value string. Whitespace around keys and values is removed, so " key : value " becomes "key" and "value".

Depending upon which application (dnsDataGen or dnsCSVDataReader) is using the configuration file, different keys must be present. Both applications require the key "inputfile". For dnsDataGen, this is the word list from which to create DNS zone/host names. The dnsCSVDataReader expects "inputfile" to be a CSV file generated earlier by dnsDataGen. DnsDataGen also requires a single "zones" key and at least one, but possibly multiple, "hosts" keys. The value of "zones" indicates how many zones dnsDataGen should create; it should be a whole number. The "hosts" key requires a value of the form "xx : yy", where xx is the "repeat count" and yy is the "host count", the number of host entries to put in a zone. This is explained further below. If multiple "hosts" keys are present, they are processed in the order they appear in the configuration file.

Not all zones have the same number of hosts. Your particular DNS server may have a lot of small zones, or a few very large ones. When doing DNS performance testing, it is important to closely simulate your expected real-world data, and the "hosts" parameters allow this type of control within dnsDataGen. Like all the algorithms in dnsDataGen, this one is very simple. Think of the "hosts" keys as building a list of how many hosts to create in each zone. So a "hosts" entry that looks like "hosts: 100 : 10" means that the next 100 zones created should have 10 hosts each.
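Written as a configuration fragment, that list-building looks like this (the values are illustrative):

```
zones : 150
hosts : 100 : 10    # the first 100 zones get 10 host entries each
hosts : 50 : 20     # the next 50 zones get 20 host entries each
```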
If your config file has multiple "hosts" entries, they are simply appended to the end of the list. So "hosts: 100 : 10" followed by "hosts: 50 : 20" means the first 100 zones created will have 10 host entries each, and the next 50 zones created will have 20 host entries each. The number of zones created is controlled by the "zones" parameter. If the end of the list is reached but more zones remain to be created, dnsDataGen starts at the top of the list again. If you prefer to think in percentages, just make sure all the "repeat counts" of your "hosts" parameters add up to 100 and your "zones" parameter is a multiple of 100. So a config of "zones: 10000", "hosts: 50 : 10", "hosts: 40 : 20", "hosts: 10 : 30" would create 10,000 zones: fifty percent (50%) of the zones would have 10 host entries, forty percent (40%) would have 20 host entries, and ten percent (10%) would have 30 entries.

In the previous paragraphs I have consistently used the term "host entries" rather than "host records". That is because dnsDataGen writes two records for each "host entry", and 5 records for the "@" host of each zone. Every "host entry" is actually an "A" record with "127.0.0.1" as the data and an "MX" record with "10 zone" as the data, where "zone" is the zone the host exists in. The 5 records for the "@" host are the "SOA" record, two "NS" records, an "A" record and an "MX" record.

Once you have generated all that data, you need to do something with it. That is where the "writer" parameter comes in. Both dnsDataGen and dnsCSVDataReader accept multiple "writer" parameters. Writers are perl modules developed to output the DNS data in a particular format as needed by the different databases. Writers are included for each of the databases that DLZ supports. Unfortunately, the writers are not as flexible as DLZ itself, in that their output expects your database to conform to a particular schema.
However, custom writers are very easy to build if you use the existing writers as templates. I recommend copying one of the existing writers to a new file and modifying it as your needs dictate; it is best to leave the original writers included in the DLZ Perf Tools package intact and unmodified. To use and configure a writer, you need to create a "writer section" in your configuration file. A "writer section" is all the configuration statements starting with the first "writer" key, up to but not including the next "writer" key in the file, or the end of the file. Therefore writer sections MUST occur AFTER the "inputfile", "zones" and "hosts" parameters. In addition to the generated DNS data, writers need to know where to put the data: where to write a file, usernames / passwords for a database, etc. Different writers will require different parameters. dnsDataGen and dnsCSVDataReader read each of the key : value configuration parameters in a "writer section" and pass that information to the writer. The writer is responsible for verifying that it has all the data it needs to do its job; if it doesn't, it should output an error message and die. The best way to understand all of this is to look at an example.
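A dnsDataGen configuration matching the numbers discussed next might look like the following. The file names and the exact writer module name are assumptions for illustration, not taken from the tools:

```
inputfile : wordlist.txt     # word list to build names from

zones : 200000
hosts : 100 : 2
hosts : 100 : 3
hosts : 10 : 20
hosts : 2 : 100

# writer section (module name illustrative)
writer : binddlz::writers::csv
file : dnsdata.csv
```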
In this example, dnsDataGen will create 200,000 zones. The first 100 zones created will have 2 host entries, the next 100 zones will have 3 host entries, the next 10 will have 20 host entries, and the next 2 will have 100 host entries. This is only 212 zones, but we told dnsDataGen to create 200,000, so it starts at the top of the list again, and the next 100 zones will have 2 host entries. This continues until all 200,000 zones have been created. The last few lines of the configuration file tell dnsDataGen to use the CSV file writer, and also tell the CSV file writer which file to write to.

Writers are grouped into packages, which simply allow related writers to be kept close together. All writers shipped with DLZ Perf Tools are in the binddlz::writers package or a sub-package. If you look at the uncompressed DLZ Perf Tools, you will notice a directory called "binddlz"; within that is one called "writers", and in that directory are more sub-directories and files ending in ".pm". Each of the "*.pm" files is a perl module / writer. You can use any of these writers with dnsDataGen and dnsCSVDataReader, or use them as templates to create your own. I highly recommend you create a separate base package for your own custom writers (something not in the "binddlz" directory). Here is another example:
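A dnsCSVDataReader configuration along the lines of the one discussed next might look like this. The writer module names, file names, and paths are illustrative assumptions; the "base", "header" and "maxlabel" parameters follow the description below:

```
inputfile : dnsdata.csv      # CSV produced by an earlier dnsDataGen run

# bulk load file for postgres
writer : binddlz::writers::postgres
file : postgres.bulk

# LDIF output
writer : binddlz::writers::ldap
file : dnsdata.ldif
base : ou=dns,o=bind-dlz
header : dn: ou=dns,o=bind-dlz
header : objectclass: organizationalUnit

# filesystem database
writer : binddlz::writers::filesystem
base : /var/named/fsdb
maxlabel : 5

# Bind zone files plus named.conf-data
writer : binddlz::writers::bind
dir : /var/named/zones

# queryperf input, piped through randomizeLines
writer : binddlz::writers::queryperf
file : |randomizeLines > queries.txt
```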
There are no "zones" or "hosts" parameters in this file, so dnsDataGen can't use it. It's perfect, however, for dnsCSVDataReader. This configuration will read the CSV file we created in the previous example and then write the DNS data using several writers. As you can see, each writer takes different parameters depending upon its needs.

The first writer creates a bulk load file for postgres. The only parameter it requires is the name and location of the output file. Next in the configuration is an LDAP writer, which requires several parameters. The first parameter is the output file. The "base" parameter is appended to the end of the distinguished name (DN) of each DNS record written to the LDIF file. The "header" parameter (broken over two lines here for readability) is written to the beginning of the LDIF file before any of the generated DNS data. This header creates the "base" (ou=dns,o=bind-dlz) in the LDAP tree, as required by LDAP before the DNS data can be inserted there. Thus, the final LDIF file created by this writer can be loaded directly into an LDAP server that has a root of o=bind-dlz.

The next writer in the configuration file creates a "filesystem" database. This writer does not create a single file, but instead creates the directories and empty files expected by the filesystem database driver. Here the "base" parameter tells the writer what directory to create the filesystem database in. The "maxlabel" parameter tells the writer to split zone and host name labels longer than 5 characters using the same algorithm that the filesystem driver uses.

After the filesystem writer is a Bind zone file writer. This creates zone files for Bind, and also writes most of the named.conf file (to a file called named.conf-data). You only need to add the appropriate key, controls, options and logging sections for your DNS configuration and Bind will be ready to go. I recommend creating a separate file with all the Bind configuration and then combining the two files to create the named.conf you actually use. This is simple using the cat program.
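Combining the two files is a one-liner. The file names here are illustrative (named.conf-data is the file the writer produces), and the sample contents are stand-ins so the example is self-contained:

```shell
# Stand-ins for the hand-written config and the generated data
printf 'options { directory "/var/named"; };\n' > named.conf-base
printf 'zone "example0.net" { type master; file "example0.net"; };\n' > named.conf-data

# Concatenate them into the named.conf actually used
cat named.conf-base named.conf-data > named.conf
cat named.conf
```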
Last is a writer that can create queryperf input files. This writer has its output piped to the randomizeLines utility included in DLZ Perf Tools, and the output from randomizeLines is written to a file. Any writer that outputs to a single file can have its output redirected through a pipe by using "|" as the first character of the file name. Writers that output to multiple files do not allow piping.

The configuration file format is simple, but offers a good deal of flexibility. Several writers are provided, and custom writers are very easy to develop. If you develop a custom writer that you think may be useful to others, please contribute it back to the DLZ project so we can consider it for inclusion in a future release of DLZ Perf Tools.
Standard test data
I have created a "standard" data set for use with DLZ Perf Tools. This is the data set I use for the performance tests on DLZ. By providing a "standard" dataset, performance of different servers and configurations can be compared. The standard data set is available on the downloads page.
timeDnsLatency.pl - PERL
This application helps determine how long your DNS server takes to answer an individual query. When used to test an unloaded DNS server, the latency should be very low; a heavily loaded server will generally have a higher latency. To test properly, you should run this application on a machine with very low load (i.e., one not running queryperf or named). To test a loaded DNS server properly, you will need three machines: the first runs the DNS server you wish to test, the second runs queryperf to put a load on the DNS server, and the third runs this application to measure latency. This application is simply a wrapper around the "dig" program that comes with Bind, so "dig" must be available on your path for it to work properly. If executed with no command line parameters, the application prints a brief description of how to use it.
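The measurement itself is plain wall-clock timing around a dig call. A minimal sketch of the idea follows, with sleep standing in for the dig invocation so the example is self-contained (the real wrapper's options and output are not shown here):

```shell
# Time a single query the way a latency wrapper would.
# "sleep 0.1" stands in for: dig @server hostname A
start=$(date +%s%N)                 # nanoseconds before the query
sleep 0.1
end=$(date +%s%N)                   # nanoseconds after the reply
ms=$(( (end - start) / 1000000 ))
echo "latency: ${ms} ms"
```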
timeDnsRefresh.pl - PERL
The purpose of DLZ is to allow for simple and quick updates to DNS data. Using timeDnsRefresh, you can determine how long it takes a DNS server to respond with updated data, or how long it takes a server to start. TimeDnsRefresh only has a granularity of 1 second, but this should be sufficient. Generally when it comes to updates / startup time, we are interested in "order of magnitude" differences - i.e. a difference of a second or two is not significant, but a difference of 30 seconds or several minutes is. TimeDnsRefresh uses "dig" from Bind to perform the actual DNS queries, so it must be available on your path for this application to work properly. If executed with no command line parameters, the application will print a brief description of how to use it.
eliminateDups.pl - PERL
EliminateDups is a very simple application to eliminate duplicate entries in a file. Each line is compared with the next to see if they are the same. If they are, the duplicate is skipped. Since the algorithm used to detect duplicates is so simple, it is vital that the input file is sorted. This will group duplicate lines together in the input file, allowing eliminateDups to find them. You can use the UN*X utility "sort" to sort your input file. This program reads from standard in and writes to standard out and does not take any command line parameters.
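The intended pipeline is `sort input | eliminateDups.pl > output`. The same adjacent-duplicate comparison is what the standard uniq utility performs, shown here on a tiny sample in place of eliminateDups.pl:

```shell
# Sorting makes duplicates adjacent; uniq then drops repeated
# lines exactly the way eliminateDups.pl does.
printf 'beta\nalpha\nbeta\n' | sort | uniq
# prints:
# alpha
# beta
```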
randomizeLines - Bash script
randomizeLines reads from standard in and writes to standard out. The program reads each line and prepends a random number to the beginning of it. This is then fed through the sort command, which sorts the input alphabetically; since a random number is at the beginning of each line, this actually randomizes the data. The output from sort is fed through sed to strip the random number from the beginning of each line, and the remainder of each line is sent to standard out. The programs awk, sed and sort must be available on your path.

When randomizing a large file, verify there is a large amount of free space in /tmp: sort uses /tmp as temporary storage when processing large files, and if /tmp is not large enough sort will be unable to work. In that case you can use an alternate location for temporary storage; see the sort man page for the appropriate parameters, and then modify the call to sort within this Bash script. This script works on Cygwin & UN*X.
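The decorate-sort-strip idea the script uses can be sketched in one pipeline. This is a simplified sketch, not the shipped script (cut is used here for the strip step where randomizeLines uses sed):

```shell
# Prepend a random key to each line, sort on it, then strip it off.
printf 'one\ntwo\nthree\n' \
  | awk 'BEGIN { srand() } { printf "%f\t%s\n", rand(), $0 }' \
  | sort -n \
  | cut -f2-
```

The output is the same three lines in a random order.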
dictionaryFixer - sed command script
The dnsDataGen application does a good job of automating the creation of DNS data, but it requires a word list as input. Creating a word list can be almost as difficult as creating DNS data in the first place. The best source for a word list I could think of is a dictionary, or a dictionary file to be more exact. OpenOffice.org has created numerous dictionaries for its office suite; they are available in a variety of languages and can be downloaded from the OpenOffice.org site. I used the US English dictionary. However, the dictionary file as downloaded from OpenOffice.org was not ready for use by dnsDataGen: the file has some extra data on each line, and also has some short (less than 4 character) words in it. To clean up the file execute:
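The shipped dictionaryFixer is a sed command script; the following is a sketch of the same clean-up, not the script itself (grep is used here for the length filter, and the sample words are made up). It strips the affix flags after "/" and keeps only all-alpha words of at least 4 characters:

```shell
# Strip "/FLAGS" affix data, then keep 4+ character all-alpha words.
printf 'cat/S\ngrandchildren/DGS\nhouse\nab\nnum3er\n' \
  | sed 's,/.*$,,' \
  | grep -E '^[A-Za-z]{4,}$'
# prints:
# grandchildren
# house
```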
Now the extra data has been removed from the file, and all words are at least 4 characters long and contain only alpha characters. However, the file is still not ready: the dictionary files seem to contain a few duplicate words. To remove them, first run the cleaned-up dictionary file through sort to put everything in alphabetical order, then run the file through eliminateDups.pl to get rid of the duplicates. After I had a "pristine" word list, I ran it through the randomizeLines bash script before using it with dnsDataGen. This makes the generated DNS data even more random, since the first few host and zone names don't all start with an "a".