GetRight

Code and programming notes for BitTorrent and BitTorrent DHT

Notes for BitTorrent™ authors from the GetRight author.

As I was developing the BitTorrent support in GetRight, I found a few quirks with other clients or things that weren't as documented as I would have liked...other implementors may wish to be aware of these things too!

  • When pipelining requests, most clients return the requested blocks as if they were on a Queue (first requested, first returned). But at least one client treats them as a Stack (last requested, first returned). It broke some of my pipelining logic until I added a special check to handle it. The documentation doesn't say so explicitly anywhere, but I (and evidently most other implementors) believe a Queue is correct. You'll likely need to handle both.

  • Not sure if it was a bug, but I saw one client sending duplicate Have messages, so be prepared to handle them.

  • Several clients support some sort of HTTP/FTP seeding, and I've even seen that the .torrent files available on bittorrent.com included an HTTP seed. Whether they read my notes about it, I don't know :)
    GetRight's HTTP/FTP Seeding for BitTorrent
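
One way to cope with both the Queue/Stack ordering and the duplicate Have messages noted above is to stop relying on arrival order altogether. The following is a minimal Python sketch, not GetRight's actual code: the class name PeerState and its methods are hypothetical, and it assumes each block is identified by its piece index, byte offset, and length.

```python
class PeerState:
    """Per-peer bookkeeping (hypothetical helper, for illustration only)."""

    def __init__(self):
        # Outstanding requests, keyed by (piece index, byte offset, length).
        self.pending = set()
        # Pieces this peer has announced with Have messages.
        self.have = set()

    def request(self, index, begin, length):
        """Record a block request that was sent to the peer."""
        self.pending.add((index, begin, length))

    def on_piece(self, index, begin, data):
        """Accept a block if it answers any outstanding request.

        Arrival order is irrelevant, so queue-style and stack-style
        peers are handled the same way.
        """
        key = (index, begin, len(data))
        if key not in self.pending:
            return False  # unsolicited or repeated block: ignore it
        self.pending.remove(key)
        return True

    def on_have(self, index):
        """Return True only the first time a piece is announced."""
        if index in self.have:
            return False  # duplicate Have: harmless, nothing to update
        self.have.add(index)
        return True
```

Matching on the request itself rather than on its position in the pipeline means neither ordering needs a special case.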

Notes about the BitTorrent DHT Protocol

For the DHT Network, there were even more things I had to reread (and less information from other sources in general).
  • It took me a few readings to realize that yes, you treat the XORed SHA values as a 160-bit number.

  • The "t" value is described in the text on bittorrent.org as a "single character string value", which I at first read as meaning something like "0" or "A". But several of the examples also showed it B-Encoded as an integer 0, which added more confusion.
    From watching the values sent by various clients, it's a string. GetRight, for example, sends 6 base64-encoded digits (so ASCII text); others send binary values. Treat "t" as an opaque string.

  • Several timeouts were not specified:
    • I time out waiting for a reply after 20 seconds. I heard from the libtorrent.sf.net author that in his testing, 95% of the replies that will arrive given a 120 second timeout arrive in the first 20 seconds. (This timeout does matter, and it must not be too long, because of the alpha (α) value discussed below.)
    • The timeout for cleaning peers from torrents I'm tracking if they haven't reannounced: after a bit of looking at the Mainline code, I finally found the one variable for this. The mainline BitTorrent client times them out after 30 minutes.
    • So the interval for reannouncing to the DHT should presumably be less than 30 minutes. GetRight reannounces every 15 minutes, and after any failed announce it retries in 10 minutes, to give an extra chance before the entry would expire.

  • It is not in the specification on BitTorrent.org, but GetRight, uTorrent, libtorrent.sf.net, MooPolice, and likely others add an extra key to the DHT messages.
    • Version/Client: "v". It contains a 4 byte string where the first two bytes identify the client (in these cases GR, UT, LT, and MP). The last two bytes are binary or text and identify the version of the client. For example, GetRight 6.0x sends "GR60".

  • In testing, the closest nodes I found when doing searches matched the first 18-22 bits, which is 2^18 to 2^22, or 1 in 262,144 to 1 in 4,194,304, which seems about right.

  • The uTorrent and libtorrent.sf.net authors talked, and had an idea that the write Token returned by get_peers should be generated to include the info_hash of the torrent that the requesting peer is searching for. This would prevent a malicious peer from spamming lots of random info_hashes to a client. I thought it was a great idea, so GetRight does this too.

  • The uTorrent and libtorrent.sf.net authors also thought about limiting the number of torrents a particular peer can announce, again to prevent spamming, and another good idea. Given the 2^18 closest match I found, it's very unlikely a peer could be downloading even two torrents that would both match even the first 18 bits of the info_hash (1 in about 60 Trillion.) So just limiting to 2 or 3 info_hash announces per peer would fit the math. {I haven't done this yet, but it sounds like a good idea too.}

  • I implemented it with a "Main" tree, plus a "Search" tree for each torrent it is downloading. When it's done with a torrent, it discards that torrent's Search tree. No "housekeeping" is done for the Search trees, and they aren't retained when the application is closed, but keeping them while the torrent is active lets it reannounce quickly.

  • The Kademlia paper discusses an alpha (α) value: the number of parallel asynchronous queries to run at the same time.
    • The BitTorrent.org DHT document doesn't discuss this as of now, but I've heard from the libtorrent author that libtorrent uses 5 and uTorrent uses 8 right now. (And I heard third hand that some routers don't like lots of requests, so it matters for that reason too.)
    • GetRight again does things a little differently: it uses 3 for each search, plus 3 for any "housekeeping" on the main tree. This lets each search run independently, so any bad branches won't affect everything else. Any more than 3 per group are queued, and every 3 seconds it sends one item from each queue, to keep a set of bad nodes from blocking everything up for a long time. (And when it gets a reply, it sends the next message from the appropriate queue as well.)
    • According to some stats GetRight keeps for me, it sends an average of 0.1 K/sec just after starting up, when it's doing a bit more checking than usual, slowly verifying that all its saved nodes are still there.
    • When adding an item to the queue, it includes a Priority for any find_node or get_peers queries. The priority is simply the number of most significant bits that match between what I'm searching for and the peer's ID, so 0 to 160. When sending the next queued item, the one with the highest priority is sent first. This helps send the messages to the best peers first.

  • I've gotten some bad responses as well.
    d1:rd2:id19:Jxmn8.Fh^vNe1:t8:cfB+Oi
      This is actually doubly wrong: the ID is only 19 bytes (instead of 20), and the whole reply is truncated in the middle of the "t" value.
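
The XOR distance, the "matching bits" figures, and the queue priority described in the notes above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not GetRight's code: the names xor_distance, matching_bits, and QueryQueue are hypothetical, and node IDs are assumed to be 20 raw bytes read as a big-endian 160-bit number.

```python
import heapq

ID_BITS = 160  # node IDs and info_hashes are 20 bytes


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR two 20-byte IDs and read the result as one 160-bit integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def matching_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits two IDs share, from 0 to 160."""
    return ID_BITS - xor_distance(a, b).bit_length()


class QueryQueue:
    """Queued queries for one search, closest nodes sent first (hypothetical)."""

    def __init__(self, target: bytes):
        self.target = target
        self._heap = []
        self._count = 0  # tie-breaker keeps insertion order stable

    def push(self, node_id: bytes, query):
        priority = matching_bits(self.target, node_id)
        # heapq is a min-heap, so negate to pop the highest priority first.
        heapq.heappush(self._heap, (-priority, self._count, query))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

With this definition, a node whose ID shares 18 leading bits with the target is at a distance below 2^(160-18), which is where the "1 in 262,144" figure (2^18) comes from.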

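The idea above of tying the write token to the info_hash can be sketched as follows. This is only one possible construction and an assumption throughout: the notes don't say how any client builds its token, so the HMAC-SHA1 choice, the 8-byte length, the inclusion of the requester's IP, and the names SECRET, make_token, and check_token are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical per-node secret; a real client would rotate it periodically.
SECRET = os.urandom(16)


def make_token(ip: str, info_hash: bytes, secret: bytes = SECRET) -> bytes:
    """Token returned by get_peers, bound to the requester and the info_hash."""
    msg = ip.encode("ascii") + info_hash
    return hmac.new(secret, msg, hashlib.sha1).digest()[:8]


def check_token(token: bytes, ip: str, info_hash: bytes,
                secret: bytes = SECRET) -> bool:
    """Accept an announce only if the token was issued for this exact info_hash."""
    return hmac.compare_digest(token, make_token(ip, info_hash, secret))
```

Because the info_hash is part of the input, a token obtained for one torrent can't be reused to announce a batch of random info_hashes.
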
Thanks!

Some thanks for help with this...
  • Bram Cohen for creating this whole BitTorrent protocol!
  • BitTorrent.org for posting details about the DHT Network and BitTorrent protocols.
  • Arvid Norberg (libtorrent.sf.net author) for helping answer some questions and passing on information about libtorrent and other client authors he's talked to.
  • Petar Maymounkov and David Mazieres for the Kademlia protocol, which is used for the DHT Network.

Legal

    BitTorrent and Torrent are trademarks of BitTorrent, Inc.