Former admin of DC# here (forgive the sourceforge hosting-- it was a long time ago! https://sourceforge.net/projects/dc-sharp/). Great write-up! This was a fascinating read- thank you for putting it together.
Great work! Quite annoying actually. I finished my own implementation in Python at about 10pm last night, this would have been most useful. I'm no C# coder, but it's nicely readable, and this is a much better write up than I'm sure I could do.
For anyone who hasn't tried doing this before: the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned into a "spec". Somewhat conversely, this has led to a wealth of articles describing how to do it.
I didn't know of the RFC mentioned in the post, that would have also been really useful.
A lot of BitTorrent stuff for Python is remarkably hard to find in all the noise of Deluge, the original client, and libtorrent wrappers, but none that existed were sophisticated (or at least well documented) enough for my experiments, they have different focuses.
I never went as far as implementing my own BEncoder library. A billion seem to exist in multiple languages, and any BitTorrent Python library you install seems to come with its own copy. (I suspect due to the way BEncoder was bundled in the original client, see: https://pypi.python.org/pypi/bencode)
I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.
" this would have been most useful. I'm no C# coder, but it's nicely readable, and I'm sure this is a lot better written up than I could do."
I agree. I don't do C# but can mostly follow it. It is also a well-organized presentation of much of a protocol that all kinds of people keep re-implementing. They need the help more often than not. A great write-up.
> I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.
There is also another project in Rust, it looks more active:
https://github.com/GGist/bip-rs
It is a collection of libraries.
> If anyone who hasn't tried doing this before, the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned in to a "spec".
Doesn't look vague at all. What do you think is missing from it?
Thanks, I had seen that one, but forgot about it. I think it's a great project, but it's really just a collection of libraries that don't really tell you how it all fits together, which when I was picking stuff up wasn't very helpful. Hopefully now I have a better understanding of the client design I can make something from that.
> Doesn't look vague at all. What do you think is missing from it?
For a comparison I would recommend reading a few (what I would consider) good protocol docs. Docs that you could read and implement, and probably get working very quickly, for example:
- XMPP's XEPs (one picked for similarity in usage to BitTorrent) https://xmpp.org/extensions/xep-0020.html - Lots of examples in there for what messages should look like, which is always helpful.
I think the main thing that makes the biggest difference is adhering to a language spec such as RFC 2119 which recommends using "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", etc. which makes it really clear what you're meant to do or not to.
Specifically for the vagueness of BEP-3, how about this example that made me rage on IRC. In the description of the info_hash field in the Tracker section, it says:
> This value will almost certainly have to be escaped.
ALMOST CERTAINLY?? Will it, or won't it? Then, escaped? Escaped how?
What this turned out to mean was that the 20-byte binary SHA-1 hash MUST be URL encoded, and not hex encoded.
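To make that concrete, here is a minimal C# sketch, assuming the tracker wants the raw SHA-1 digest percent-encoded byte by byte, with only the RFC 3986 "unreserved" characters left as-is. `InfoHashEncoding` and `PercentEncode` are hypothetical names for illustration and are not taken from the article's code.

```csharp
using System;
using System.Text;

public static class InfoHashEncoding
{
    // Characters that may appear unescaped (the "unreserved" set from RFC 3986).
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes raw bytes for use in a tracker query string.
    // Assumption: every byte outside the unreserved set becomes %XX.
    public static string PercentEncode(byte[] raw)
    {
        var sb = new StringBuilder(raw.Length * 3);
        foreach (byte b in raw)
        {
            char c = (char)b;
            if (Unreserved.IndexOf(c) >= 0)
                sb.Append(c);                            // safe ASCII byte, emit as-is
            else
                sb.Append('%').Append(b.ToString("X2")); // everything else as %XX
        }
        return sb.ToString();
    }
}
```

The input here is the 20 raw digest bytes, not the 40-character hex string; hex-encoding first and then escaping would produce a different value.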
I would love to see someone try to build a BitTorrent client for the first time based solely on this doc.
---
BEP-3 also seems more interested in implementation detail than in describing the protocol. Take the last paragraph (before Copyright) as an example.
Something else which occurred to me today is that BitTorrent is not a spec; it has not been developed, it has evolved. It has also been built in a very modular way, i.e. DHTs can replace trackers and simply be dropped in, and magnet URIs can replace torrent files. This probably contributes to its success and longevity, but it also means that there is a lot of stuff, like metainfo, trackers and bencoding, that SHOULD belong in their own spec docs, which together form a collective whole.
1. In your EncodeDictionary, you sort byte arrays by converting them to strings. Correct but inefficient. See e.g. this: http://stackoverflow.com/q/19695629/126995 but add null checks; the authors of that code forgot about them.
2. You don’t need a dedicated thread to wake up every 1-10 seconds and do something small. Threads are expensive system resources: each one owns a stack, cache misses are guaranteed when it wakes up, etc. If your compiler supports async-await, use that instead, with an endless loop and Task.Delay inside the loop. If not, the System.Timers.Timer class will do.
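A minimal sketch of the loop-plus-Task.Delay idea from point 2, assuming the periodic work is cheap and synchronous. `PeriodicWork` and `RunAsync` are hypothetical names, not part of the article's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicWork
{
    // Runs 'action' every 'interval' until the token is cancelled.
    // No thread is blocked while waiting; Task.Delay is timer-based.
    public static async Task RunAsync(Action action, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            action();
            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancellation during the delay ends the loop
            }
        }
    }
}
```

The caller keeps a CancellationTokenSource and calls Cancel() on shutdown, which replaces the usual "set a flag and wait for the thread to notice" pattern.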
BEncoding and variants like REncoding are possibly one of my least favourite things ever. If you deal with the Deluge torrent client API you'll see it everywhere.
That aside, fantastic work on this, I think previously the only Bittorrent library for C# was an abandoned Mono project.
Yeah, monotorrent doesn't really work very effectively. It tends to get blocked by a lot of peers and has really mediocre throughput when it isn't blocked.
The last time I was trying to build something that used bittorrent, I fell back on launching aria2 and redirecting and parsing its stdout, which as crazy as it was, worked much better.
Already bad if you wanted ASN.1 but can't get Galois's version. If you can get it, then choosing an ASN.1 alternative can be bad since it probably won't have a formal spec, verified parser, Haskell implementation, and so on. Probably a drop in correctness in some corner case vs whatever they made.
Note: I'd like to see them do a high-assurance JSON and/or XDR parser instead of just ASN.1. I know they did a Haskell-to-JSON library already. A strong one that extracted parsers or generators from a user-supplied specification with plugins for various programming languages would be nice.
HTTPS Everywhere redirects me to the HTTPS version of the page, but you've hard-coded http:// links for some/all of the resources, which the browser refuses to load, so it just looks like some big gray boxes.
Actually organising the write up forced me to tidy up the code much more than I otherwise would have. I definitely find C# to be one of the more readable languages although I have had to debug and untangle some C# messes before.
Yeah I remember changing it to a SortedDictionary but I changed it back. I can't remember exactly why, possibly because it's supposed to be sorted by raw UTF8 bytes rather than a nice neat C# string and I didn't want to start using byte arrays for dictionary keys. I guess it only needs to be sorted when in the BEncoding format and it felt better to keep the internal structure as simple as possible. The tradeoff is it doesn't support incorrectly encoded torrent files – I'm really not sure how much of an issue that is.
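For reference, a minimal sketch of ordering by raw bytes without converting to strings, with the null checks mentioned earlier in the thread. `ByteArrayComparer` is a hypothetical name; the comparer could be applied only at encode time, so the internal dictionary can keep plain string keys.

```csharp
using System;
using System.Collections.Generic;

public sealed class ByteArrayComparer : IComparer<byte[]>
{
    // Lexicographic comparison of unsigned bytes; null sorts first.
    public int Compare(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int common = Math.Min(x.Length, y.Length);
        for (int i = 0; i < common; i++)
        {
            if (x[i] != y[i])
                return x[i] < y[i] ? -1 : 1;
        }
        // All shared bytes are equal, so the shorter array sorts first.
        return x.Length.CompareTo(y.Length);
    }
}
```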
It's really nice to see a walkthrough of a non-trivial program all on one page like this. The clarity of the code and writing makes me want to port it to a different language because it seems like it would be easy with all the needed info in one place.
It's really great work OP. I know this would have taken you a long time to do but part of me can't help but wonder if programming is becoming even more like paint by numbers than it already is.
I am actually working on a GUI BitTorrent client in Swift, just converted my code to Swift 3 this morning, (I'm in the EU), quite far to go still, but I hope to have an alpha release out in Q4.
I guess a native Python implementation would be too slow. However, there is a fantastic libtorrent library that has Python bindings and allows you to implement a torrent client in Python relatively easily.
BTW, regarding the original article, there is also a MonoTorrent library for .NET. Despite the name it can be compiled by Visual Studio. The original library was abandoned a while ago and seems to be buggy, but I was able to make a very simple .NET client with WinForms UI using this fork: https://github.com/ErtyHackward/monotorrent
The very first torrent client written by Bram Cohen (the person who invented bittorrent) was written in Python[1].
I remember it, because 15 years ago that was the only client available. Later people started creating other clients by forking his python code, and eventually rewriting it in different languages.
And no one used it, you know what was before C? You probably don't because no one used it also after that. Azureus was developed 13 years ago and that was the client that was used... I know because I remember it also. And then they (from BitTorrent Inc) changed their Python version to C++ and called it uTorrent because Python was too slow and no one wanted to use it...
Lots of people used the Python client because not everyone wanted to run Java (memory hog) or were on Windows (uTorrent). Azureus had one advantage: first to support the DHT and trackerless operation.
I don't understand this encoding method. If say, a dictionary starts with d and ends with e, how do you know with "d3:key5:valuee" if the value is "value" or "valu"?
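The length prefix is what resolves this: `5:value` means "read exactly five bytes after the colon", so a parser never scans a string for a terminator. A minimal sketch of just the string-reading step, assuming ASCII content for simplicity; `BencodeStrings.ReadString` is a hypothetical helper, not the article's decoder.

```csharp
using System;
using System.Text;

public static class BencodeStrings
{
    // Reads one bencoded string ("<length>:<bytes>") starting at 'pos'
    // and advances 'pos' to the first byte after it.
    public static string ReadString(byte[] data, ref int pos)
    {
        int colon = Array.IndexOf(data, (byte)':', pos);
        if (colon < 0) throw new FormatException("missing ':' after length");

        int length = int.Parse(Encoding.ASCII.GetString(data, pos, colon - pos));
        if (length < 0 || colon + 1 + length > data.Length)
            throw new FormatException("string runs past end of input");

        // Take exactly 'length' bytes; an 'e' inside them is ordinary data.
        string value = Encoding.ASCII.GetString(data, colon + 1, length);
        pos = colon + 1 + length;
        return value;
    }
}
```

For `d3:key5:valuee`, starting just after the `d`, the first call returns `key`, the second returns `value`, and the next unread byte is the final `e`, which can only be the end of the dictionary.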
Somewhat disappointing that it's just a console app. I'd love to be able to do cross platform C# desktop development. There should be something equivalent to WinForms/WPF on OSs other than Windows.
I've been looking at EdgeJs, specifically one version which was compiled to run in Electron. The idea being to make a bridge between electron and C# view-models (or whatever) so people can build cross-plat C# and HTML5 desktop apps.
Sure there won't be shared UI development support, but you can easily encapsulate app logic from UI. If you're brave enough you can try using Gtk# from http://www.mono-project.com/docs/gui/
And Mac. You could write shared business logic in C# for your Windows, Mac, and mobile apps, with a combination of .NET and Xamarin, and still use C# for each native UI.
But it sounds like you are actually thinking of Xamarin.Forms, which allows sharing UI code on mobile platforms as well as UWP. You are correct that there is no Mac support for that.
Actually that's exactly what I imagined to happen. What would be so fundamentally different in desktop development compared to mobile UI development that would exclude it from Xamarin? I don't see it.
Fun read, but using automatic properties might lead you down a path that isn't optimal;
Take this for example:
    public byte[] Infohash { get; private set; } = new byte[20];
    public string HexStringInfohash { get { return String.Join("", this.Infohash.Select(x => x.ToString("x2"))); } }
    public string UrlSafeStringInfohash { get { return Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(this.Infohash, 0, 20)); } }
You have an automatic property and two 'properties' that actually perform work every time you call the getter (might be smarter to make functions of those, so you know it's not just retrieval of data, but work is done).
If you were to rewrite this a bit, you could make sure the 'work' is done only when needed, and the properties become actual simple data retrieval properties like:
    public class Hashes
    {
        byte[] _infohash;
        string _hexStringInfohash, _urlSafeStringInfohash;

        public byte[] Infohash
        {
            get { return _infohash; }
            private set
            {
                _infohash = value;
                _hexStringInfohash = String.Join("", this.Infohash.Select(x => x.ToString("x2")));
                _urlSafeStringInfohash = Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(this.Infohash, 0, 20));
            }
        }

        public string HexStringInfohash { get { return _hexStringInfohash; } }
        public string UrlSafeStringInfohash { get { return _urlSafeStringInfohash; } }

        public Hashes()
        {
            Infohash = new byte[20];
        }
    }
Going further through the article, I spot many more items to improve, but let's not forget you did great work and the code is quite readable.
One thing that might help is building some indexes to know how files are fragmented. You have the following code multiple times:
    if ((start < Files[i].Offset && end < Files[i].Offset) ||
        (start > Files[i].Offset + Files[i].Size && end > Files[i].Offset + Files[i].Size))
        continue;
If you'd build an index to know which piece hits which files, you don't have to enumerate this every time.
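A rough sketch of such an index, assuming files are laid end to end and every piece except possibly the last has the same size. `FileEntry`, `PieceIndex` and `Build` are hypothetical names, and the overlap test uses half-open ranges, which differs slightly from the inclusive check quoted above.

```csharp
using System;
using System.Collections.Generic;

public sealed class FileEntry
{
    public long Offset; // byte position of the file within the torrent
    public long Size;   // file length in bytes
}

public static class PieceIndex
{
    // For each piece, lists the indexes of the files it overlaps.
    // Built once, so reads and writes need not rescan every file.
    public static List<int>[] Build(IList<FileEntry> files, long pieceSize, long totalSize)
    {
        int pieceCount = (int)((totalSize + pieceSize - 1) / pieceSize);
        var index = new List<int>[pieceCount];
        for (int p = 0; p < pieceCount; p++)
        {
            long start = p * pieceSize;
            long end = Math.Min(start + pieceSize, totalSize); // exclusive
            index[p] = new List<int>();
            for (int i = 0; i < files.Count; i++)
            {
                var file = files[i];
                // [start, end) overlaps [Offset, Offset + Size)
                if (start < file.Offset + file.Size && file.Offset < end)
                    index[p].Add(i);
            }
        }
        return index;
    }
}
```

Reading or writing piece `p` then only needs to loop over `index[p]` instead of every file.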
Another general remark: always retrieve an indexed item from the array once and use that, instead of indexing into the array repeatedly.
So, do:
    var file = Files[i];
    if ((start < file.Offset && end < file.Offset) ||
        (start > file.Offset + file.Size && end > file.Offset + file.Size))
        continue;
The code becomes more readable and allows you to change the structure later on more easily, since you no longer have 100 references to the same array and only use an intermediate variable.
Thanks for the comments! I'm all for improved readability. I definitely wasn't aiming for much performance wise, at least initially. These were two areas (of many) that I felt could do with some improvement (especially the file IO). I will look into making some modifications like those suggested when I get a chance.
From a quick look at the source [0], it looks like it supports Mono. I can't see anything stopping it being ported to Core, apart from JonSkeet.MiscUtil may not support Core.
If you package this up as a NuGet package and support Core then can you please ping me and I'll add it to https://anclafs.com.
I've been prototyping a .NET Core port. JonSkeet.MiscUtil source code is fine for Core, but for the BitTorrent code we need to change the HttpWebRequest objects to HttpClient and rewrite the Begin/End/IAsyncResult/callback-style to async/await.
One issue to be mindful of- the HttpWebRequest.BeginGetResponse method does not honor timeouts, and you are on your own to timeout the attempt. Consider using HttpClient, if available in Mono / .NET Core. Otherwise, see MSDN for how to do this:
"In the case of asynchronous requests, it is the responsibility of the client application to implement its own time-out mechanism. The following code example shows how to do it." See: https://msdn.microsoft.com/en-us/library/system.net.httpwebr...
I'm not sure if you have access to the ThreadPool class. In a bug that Microsoft's library had, I used the TPL Task construct to resolve this. See the pull request here: https://github.com/Microsoft/ProjectOxford-ClientSDK/pull/83...
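For illustration, one possible task-based way to impose a timeout on any asynchronous operation is to race it against Task.Delay with Task.WhenAny. This is a sketch under that assumption, not the MSDN sample and not the code from the pull request; `TaskTimeout` and `WithTimeout` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeout
{
    // Completes with the task's result, or throws TimeoutException
    // if the task has not finished within 'timeout'.
    public static async Task<T> WithTimeout<T>(Task<T> task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout));
        if (finished != task)
            throw new TimeoutException("operation timed out");
        return await task; // propagates the result or the original exception
    }
}
```

A Begin/End pair such as BeginGetResponse/EndGetResponse can be wrapped into a task with Task.Factory.FromAsync and then passed to this helper. Note that the helper only stops waiting; the underlying request keeps running unless it is also aborted, for example with HttpWebRequest.Abort().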