BitTorrent
BitTorrent is a protocol designed for peer-to-peer file exchange on the Internet. It is one of the most common protocols for transferring large files.
Programmer Bram Cohen designed the protocol in April 2001 and published its first implementation on July 2, 2001. BitTorrent, Inc. is currently owned by Justin Sun, founder of the Tron Foundation, which acquired the company for $140 million on June 18, 2018. Numerous BitTorrent clients are available for various operating systems.
As of January 2012, BitTorrent had 150 million active users according to BitTorrent, Inc. Based on this figure, the company estimated that the total number of monthly BitTorrent users was greater than 250 million. According to data displayed on its own website, more than 170 million people currently use its products each month. The BitTorrent protocol moves up to 40% of the world's Internet traffic daily. At any given moment, BitTorrent has, on average, more active users than YouTube and Facebook combined (counting users active at that moment, not the total number of unique users). Use of the protocol also increased considerably after the closure of the Megaupload website.
A BitTorrent cryptocurrency was launched about six months after BitTorrent was acquired by TRON, a global organization focused on creating a truly decentralized Internet.
Description
The BitTorrent protocol can be used to reduce the impact that distributing large files has on servers. Instead of downloading the file from a single server, the BitTorrent protocol allows users to join each other in a "swarm" to download and upload the file simultaneously. The protocol is an alternative to the server-based approach, in which several mirror servers each offer the complete file for download, and it can work on networks with little bandwidth. In this way, even small devices such as smartphones are capable of distributing large files or broadcasting video to many receivers.
A user who wants to upload a file first creates a torrent file and distributes it in a conventional way (web pages, email, etc.). The user then makes the file available on the network through a BitTorrent node that acts as a seed. Users who want to download the file obtain the torrent file and create another BitTorrent node that acts as a client or "leech" (leecher), exchanging parts of the file with the seed and with other clients.
The file being distributed is divided into pieces. Each time a user receives a new part of the file, they can in turn share it with other users, freeing the original seed from having to send a copy of that part to all users who want the file. In BitTorrent, the task of distributing a file is shared by everyone who wants to have the file. It is perfectly possible for the seed to only send one copy of the file and the file to be distributed to an unlimited number of users.
Each part of the file is protected by a cryptographic hash contained within the torrent file. Accidental and malicious modifications alike are detected by the nodes that receive them. If a node has the authentic torrent file, it can verify the authenticity of the complete file it has received.
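The per-part check can be illustrated with a minimal Python sketch. It assumes the SHA-1 hash used by the original protocol; the function name and the sample data are hypothetical and only serve the illustration.

```python
import hashlib

def piece_is_valid(piece: bytes, expected_sha1: bytes) -> bool:
    """Return True if the received piece matches the hash stored in the torrent file."""
    return hashlib.sha1(piece).digest() == expected_sha1

# Hypothetical piece and the 20-byte digest the torrent file would list for it.
original = b"example piece data"
expected = hashlib.sha1(original).digest()

print(piece_is_valid(original, expected))                 # unmodified piece
print(piece_is_valid(b"tampered piece data", expected))   # modified piece
```

A piece that fails the check would simply be discarded and requested again, from the same node or from another one.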
Parts are typically not downloaded sequentially; the BitTorrent client reorders them, keeping track of which parts it has and which it has yet to receive. All parts of the file are the same size (except possibly the last one) and each is handled as a unit (e.g. a 10 MiB file can be transmitted in ten 1 MiB parts or forty 256 KiB parts). Because of this, a download can be stopped at any time and resumed later without losing previously downloaded data. This makes BitTorrent especially useful when transferring large files. It also lets the client fetch whichever parts are currently available, instead of having to pause the download and wait for the next sequential part to become available, thus reducing download time.
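The arithmetic behind the example figures (ten 1 MiB parts, forty 256 KiB parts) can be sketched as follows. The helper names are hypothetical, and the sketch assumes a non-empty file whose final part may be shorter than the others.

```python
KIB = 1024
MIB = 1024 * KIB

def piece_count(file_size: int, piece_length: int) -> int:
    """Number of pieces needed to cover file_size bytes (ceiling division)."""
    return (file_size + piece_length - 1) // piece_length

def last_piece_size(file_size: int, piece_length: int) -> int:
    """Size in bytes of the final piece, which may be shorter than the rest."""
    remainder = file_size % piece_length
    return remainder if remainder else piece_length

print(piece_count(10 * MIB, 1 * MIB))    # 10
print(piece_count(10 * MIB, 256 * KIB))  # 40
```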
When a client has downloaded the file completely, it becomes a seed. This eventual conversion of clients into seeds determines the health, or availability, of the file (measured by the number of users that have the complete file).
The distributed nature of BitTorrent leads to the file being spread across many nodes. The more users join the swarm, the higher the chance that a node will be able to download the entire file. Relative to traditional distribution schemes, this allows the original distributor to reduce hardware and bandwidth costs. It also provides redundancy against potential system problems, reduces dependence on the original distributor, and provides transient download sources (it is not always the same users sharing the file), which makes it harder for those who try to block distribution to trace where the file is coming from than it would be for a file hosted on an external server.
History
Programmer Bram Cohen, a former student at the University at Buffalo, designed the protocol in April 2001 and released the first version on July 2, 2001.
The first version of the BitTorrent client had neither a search engine nor peer exchange. Until 2005, the only way to share files was by creating a small text file called a "torrent", which was uploaded to a torrent index site. These files contain metadata about the files to be shared and about the trackers that keep track of the other seeds and peers. The first uploader acted as the seed, and downloaders would initially connect as peers. Those wishing to download the file would download the torrent, which their client would use to connect to a tracker that held a list of the IP addresses of the other seeds and peers in the swarm. Once a peer completed a full download, it could in turn function as a seed.
In 2005, first Vuze and then the BitTorrent client introduced distributed tracking using distributed hash tables that allowed clients to exchange data in swarms directly without the need for a torrent file.
In 2006, peer exchange functionality was added, allowing clients to add peers based on data found on connected nodes.
BitTorrent v2 is designed to work seamlessly with older versions of the BitTorrent protocol. The main reason for the update was that developers no longer consider the old SHA-1 cryptographic hash function safe from malicious attacks, so v2 uses SHA-256. To ensure backwards compatibility, the v2 .torrent file format supports a hybrid mode in which torrents are hashed using both the old and the new method, with the intention that the files can be shared with peers in both v1 and v2 swarms. Another update to the specification adds a hash tree, which shortens the time from adding a torrent to downloading files and allows more granular checks for file corruption. In addition, each file is now hashed individually, which allows files to be deduplicated across the swarm: if multiple torrents include the same file but seeders are only seeding it from some of them, downloaders of the other torrents can still download the file. Magnet links for v2 also support a hybrid mode to ensure compatibility with legacy clients.
BitTorrent compared to other P2P networks
The method used by BitTorrent to distribute files is similar in many ways to that used by the eDonkey2000 network, but nodes on the latter generally share and download a larger number of files, reducing the bandwidth available for each transfer. BitTorrent transfers are usually very fast because all the nodes in a swarm concentrate on transferring a single file or a collection of files. In addition, the eDonkey2000 protocol does not reward users who share more bandwidth. It should be noted, however, that the most widespread client for the eDonkey network, eMule, does include a credit system to reward those who share the most.
Unlike other sharing networks, BitTorrent does not include any file search mechanism. BitTorrent users will need to locate the torrent files needed by the protocol on their own. Typically, these files can be downloaded from websites that publish large archives (such as GNU/Linux distributions) or from searchable web indexes (such as The Pirate Bay).
The original implementation of BitTorrent was programmed in Python, although today you can find clients written in C or Java for example.
According to CacheLogic, during 2005 BitTorrent was used especially in Asia, while eDonkey2000 was preferred in Europe and America.
Client Programs
BitTorrent clients come in two types:
- Multiple simultaneous downloads, such as Vuze, BitComet, KTorrent, μTorrent, qBittorrent or Transmission.
- Single download (each instance handles only one .torrent file, although several instances can be open simultaneously), such as BitTornado or the Opera browser.
Structure of a BitTorrent network
A BitTorrent network is made up of:
- Peers: all the users who are on the network.
- Leechers (leeches): all the users who are on the network downloading the file but who do not yet have the complete file. The term is also used pejoratively for those who download files but do not share them.
- Seeders (seeds): the users on the network who have the entire file.
- Trackers: a BitTorrent tracker is a special server that holds the information peers need to connect with each other. Initially, it is the only way to locate the users who have the file you want to download.
- Swarm: the users, taken as a whole, that the tracker is in charge of finding. The name comes from the similarity with bees and their behavior; in this analogy, the tracker is the honeycomb, the bees of the swarm are the users, and the honey is the torrent with its content.
Mechanics of how BitTorrent works:
- A user downloads a torrent file from a web server. It contains information about the file to be downloaded, including the address of the tracker to connect to in order to join the swarm of peers (the torrent file is usually very small, a few kilobytes in size).
- The torrent file is opened with a client program that can interpret this information. Many free clients are available; among the most popular are μTorrent, BitComet, and Vuze (formerly Azureus). They are all based on the original BitTorrent protocol, but some of them include improvements to it.
- The tracker and the peer communicate over an HTTP connection. The tracker reports the list of all peers and seeds that hold parts of the file to be downloaded, and updates its own records with the information of the peer that has just joined.
- Once the peer knows where to look for the parts it needs, it communicates with the other peers using TCP or UDP sockets, and the file starts downloading to the user's computer. Each downloaded part is automatically shared with other peers.
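The HTTP request that a peer sends to the tracker can be sketched in Python. The query parameter names (info_hash, peer_id, port, uploaded, downloaded, left) are those of the original tracker protocol and are not listed in this article; the tracker URL, the peer identifier and the function name are hypothetical, and the sketch assumes the announce URL has no query string of its own.

```python
from urllib.parse import urlencode

def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int) -> str:
    """Build the HTTP GET URL with which a peer announces itself to the tracker."""
    query = urlencode({
        "info_hash": info_hash,    # 20-byte SHA-1 identifying the torrent
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # port on which this peer accepts connections
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still missing
    })
    return f"{announce}?{query}"

# Hypothetical values, for illustration only.
example_url = build_announce_url(
    "http://tracker.example/announce",
    info_hash=b"\xab" * 20,
    peer_id=b"-XX0001-123456789012",
    port=6881, uploaded=0, downloaded=0, left=1000,
)
print(example_url)
```

The binary info_hash is percent-encoded by urlencode, so it can travel safely inside the URL. The tracker's reply, with the list of peers, is itself bencoded.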
.torrent files and their internal encoding
.torrent files contain information about the file we want to download. This information is encoded using bencoding.
If we open a .torrent file with a text editor, we find a dictionary that contains the following keys:
- info: a dictionary that describes the files of the torrent. It has one of two structures, depending on whether the torrent is for a single file or for several files with a directory hierarchy.
- announce: a string representing the tracker URL.
- announce-list: (optional list of lists of strings) Used to represent lists of alternative trackers. It is an extension to the original specification.
- creation date: (optional integer) The creation date of the torrent in UNIX epoch format.
- comment: (optional string) Free-form field for the creator of the torrent.
- created by: (optional string) Name and version of the program used to create the torrent file.
The info dictionary just mentioned contains the following keys:
- name: (string) The name of the file, or of the directory where the files will be stored.
- piece length: As noted in the description, the file to be shared is divided into pieces. This parameter is an integer representing the number of bytes in each piece. Pieces that are too large cause inefficiency, and pieces that are too small produce a heavier .torrent file. It is currently advised to set the piece size to 512 KB or less for files of several GB.
- pieces: A string formed by concatenating the hash of each piece of the shared file. The hashes are generated using SHA-1, which produces a 160-bit digest and accepts input of at most 2^64 bits. This set of hashes is used to check the integrity and consistency of each piece once its download has been completed.
- private: (optional) An integer that can have the value 0 or 1 and that indicates whether or not peers may be sought outside the trackers explicitly listed in the metainformation.
- length: (integer) Length of the file in bytes.
- md5sum: (optional string) A 32-character hexadecimal string corresponding to the MD5 sum of the file.
- files: Only present in a multi-file torrent. It is a list of dictionaries (one for each file, with a structure different from that of info). Each of these dictionaries in turn contains the file's length, its MD5 sum, and a path indicating where the file should be located in the directory hierarchy.
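To make the encoding concrete, here is a minimal bencoding decoder in Python. It is a sketch rather than a complete implementation: it performs little validation, returns strings as raw bytes, and the sample metainfo (tracker URL, file name, sizes) is invented for the illustration.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                        # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencoding at offset {i}")

# Hypothetical single-file metainfo, reduced to a few of the keys listed above.
sample = (b"d8:announce31:http://tracker.example/announce"
          b"4:infod6:lengthi1024e4:name8:file.bin12:piece lengthi512eee")
meta, end = bdecode(sample)
print(meta[b"announce"], meta[b"info"][b"piece length"])
```

Dictionary keys come back as bytes (for example b"info"), because bencoded strings carry no character encoding of their own.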
Algorithms: piece selection and peer selection
This section explains in detail the rules that determine which peer is chosen to share parts of the file with and which parts are transmitted.
First, a few important terms:
- Pieces and blocks. Files transmitted using BitTorrent are divided into pieces, and these in turn are divided into blocks. Blocks are the unit of transmission on the network, but a partially received piece cannot be served by a peer until it is complete, i.e. until all of its blocks have arrived.
- Interested. Peer A is said to be interested in peer B (A is in the interested state) when peer B has pieces that peer A does not have. Conversely, peer A is not interested in peer B when peer B only has a subset of the pieces of peer A.
- Choked. Peer A is said to choke peer B (B is in the choked state) when peer A decides not to send data to peer B. Conversely, peer A is said to unchoke peer B when peer A decides to send data to peer B.
- Peer set. Each peer keeps a list of the peers it knows.
- Local and remote peers. The local peer is the peer running the BitTorrent client, and the remote peers are the peers in the local peer's peer set.
- Active peer set. Peer A can only send data to a subset of its peer set. This subset is called the active peer set. The choke algorithm, described below, determines which peers form part of the active peer set. Only peers that are unchoked by the local peer and interested in it are part of the active peer set.
- Rarest pieces and rarest-pieces set. The rarest pieces are those with the fewest copies in the peer set. If the least replicated piece in the peer set has m copies, then all the pieces with m copies form the rarest-pieces set.
Rarest first algorithm
This algorithm defines the strategy used by the BitTorrent protocol to select the next piece to download. Each peer keeps track of the number of copies of each piece in its peer set and uses this information to define its rarest-pieces set. Let m be the number of copies of the rarest piece; then the index of every piece with m copies in the peer set is added to the rarest-pieces set. Each peer randomly selects the next piece to download from its rarest-pieces set.
The behavior of this algorithm is modified in three cases:
- If a peer has downloaded fewer than 4 pieces, it chooses the next piece to download at random. Once these 4 pieces have been downloaded, the algorithm works as described above. The reason for this initial behavior is to let a peer download its first pieces very quickly, since it needs some pieces to start trading in the choke algorithm. A randomly chosen piece generally has many more copies than the rarest pieces, so its download time will probably be shorter.
- Second, once one block of a piece has been requested, the other blocks of the same piece are requested with the highest priority. The reason for this behavior is to finish downloading a complete piece as soon as possible, since only complete pieces can be sent on.
- The last case is end game mode. This mode starts at the very end of the download, when the peer requests all the blocks it has not yet received from every peer in its peer set that has those blocks. Each time a block is received, the peer cancels the request for that block at all the peers in its peer set that still have it pending.
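The selection rule and its first exception (random choice until 4 pieces have been downloaded) can be sketched in Python as follows. This is a simplified model and not the code of any particular client: pieces are identified by integer indexes, each remote peer is represented by the set of pieces it holds, and the block-level priority and end game mode are left out. All names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Set

RANDOM_FIRST_THRESHOLD = 4  # pieces to download before rarest-first applies

def pick_next_piece(have: Set[int], peer_pieces: List[Set[int]],
                    rng: random.Random) -> Optional[int]:
    """Choose the index of the next piece to download, or None if none is available."""
    # Count the copies, within the peer set, of every piece we still lack.
    copies = Counter(p for pieces in peer_pieces for p in pieces if p not in have)
    if not copies:
        return None
    if len(have) < RANDOM_FIRST_THRESHOLD:
        # Random first: any piece offered by the peer set will do.
        return rng.choice(sorted(copies))
    m = min(copies.values())                                  # copies of the rarest piece
    rarest = sorted(p for p, n in copies.items() if n == m)   # rarest-pieces set
    return rng.choice(rarest)

rng = random.Random(0)
peers = [{4, 5, 6}, {4, 5}, {4}]      # piece 6 has one copy, 5 has two, 4 has three
print(pick_next_piece({0, 1, 2, 3}, peers, rng))   # 6, the only rarest piece
```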
Choke algorithm
This algorithm defines the strategy used by the BitTorrent protocol to select the peers to exchange data with. It is used to guarantee a good upload/download ratio among peers. For example, "free riders", peers that never upload, must be penalized. The algorithm is described from the point of view of the local peer, so "interested" means interested in the local peer and "choked" means choked by the local peer. The algorithm works like this:
- At most 4 remote peers can be unchoked and interested at the same time.
- Every 10 seconds, the interested remote peers are ordered according to their download rate to the local peer, and the fastest 3 are unchoked.
- Every 30 seconds, one additional interested peer is unchoked at random. This is called the optimistic unchoke, and it has two purposes: it makes it possible to evaluate the download capacity of new peers in the peer set, and it lets peers that have nothing to share yet obtain their first piece.
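One round of this algorithm can be sketched in Python. The sketch assumes that the local peer already measures, for each remote peer, the rate at which it receives data from it; the peer names, the rate table and the function names are hypothetical, and the 10-second and 30-second timers are left to the caller.

```python
import random
from typing import Dict, List, Optional, Set

REGULAR_SLOTS = 3  # fastest interested peers, re-evaluated every 10 seconds

def regular_unchoke(download_rate: Dict[str, float], interested: Set[str]) -> List[str]:
    """Return the interested peers with the highest download rate to the local peer."""
    ranked = sorted(interested, key=lambda p: download_rate.get(p, 0.0), reverse=True)
    return ranked[:REGULAR_SLOTS]

def optimistic_unchoke(interested: Set[str], already_unchoked: List[str],
                       rng: random.Random) -> Optional[str]:
    """Pick one more interested peer at random (re-drawn every 30 seconds)."""
    candidates = sorted(interested - set(already_unchoked))
    return rng.choice(candidates) if candidates else None

rates = {"a": 50.0, "b": 10.0, "c": 80.0, "d": 30.0, "e": 5.0}
interested = {"a", "b", "c", "d", "e"}
regular = regular_unchoke(rates, interested)
extra = optimistic_unchoke(interested, regular, random.Random(0))
print(regular, extra)   # at most 4 peers end up unchoked
```

Because the optimistic slot ignores rates entirely, a newly arrived peer with nothing to offer still has a chance of receiving its first piece.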
Limitations and Attacks
It is considered fair to upload the same amount of data that has been downloaded, that is, to keep a download/upload ratio of 1, but this behavior is neither common nor guaranteed in BitTorrent. Typically, users disconnect soon after they have obtained a full copy of the file. This is called leeching.
As a result, although new content is shared very quickly at the beginning, as the days go by the torrent dies because no seeds are left: BitTorrent only offers incentives to leechers, not to seeds, which gain no advantage by staying on the network.
To prevent this, so-called private trackers have appeared. Using a private tracker normally requires registering and logging in first. For each registered user, the tracker keeps traffic statistics and uses a ratio system that shows whether or not the user shares the data they have downloaded or are downloading. Many of these trackers expel users with a low ratio, since by not sharing they are not collaborating with the network.
Another weakness is that the tracker is a bottleneck, consuming about one thousandth of the total network traffic. Given the amount of data that circulates in a network of this type, this is a significant share. Furthermore, if the tracker crashes, new peers cannot connect and those that are already connected cannot discover others. Conceptually, small disorganized islands are formed: within each island all the nodes are connected to each other through neighbors, but there is no communication between islands (that communication was managed by the tracker), which has a very negative effect on downloads.
A further weakness of BitTorrent is that it is inefficient for transferring small files (a few kB), since the bandwidth used for protocol messages is comparatively high.
Unlike other well-known P2P programs such as Kazaa or eMule, BitTorrent clients lack any content search utility. The .torrent files are usually downloaded from web pages that index new releases, or from torrent search engines such as mininova.org or The Pirate Bay; this approach helps ensure that the correct file is being downloaded. In 2009, the Swedish courts sentenced the webmasters of The Pirate Bay to fines and prison terms for assisting in the distribution of copyrighted files, despite the fact that the content itself is not hosted on their servers.
On the other hand, the BitTorrent protocol is considered to follow an economic model. It is naive in that it is not prepared for malicious clients, and it therefore offers no defenses against attacks carried out through such clients.
Here are some examples of these attacks:
- Denial of service through a Sybil attack
Each peer generates a unique identifier at startup by hashing its IP address and the time. If a peer generated multiple identifiers, the active peer sets of the other peers in the network could end up consisting entirely of false identifiers belonging to the same attacking peer, so that data would be served only to it.
- “Only Seed” attack
A peer has to upload to other peers if it wants to be part of their active peer sets. If a peer only connects to seeds, it can download the entire file without having to share a single piece.
- “Corrupted Upload” attack
BitTorrent verifies the integrity of a piece once it has been downloaded by comparing its hash with the one listed in the torrent's metainformation. When a peer requests a block from a malicious peer, the malicious peer can claim to have it even though it does not. What it sends is garbage, but the requesting peer does not notice until the entire piece has arrived and failed verification, at which point it requests the block again. Even though corrupted data is being sent, the upload rates are still updated, so the malicious peer can stay on the list of preferred peers.
Current BitTorrent clients solve this problem by adding a client's IP address to a list of blocked users once they have received several corrupted blocks from it.
Advantages
- The peer-to-peer protocol provides a better and safer download, as it does not depend solely on a central server. If one source is not active, other sources can be relied on to complete the download.
- If your Internet connection is interrupted or the computer is suddenly switched off, there is no need to start from scratch. Once you are back online, the download continues where it left off.
- Torrent files are easy to find and download. There are many torrent sources and sites that provide quality content.
- Even with a slow connection, a torrent can be downloaded very quickly compared to traditional download methods.
Disadvantages
- Seeds are crucial for downloading torrents. If a torrent has no seeds, the download will not be able to continue.
- There is no way to check the file before downloading it. You can see what the file contains, but you cannot judge its quality. The only form of quality control is relying on other people's online comments.
- BitTorrent clients upload and download files at the same time. This can significantly affect the speed of your Internet connection.
- Your IP address is visible to many people in the swarm. If you want to protect your privacy, you should use a VPN.
- There are many online discussions about the ethics of using torrents. In the era of the Internet and social networks, it is really difficult to keep a work protected and safe from duplication.
- Torrent clients are just a tool for faster and more convenient downloading of online material. The clients themselves are not to blame for illegal distribution.
Protocol improvements
The BitTorrent protocol can be improved, and some clients have incorporated various modifications such as DHT, web seeding and superseeding:
DHT
If the tracker goes down, a download can only be finished by relying on the peers already connected. This is not always possible, since it requires that they do not disconnect and that, between them, they have the complete file.
The DHT (Distributed Hash Table) is an official extension to the BitTorrent protocol. Each node in the network keeps information about its neighboring nodes. In this way, the tracker bottleneck mentioned earlier is avoided: if the tracker goes down, all the peer information is still available from the peers themselves.
It can be considered a decentralization of the protocol, although not yet a complete one, because it still depends on the tracker to download the .torrent file and to learn about the first nodes.
Using DHT, with nodes communicating without going through the tracker, gives each node an independence that is sometimes undesirable, as in the case of private trackers, where restrictions are usually applied to users according to their statistics and where registration is commonly required (at the very least) to enter the network.
Because of this, and to prevent private tracker admins from banning users of DHT-enabled clients, a new parameter was added to the .torrent file, called the “Private Flag”. When a client reads this option, it automatically disables DHT for that download. The parameter is compatible with all clients: those that do not implement DHT simply ignore it.
Web seeding
On many occasions, HTTP or FTP links appear on the same web page alongside the .torrent file as alternative ways to download a file. The idea of web seeding is to combine the power of direct downloading from a server with that of P2P. This way, there is always at least one full seed to download from initially. The first client to include this enhancement was BitTornado.
Superseeding
Superseeding, known as superseeding in Vuze and initial seeding in µTorrent, is used so that the first seed to upload a new file can reduce the number of pieces it has to upload to create the first new seeds, and so that the peers downloading it can do so more quickly.
When a seed “A” enters a swarm in superseeding mode, it does not appear in the swarm as a normal seed with 100% of the file but disguises itself as a normal leecher with no data. As other peers enter the swarm, the initial seed (the disguised peer) sends them a message announcing that it has a new piece, one that has never actually been sent before. This causes a peer “B” in the swarm to request only that piece.
When peer “B” finishes downloading that piece, seed “A” will not tell it about any other pieces until it sees that the first piece it sent to “B” has been shared with at least one other peer “C”. Until that happens, peer “B” has no access to any of the other pieces of seed “A”, and thus “A” does not waste its upload bandwidth resending pieces it has already sent.
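The bookkeeping that the initial seed needs for this behavior can be sketched in Python. This is a simplified model and not the code of any particular client: the class and method names are hypothetical, pieces are offered in index order, and the sketch does not cover what the seed does once every piece has been sent once.

```python
from typing import Dict, List, Optional

class SuperSeeder:
    """Hypothetical bookkeeping for a seed running in superseeding mode."""

    def __init__(self, num_pieces: int) -> None:
        self.unsent: List[int] = list(range(num_pieces))  # pieces never sent to anyone
        self.pending: Dict[str, int] = {}                 # peer -> piece it must pass on first

    def offer(self, peer: str) -> Optional[int]:
        """Announce one never-sent piece to a peer, unless it still has to pass one on."""
        if peer in self.pending or not self.unsent:
            return None
        piece = self.unsent.pop(0)
        self.pending[peer] = piece
        return piece

    def seen_at(self, other_peer: str, piece: int) -> None:
        """Record that other_peer now has the piece, releasing the peer that received it from us."""
        for peer, owed in list(self.pending.items()):
            if owed == piece and peer != other_peer:
                del self.pending[peer]

seed = SuperSeeder(num_pieces=3)
first = seed.offer("B")       # B is told about piece 0 only
blocked = seed.offer("B")     # None: B has not shared piece 0 yet
seed.seen_at("C", first)      # the seed learns that C obtained piece 0
second = seed.offer("B")      # B is now told about piece 1
print(first, blocked, second)
```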
Torrent file search engines
Due to the great proliferation of this type of P2P download, search engines specialized in finding torrent files hosted on servers, such as The Pirate Bay, have appeared on the web.
BitTorrent Vocabulary
The following list contains the main terms used in the jargon of the BitTorrent protocol.
- Availability
The number of complete copies of a file that are available for download. Each seed adds 1.0 to this number, because it has the entire file. A user with an incomplete file adds a fraction to the availability, provided no other downloading user has that same part. For example, a user who has downloaded 65.3% of the file increases availability by 0.653. However, if two users have downloaded the same part of the file, 50% for example, and there is only one seed, the availability is 1.5, because the shared part is counted only once.
- Client
The computer program that allows peer-to-peer file exchange using the BitTorrent protocol. Some examples of clients are Transmission, µTorrent, and Vuze.
- Health
The health of a torrent is related to its availability. In torrent directories it is usually displayed as a percentage, indicating the share of the file that is available. A torrent with 50% health means that only half of the file is available, so it is not possible to download the entire file.
- Directory (index)
A BitTorrent directory or index is a web page containing a list of torrent files (usually also including a description and other information) and a search engine. Some directories also have their own tracker.