Magnet links for whole folders


Postby jotab » 07 Dec 2011, 12:50

I have an idea that would expand the possibilities for DC++ quite a bit.

If we could create magnet links (or something similar) for whole folders and import and export them in DC++, it would be possible to build databases of original releases. One possible use: if you connect to a database that has matched these magnets to IMDb IDs, you could browse IMDb, give DC++ the IMDb ID, and download the release. Another possibility is to have DC++ read an RSS feed and subscribe to fresh releases of a TV series, for example.

Let me know what you think about this. Best regards!
jotab
Beta Tester
 
Posts: 14
Joined: 19 Jun 2011, 22:53

Re: Magnet links for whole folders

Postby maksis » 07 Dec 2011, 13:53

I was about to write about this in my reply in the grouping thread, but then I just got lazy :p

I noticed that the client already sends TTH values for folders in search results (also in NMDC hubs). The bad thing is that these TTH values are pretty much random and aren't comparable. I haven't really understood yet why they are even being sent, and it also makes adding this kind of feature a bit trickier: there is no way of knowing whether a TTH is a real one or just junk without adding a support flag for clients that support it (or maybe the TTH could be approximately validated from the size, I don't know)...

Creating the TTH itself from each file in the directory is quite simple. It will increase resource usage a bit, especially when refreshing the share, but I'm not sure how much that would really matter. But yeah, I really think that this would be useful and that it opens up many other possibilities. For example, all folders in the partial lists could also be matched by TTH (and possibly be marked as a partial dupe if only the folder name matches but the TTH doesn't).

AirDC 2.30 will already create a "bundle" in the download queue for each folder that is being downloaded from someone else, meaning that it could support such common TTH values for each bundle very easily. The sharing part shouldn't require much new code either. DC++ also seemed to have plans for TTH grouping, but I'm not sure what they are going to do with it. There were also some questions about the implementation, such as whether all files should affect the TTH (for example, NFO files often have different TTH values, which would change the TTH of the whole directory, as the downloaded packages aren't as standardized as they are with torrents, for example). On the other hand, maybe users would download more identical folders with this kind of feature, as choosing which files to pick for the TTH sounds like a rather nasty solution.
maksis
Site Admin
 
Posts: 816
Joined: 23 Nov 2010, 18:56

Re: Magnet links for whole folders

Postby jotab » 07 Dec 2011, 19:23

My idea was to create the "folder-magnet" by just creating a list of all the magnet links for the files in the folder. This way it could cope with differing SFV and NFO files, as well as support downloading from users who have only parts of the release.

The magnet-link lists would be added directly to the download queue, and DC++ would start searching for alternate locations (alternate to the 0 locations known). Another feature that would be awesome would be to have it read an RSS feed of magnet-link lists to download every new episode of a TV series, for example.

Best regards!

Re: Magnet links for whole folders

Postby maksis » 07 Dec 2011, 20:50

In a folder with 200 files the link would be 20000+ chars long (more than 20 kilobytes). Such a long link isn't suitable for sending in chat or in any instant messaging program (as it will be cut off). I also doubt that such links would be used in any RSS feed. Additionally, you couldn't even use regular magnet links, because the folder structure also needs to be known.
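
As a rough check of that estimate, here is a minimal Python sketch that builds one magnet link per file and measures the joined result. The release name, file sizes and the all-"A" TTH are made up for illustration; only the general link layout (xt=urn:tree:tiger, xl, dn) follows what DC++ clients commonly use.

```python
from urllib.parse import quote

def file_magnet(name: str, size: int, tth: str) -> str:
    """Build a DC++-style magnet link for a single file."""
    return f"magnet:?xt=urn:tree:tiger:{tth}&xl={size}&dn={quote(name)}"

# Hypothetical release: 200 parts, dummy size, dummy 39-character TTH.
dummy_tth = "A" * 39
links = [
    file_magnet(f"Some.Release.2011.part{i:03d}.rar", 15_000_000, dummy_tth)
    for i in range(1, 201)
]

# A "folder-magnet" in the sense discussed here: one link per line.
folder_magnet = "\n".join(links)
print(len(links), "links,", len(folder_magnet), "characters")
```

With names of this length each link comes out at a bit over 100 characters, so 200 files do land above the 20000-character mark, and the flat list still says nothing about subfolders.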

Having a single TTH for a folder still wouldn't prevent adding sources that don't have all the files. A source with a complete folder is only needed when adding the folder to the queue; after that, alternatives for the files could be searched for separately. There's also a feature in AirDC 2.30 which is used for spreading incomplete bundles faster (I'll write a post about the new bundle features later on the forum).

Re: Magnet links for whole folders

Postby jotab » 07 Dec 2011, 21:23

You are correct that such a large hash couldn't be sent in an instant message etc. But I do not see why you would want to do that.

As for RSS, it could send links to packed files containing lists, like mini file lists. The beauty of having a list with all the TTH hashes in it is that you can do reverse lookups on files in share, and find out to which release they belong.

Is it necessary to use straight magnet links? Or could you mark up the magnets in the hash file using XML, for example? I envision these "folder-magnets" as partial file lists, containing all the data necessary to find each and every individual file.
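
To make the XML idea concrete, here is a minimal Python sketch of such a mini file list. The element and attribute names (FileListing, Directory, File, Name, Size, TTH) are modelled on the DC++ file list format as I understand it; real lists carry more attributes (such as a client ID), and the function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_folder_list(folder_name, files):
    """Return XML text for one folder.

    files is a list of (relative_path, size_in_bytes, tth) tuples,
    with "/" separating subfolders in relative_path.
    """
    root = ET.Element("FileListing", {"Version": "1", "Base": "/"})
    top = ET.SubElement(root, "Directory", {"Name": folder_name})
    dirs = {(): top}  # maps a tuple of path parts to its Directory element
    for rel_path, size, tth in files:
        *parents, file_name = rel_path.split("/")
        key = ()
        for part in parents:
            parent = dirs[key]
            key = key + (part,)
            if key not in dirs:
                dirs[key] = ET.SubElement(parent, "Directory", {"Name": part})
        ET.SubElement(
            dirs[key], "File",
            {"Name": file_name, "Size": str(size), "TTH": tth},
        )
    return ET.tostring(root, encoding="unicode")

# Dummy TTH values, for illustration only.
xml_text = build_folder_list(
    "Some.Release",
    [("release.nfo", 1024, "A" * 39), ("Sample/sample.mkv", 2048, "B" * 39)],
)
print(xml_text)
```

Unlike a flat list of magnet links, this keeps the folder structure, and the per-file TTH attributes are what would make the reverse lookup possible.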

Do you see my point of view?

Best regards!

Re: Magnet links for whole folders

Postby maksis » 07 Dec 2011, 21:51

So you are proposing something similar to .torrent files? Yes, I kind of see the point. But is there any other real use than having them in RSS feeds or similar? Using files for that feels like it limits their usage... transferring the links as text is much easier than transferring files. And there would be no way to share them easily in chat, or to have them in the filelist or in search, etc.

"you can do reverse lookups on files in share, and find out to which release they belong."

What do you really mean by this? If I understand it correctly, in which cases would this be necessary with a strict folder structure?

Re: Magnet links for whole folders

Postby jotab » 08 Dec 2011, 01:54

Yes, something similar to torrent files, but look at the big picture: a database containing TTH hashes for a huge number of scene releases would make it possible to keep your share tidy, as you could search it for releases containing erroneous files, add the ones that are missing to the queue, have them sorted according to what the database recommends, etc.

Also consider the possibility of connecting to this database through, for example, an XBMC add-on: browse IMDb for a movie, add it to the queue via an RSS feed or some API, and then have it downloaded without leaving XBMC (or wherever you would search for movies to download).

I think that this would open up great possibilities, and making it kind of like torrent files, in the sense that they contain hashes for all files, would let you use it in any imaginable way.

As for the need to transfer over text protocols, like IMs: if you have a central database, this won't be necessary, and if you are both connected to the same central DB, it is only necessary to transfer the DB-specific ID.

If you would like to discuss this further, e.g. on Skype or something, I would be interested. Just let me know!

Best regards!

Re: Magnet links for whole folders

Postby OCTAGRAM » 21 Oct 2017, 18:36

maksis wrote:I noticed that the client already sends TTH values for folders with search (also in NMDC hubs)


Do you know the precise algorithm of the TTH calculation? The TTH in question was calculated by a remote peer running FlyLinkDC, but the feature didn't originate there. It's some ancient feature that was replaced with recursive/nonrecursive dcls long ago, but I need to recreate exactly that ancient folder TTH feature (officially called "Hash set extension"). I recall reading about it on the adcportal wiki, and I personally put a link to it into the Russian Wikipedia. Here it is: http://www.adcportal.com/wiki/index.php ... _extension . But the link has been dead for a long time, and the Internet Archive's Wayback Machine hasn't got a snapshot of it. I didn't consider that page important; I even hated DC++ (and still do) for ever having this "feature" as opposed to recursive/nonrecursive dcls. So I didn't save its contents.

I recall that all the files' hashes had to be sorted and then concatenated, and then TIGER or TTH had to be applied to this octet sequence. But I can't recall exactly, and my attempts to recalculate the folder TTH from scratch failed. I tried sorting and not sorting the hashes. I tried reversing and not reversing endianness. I tried reversing endianness before and after sorting. I tried raw TIGER and TTH. I just can't make it match.

Then I downloaded the FlyLinkDC sources to find out how exactly it calculates the folder TTH. Surprisingly, I have spent several hours already and still haven't found it. I downloaded the StrongDC sources, with the same result. Everything is so asynchronous, and folders do indeed eventually get a TTH somehow, but I have no clue how it happens. I'm destroyed.

Can anybody point me to the source code where folders get their TTH?
OCTAGRAM
 
Posts: 3
Joined: 21 Oct 2017, 18:14

Re: Magnet links for whole folders

Postby maksis » 22 Oct 2017, 10:29

OCTAGRAM wrote:
maksis wrote:I noticed that the client already sends TTH values for folders with search (also in NMDC hubs)


Do you know the precise algorithm of the TTH calculation?


My comment was about dummy TTH values that are sent for folders for no reason, but that has been fixed in AirDC++ already. AirDC++ doesn't support TTHs for directories and I'm quite sure that StrongDC doesn't have such support either.

Re: Magnet links for whole folders

Postby OCTAGRAM » 22 Oct 2017, 11:20

For me, it started with a search by TTH. I got a response like this:

[Attachment: tth_collision_probably.png]

In the balloon tooltips the TTHs were indeed fully matching for the file and the folder. I thought "WOW! Did I just catch a TTH collision in the wild? I need to verify everything". The remote peer looked like a sane FlyLink DC++ r503-x64-19663 with MediaInfo, TS and HIT attributes, just like a normal FlyLink DC++ has.

maksis wrote:My comment was about dummy TTH values that are sent for folders for no reason, but that has been fixed in AirDC++ already. AirDC++ doesn't support TTHs for directories and I'm quite sure that StrongDC doesn't have such support either.


I continued my research and I can confirm that. Indeed, DC++ has got a TTH for bundles (Bundle.h), but that didn't make it into other forks.

I've found several possible explanations for the TTH in the search results:

  1. Memory garbage. The TTHValue() constructor does nothing, so remote peers can literally read memory contents and maybe find something useful there, like passwords. This can explain how I got a result with a precisely matching TTH, but doesn't explain why it was sent to me. It was a topmost folder named "Фильмы", and the only way to request it is to search for this keyword, but I can't believe I was searching for such a broad keyword.
  2. GreyLink's folder TTH. I forgot that it's a GreyLink-specific feature; I thought it was more ancient. I managed to reimplement it in Ada (to be completely independent) and can provide a specification and test vectors. But it is not calculated for topmost folders, and "Фильмы" is a topmost one. And I'm still convinced that the remote peer is a real FlyLink user.
  3. DC++ bundle TTH. I found their code. FlyLink indeed does not have bundles.

The user in question is online now. I searched for my TTH in question and got nothing superficial. I searched for "Фильмы" just in case and got the following:

Code:
$SR User_sa123434 Фильмы 14/15♣TTH:VABAUAAAAAAAASG5X53QAAAAAACAAAAAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Films 14/15♣TTH:6AAAAAAAAAAAAAAABIAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Iphone 14/15♣TTH:6AAAAAAAAAAAAAAABIAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms 14/15♣TTH:6AAAAAAAAAAAAAAABIAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Бюро находок 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Маша и медведь 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Мешок яблок 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Раз горох, два горох 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Росийские 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|
$SR User_sa123434 Фильмы\Multfilms\Росийские\Котенок Гав 14/15♣TTH:F4AAAAAAAAAAAMAAAAAAAAAAAC4AECQAAAAAAAA (hub:411)|


Indeed, there is no persistent folder TTH there. It's now reported to be VABAUAAAAAAAASG5X53QAAAAAACAAAAAAAAAAAA. If I search for VABAUAAAAAAAASG5X53QAAAAAACAAAAAAAAAAAA, I get nothing. However, I saw this behavior on ADC hub Babylon, and it's offline now, and there is no other ADC hub, so I can't check now. Maybe it's ADC-specific. I still don't know why this response was sent to me, so the mystery remains open.
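
For anyone who wants to compare such results in bulk, here is a minimal Python sketch that splits folder-style $SR lines laid out like the ones pasted above and groups them by the reported TTH. It assumes the field separator is byte 0x05 on the wire (which the forum paste shows as "♣") and that "14/15" is free/total slots; both are my reading of NMDC, not something taken from a client's source. The function name is hypothetical.

```python
import re
from collections import Counter

# Field separator: 0x05 on the wire (assumed), rendered as "♣" in the paste.
SEPARATORS = "\x05\u2663"

def parse_folder_sr(line: str) -> dict:
    """Split one folder-style $SR line into its fields."""
    body = line.strip().rstrip("|")
    if not body.startswith("$SR "):
        raise ValueError("not a $SR line")
    left, right = re.split(f"[{SEPARATORS}]", body[4:], maxsplit=1)
    nick, rest = left.split(" ", 1)      # the nick is the first token
    path, slots = rest.rsplit(" ", 1)    # the path itself may contain spaces
    free, total = (int(n) for n in slots.split("/"))
    tag, hub = right.rsplit(" (", 1)
    tth = tag[4:] if tag.startswith("TTH:") else None
    return {"nick": nick, "path": path, "free_slots": free,
            "total_slots": total, "tth": tth, "hub": hub.rstrip(")")}

# Three of the lines from the paste above.
sample = [
    "$SR User_sa123434 Фильмы\\Films 14/15♣TTH:6AAAAAAAAAAAAAAABIAAAAAAAAAPEABAAAAAAAA (hub:411)|",
    "$SR User_sa123434 Фильмы\\Iphone 14/15♣TTH:6AAAAAAAAAAAAAAABIAAAAAAAAAPEABAAAAAAAA (hub:411)|",
    "$SR User_sa123434 Фильмы\\Multfilms\\Бюро находок 14/15♣TTH:AIAAAAAAAAAAAIAAAAAAAAAAAAAPEABAAAAAAAA (hub:411)|",
]
by_tth = Counter(parse_folder_sr(line)["tth"] for line in sample)
for tth, count in by_tth.items():
    print(count, "folder(s) report", tth)
```

Different sibling folders reporting the same value is what one would expect from leftover memory rather than from a hash of the folder contents.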

Re: Magnet links for whole folders

Postby maksis » 22 Oct 2017, 11:55

I'm not familiar with Flylink's code so I can't really comment on that, but in DC++ those TTH values are just garbage.

Bundle.h in DC++ is basically just dead code so I doubt that it's in use anywhere (I'm not sure if the author of that code even had a clear idea of how the concept should work). Bundles in AirDC++ have effectively nothing to do with that code and folder TTHs are not used anywhere by AirDC++.

Re: Magnet links for whole folders

Postby OCTAGRAM » 22 Oct 2017, 15:19

There is code in ShareManager.cpp that injects search results from the search queue, and I thought that's how a bundle TTH could be located.

So this can be nothing but a very strange bug. Higgs bugson. Thanks for the clarification, maksis.

If anybody is still interested in the original topic, I've built a playground called HttpMagnetServer. You can produce tons of dcls and torrent files there to check software against them. HMS is still ahead of its time and unmatched by client software, so most of its files can't be processed properly.

dcls specifications:

dcls is a dedicated extension for a partial filelist (xml.bz2). A dedicated single-part extension was required so that the file is opened from the shell by a p2p client rather than by an archiver. The format was introduced by GreyLink DC++, but is now supported by FlyLink DC++ too. I recall that it was found in IceDC++ long ago, but there the extension was DcLst. It's not really that hard to connect the existing "Open filelist" menu command to a file association.

Several improvements had to be made to make it handy.

First, there is a new dl= parameter in magnet links to dcls. When somebody posts a link to the chat, xl= is the size of the dcls itself, and dl (Display Length) is the size to be displayed in parentheses (overriding xl). Otherwise "SerialName.dcls (1kb)" looks very odd. HttpMagnetServer transfers dcls over HTTP, so this can't be seen there. HMS could probably add dl= for dcls found in shares if only this information were present in the XML, but it's not. It makes sense to think about an XML version of the magnet dl=.
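
The display rule can be sketched in a few lines of Python. This is a hedged illustration of the behaviour described in this post (dl overrides xl, and a negative size, as in the xl=-1 folder links mentioned further down, hides the parentheses); the function name and the sample links are hypothetical and not taken from any client.

```python
from urllib.parse import parse_qs

def display_size(magnet: str):
    """Return the size to show in parentheses, or None to show nothing."""
    params = parse_qs(magnet.partition("?")[2])
    for key in ("dl", "xl"):  # dl (Display Length) wins over xl when present
        if key in params:
            size = int(params[key][0])
            return size if size >= 0 else None
    return None

base = "magnet:?xt=urn:tree:tiger:" + "A" * 39  # dummy TTH

# A small dcls describing a large series: show the series size, not 1 KB.
print(display_size(base + "&xl=1024&dl=7000000000&dn=SerialName.dcls"))
# No dl=: fall back to the size of the dcls file itself.
print(display_size(base + "&xl=1024&dn=SerialName.dcls"))
# Folder-style link with xl=-1: nothing to display.
print(display_size(base + "&xl=-1&dn=SomeFolder"))
```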

Second, for in-p2p usage there is a recursive flag. It solves the following problem:

Imagine you have created a dcls and put it into the folder with the files it references. Then you give a link to this dcls to others. They download the dcls to their "Downloads" directory, open it, and then download the folder described by it. This way they get a copy of your folder, but the dcls and the other files are no longer in a single place. Or they can even delete the dcls, forgetting to move it closer to the files it refers to. And if you go offline, the magnet link to the dcls no longer works. The <FileListing IncludeSelf="1"> attribute instructs the client to put the dcls inside the virtual folder described by this dcls. So when people download the folder, they automatically get the dcls next to the files it references.

HttpMagnetServer is not an in-p2p thing, so dcls there have no IncludeSelf="1" attribute. However, if it were a moderated portal, it might make sense to put IncludeSelf="1" into complete series to let them be distributed further in p2p. In a p2p client UI the dcls generator should have it enabled by default. Also, the checkbox should be shown every time a dcls is being generated. GreyLink DC++ violates both of these design guidelines. First, there is no checkbox when creating a dcls; it's in the settings instead (WUT???). Second, it's not the default.

Third, I actually didn't think it would be an issue, but it turned out that DC++ clients do not tolerate BZip2 restarts. If you bzip2 two files, concatenate the results and then bunzip2 the concatenation, you'll get the concatenation of the source files. This is true for the official command-line bzip2 utility and the FAR Manager arclite plugin. But not for DC++. I relied on this feature to implement smart surgery. The idea was to have a web server that is able to create a dcls and a torrent for any subfolder, just like HttpFileServer is able to create a zip and a tar for any subfolder. Online compression was considered too heavy. Offline pregeneration of a dcls for every subfolder was considered too space-consuming. Instead I pregenerated a file in a special format with bzip2 restarts before and after <Directory /> tags, and stored offsets in the internal dcls file for every subfolder. This way the web server can concatenate a prolog, middle and epilog to make a valid xml.bz2 file. Not valid for most DC++ clients, it turned out. Only FlyLink DC++ has fixed this issue at the moment, IIUC.
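
The concatenation property is easy to reproduce with Python's standard bz2 module. The sketch below stitches three independently compressed pieces and decodes them two ways: with a multi-stream-aware call, and with a single decompressor object that stops at the end of the first stream. The second case is only a plausible model of how a client might go wrong; it is an assumption, not something confirmed from any DC++ source. The XML fragments are dummy data.

```python
import bz2

# Three pieces compressed separately, as in the prolog/middle/epilog scheme.
prolog = b'<FileListing Version="1" Base="/">'
middle = b'<Directory Name="Sub"><File Name="a.txt" Size="1" TTH="' + b"A" * 39 + b'"/></Directory>'
epilog = b"</FileListing>"
pieces = [prolog, middle, epilog]

# "Surgery": plain byte concatenation of the compressed streams.
stitched = b"".join(bz2.compress(piece) for piece in pieces)

# bz2.decompress() handles concatenated streams and returns the joined text.
full_text = bz2.decompress(stitched)

# A single BZ2Decompressor stops at the end of the first stream and leaves
# the remaining streams untouched in unused_data.
one_shot = bz2.BZ2Decompressor()
first_only = one_shot.decompress(stitched)

print(full_text == b"".join(pieces))
print(first_only == prolog, one_shot.eof, len(one_shot.unused_data) > 0)
```

A reader that treats end-of-stream as end-of-file would see only the prolog here, which is not well-formed XML on its own.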

bzip2 surgery can enable an almost instant share update even for huge shares. Moving bytes is faster than recompressing the complete text from scratch. Also, a partial filelist with subdirectories could be served by a p2p client just like my web server does.

torrent specifications:

In 2006 a specification for including per-file TTH in torrent metadata was published. It also included the useless SHA1 and ED2K, which is far less useful than AICH. Life would be much easier if it had been adopted. In a world more perfect than ours, moderated torrent portals would yell at TTH-less torrents, asking for them to be recreated. And torrent clients would show a warning every time a TTH-less torrent was opened. Eventually only TTH-ful torrents would remain. And then the protocol could be changed and even drop the legacy info.pieces. NMDC, ADC, BitTorrent and Gnutella2 would be nothing but different use cases. They could be integrated in a truly multiprotocol client. But it hasn't happened so far. The only torrent client able to read TTH is Shareaza, but only for single-file torrents, and HMS does not serve those (although it could be implemented).

In modern BitTorrent I have found the similar BEP 47, "Padding files and extended file attributes". Its way of embedding the useless SHA1 matches the 2006 specification, but they dared to forget about TTH.

So I decided to refresh people's awareness of the 2006 specification and implement it. Unfortunately, HMS does not have access to the remote files and cannot calculate info.pieces for every subfolder. info.pieces is just not compatible with the way HMS works. HMS relies on clients that are able to live without info.pieces, but there are none that I'm aware of. Also, I had to generate fake info.pieces, because otherwise most torrent clients do not even show the file list and comments.

In the end torrents were not as good as dcls. Both are subject to surgery in HMS, but surgery works better for dcls. Despite the TTH being stored in binary form, torrents are often much bigger than dcls, because every parent folder name is repeated over and over in every file description, as opposed to a single opening tag in dcls. Also, since a subdirectory torrent is generated using surgery, every file description must be ready to appear in a torrent of any level. As a result, subfolder torrents have the parent folders' names inside anyway. A subfolder torrent's file list looks like several folders nested in each other, each containing just one subfolder, and only from some level on is the folder actually filled with something interesting. Also, there is no way to represent an empty subdirectory.

So without compatibility with legacy BitTorrent clients (all of them are legacy from my point of view) I can see no point in using the bencoded torrent format rather than dcls.

Magnet to whole folder specification:

I'm against this thing. A dcls and a magnet link to the dcls should be used instead, but just for completeness:

The TTH of a folder is computed from its subitems' TTHs in binary form: they are sorted in ascending order, concatenated and passed into raw TIGER. The directory structure is not flattened before sorting. Instead, the TTH is calculated for subdirectories first, and then recursively for the parent directory. So the TTH of a folder is independent of file names, directory names and order, but depends on the exact directory structure. Also, an empty directory has the special hash MRRSWKZAMVWXA5DZEBSGS4TFMN2G64TZAAAAAAA, which means "dc++ empty directory" & (4 * NUL).

For instance, a folder containing two empty subfolders:
TIGER(MRRSWKZAMVWXA5DZEBSGS4TFMN2G64TZAAAAAAA & MRRSWKZAMVWXA5DZEBSGS4TFMN2G64TZAAAAAAA) = ESX7RUNWSLA2OVQWZ2LIICIUFNZUCKXCVP3LTSI;


A folder contains several files and two subfolders, each of them containing one file:
TIGER(L3J2JNQW46L2XQ2K3LVT2UKO53TTCQJKD7U4JDI) = NL3Q2MZUF5VN6MRAKP36MMLERI6KRUWFAKXP2DQ;
TIGER(45KI4QJ2I2PQYKGK6VRMMHW3OTHHQCKTGEWNPBY) = OXGTECJW4KAUKBJMJHLWXNCJXMYA6DDOZHL4MUI;
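
The recursion described above can be sketched in Python as follows. Two things are assumptions on my part: "ascending order" is taken to mean byte-wise comparison of the raw 24-byte hashes, and the Tiger function is passed in as a parameter because Python's standard library does not ship one. The stand-in hash used in the demo is a truncated SHA-256, which is NOT Tiger, so the test vectors above are not reproduced here; the representation of a folder as nested lists is also just for illustration.

```python
import base64
import hashlib
from typing import Callable

# Special hash of an empty directory, as quoted in the post.
EMPTY_DIR_B32 = "MRRSWKZAMVWXA5DZEBSGS4TFMN2G64TZAAAAAAA"

def b32_to_bytes(text: str) -> bytes:
    """Decode an unpadded 39-character base32 hash into 24 raw bytes."""
    return base64.b32decode(text + "=" * (-len(text) % 8))

def bytes_to_b32(raw: bytes) -> str:
    """Encode raw hash bytes as unpadded base32."""
    return base64.b32encode(raw).decode("ascii").rstrip("=")

def folder_hash(children: list, tiger: Callable[[bytes], bytes]) -> bytes:
    """Hash a folder given as a list of items.

    A str item is a file's TTH in base32; a list item is a subfolder.
    """
    if not children:
        return b32_to_bytes(EMPTY_DIR_B32)
    hashes = [
        folder_hash(child, tiger) if isinstance(child, list) else b32_to_bytes(child)
        for child in children
    ]
    # Sort the binary hashes (byte-wise, assumed), concatenate, hash once.
    return tiger(b"".join(sorted(hashes)))

def stand_in_hash(data: bytes) -> bytes:
    """Placeholder with a 24-byte output. NOT Tiger; swap in a real one."""
    return hashlib.sha256(data).digest()[:24]

file_a = "L3J2JNQW46L2XQ2K3LVT2UKO53TTCQJKD7U4JDI"
file_b = "45KI4QJ2I2PQYKGK6VRMMHW3OTHHQCKTGEWNPBY"

print(b32_to_bytes(EMPTY_DIR_B32))
print(folder_hash([file_a, file_b], stand_in_hash) == folder_hash([file_b, file_a], stand_in_hash))
print(folder_hash([[file_a], [file_b]], stand_in_hash) == folder_hash([file_a, file_b], stand_in_hash))
```

Decoding the empty-directory constant does give the text "dc++ empty directory" followed by four NUL bytes. The last two lines illustrate the stated properties: item order does not matter, while moving files into subfolders changes the result.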



Magnet links to folders have xl=-1. This disables the parentheses with the file size when the link is posted to chat. GreyLink DC++ does not generate dl=, but it would make sense.

Bundles:

Do they have anything in common with what I wrote?

Re: Magnet links for whole folders

Postby maksis » 22 Oct 2017, 17:02

OCTAGRAM wrote:
Bundles:

Do they have anything in common with what I wrote?


Yeah... they are created from plain file listings (possible parent directories are included in each filename, empty directories are not supported) so they have something in common with torrent files. However, there is no defined format for importing/exporting bundles from/to external sources (excluding the API) as they are created based on the directory content that the user decides to queue from filelists.

For more information on how the data is structured, I'd suggest that you look at the API docs: http://docs.airdcpp.apiary.io

Especially the queue section (including the bundle creation methods) might be useful. More information about bundles is also available at viewtopic.php?f=4&t=1856

