Updated 2012-09-10 14:32:35 by LkpPo

The acronym ETEN (Exhaustive Tcl Extension Network) was coined by Charles L. Brooks [1], Murray S. Mazer [2] and Frederick J. Hirsch [3] in their paper [4] for the Tcl'98 conference. Many thanks to them for this.

ETEN is the archive part of the overall infrastructure. It manages all packages created by the various users, the dependencies between them, their documentation, etc. Another important piece of exported functionality is the ability to search for and retrieve managed packages.

It has been suggested that a more catchy name may be helpful.

Related page: Cantcl

I recommend modifying the acronym to [TEEN] (Tcl Exhaustive Extension Network).

Let's start with the overall architecture of this thing.

It is currently composed of 3 interacting layers:

  • At the lowest level we have the physical storage layer.
  • Above that sits the indexing system.
  • And that one is accessed through the user interface.

Here is a picture.

It expresses some more ideas, which are not necessarily obvious:

  • It is possible to have more than one server at each level. This avoids the bottleneck a single server would be, but adds the complication of synchronizing their contents. We will deal with that later.
  • The user interface might logically belong to ETEN, but IMHO placing it in the SEE is more natural. It is basically the archive explorer in the picture given on that page.
  • Another point is the treatment of the locally installed packages as an archive of their own, with associated physical storage and index. This has the advantage of presenting the SEE with a coherent interface, free of special cases. Searching locally is no different from searching on the net.
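
The last point can be sketched in code. This is only an illustration of the idea, not part of any actual ETEN implementation; all names (Archive, LocalArchive, RemoteArchive, search) are assumptions made up for the example, written in Python for brevity.

```python
# Sketch: local and remote package stores behind one common interface,
# so the SEE never has to special-case the local installation.
# All class and method names here are illustrative assumptions.
from abc import ABC, abstractmethod


class Archive(ABC):
    """Common interface for any package archive, local or remote."""

    @abstractmethod
    def search(self, pattern):
        """Return the names of packages matching a substring pattern."""


class LocalArchive(Archive):
    """The locally installed packages, presented as an archive."""

    def __init__(self, installed):
        self.installed = installed          # e.g. {"Trf": "/usr/lib/trf"}

    def search(self, pattern):
        return [name for name in self.installed if pattern in name]


class RemoteArchive(Archive):
    """An archive reachable over the net (actual retrieval elided)."""

    def __init__(self, url, index):
        self.url = url
        self.index = index                  # package name -> description

    def search(self, pattern):
        return [name for name in self.index if pattern in name]


# Searching locally is no different from searching on the net:
archives = [LocalArchive({"Trf": "/usr/lib/trf"}),
            RemoteArchive("http://example.net/eten", {"Trfcrypt": "crypto"})]
hits = [hit for a in archives for hit in a.search("Trf")]
```

The point of the sketch is that the user interface iterates over a uniform list of archives; whether one of them happens to be the local installation is invisible to it.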

The architecture described above seems to be very flexible, but its implementation will be much more complex than that of simpler ones. Is it worth the effort?

I believe it is, and will try to explain why in the coming paragraphs.

Let us start with a one-layer architecture.

There is essentially one application performing all tasks, from management of the physical storage and the indices up to the user interface. This architecture looks conceptually simple, but that is about its only advantage. Among the disadvantages are:

  • This architecture draws heavily on network resources, as it causes the transfer of low-level mouse, keyboard and drawing events.
  • Another problem is scalability. This architecture does not scale at all: the more users connect to it, the slower the system becomes.
  • Related to the last point: The central application is a single point of failure (= SPOF). If it goes down the whole system will be inaccessible. This is intolerable.

The next step is a two-layer architecture, which differentiates between archive and user interface. That already solves the issues above, but introduces a new one, stemming from the fact that technical systems like this do not float in the air by themselves, but are embedded in society.

The problem is this: to solve the second point in the last list we have to create more than one archive server. This introduces the need for synchronization, but that is only a technical problem. Well, only partly, because the data transfer required for syncing the archives may cross borders between countries, and therefore between systems of law. And these laws may forbid the transfer outright. The easiest example of this are cryptographic packages (like Trfcrypt [5]), which can be imported into the United States without a hitch, but exported only under very restricted circumstances.

So, to avoid violating the law, we have to base the synchronization facility on a framework able to permit or forbid a transfer based on information about the involved archives and on information in the description of the package to be copied around.
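
Such a check might look like the following. This is a deliberately naive sketch, in Python for brevity: the rule shown (export-restricted packages may not leave their country of origin) and every field name (country, export_restricted) are assumptions for illustration; real export law is far more involved.

```python
# Hypothetical transfer-permission check for archive synchronization.
# Assumes each archive record carries a country code and each package
# description carries an export-control flag -- both are assumptions.

def transfer_allowed(package, source, destination):
    """Decide whether syncing `package` from `source` to `destination`
    is permitted under the (simplified) rule sketched above."""
    if source["country"] == destination["country"]:
        return True                 # no legal border is crossed
    # Cross-border transfer: export-restricted packages (e.g. the
    # cryptographic ones) may not be copied abroad.
    return not package.get("export_restricted", False)


us_archive = {"name": "eten-us", "country": "US"}
de_archive = {"name": "eten-de", "country": "DE"}
trf = {"name": "Trf", "export_restricted": False}
trfcrypt = {"name": "Trfcrypt", "export_restricted": True}

plain_ok = transfer_allowed(trf, us_archive, de_archive)
crypto_ok = transfer_allowed(trfcrypt, us_archive, de_archive)
```

The synchronization facility would consult such a predicate before every copy, refusing (and logging) transfers it forbids.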

The solution in the last paragraph stays within the boundaries of the two-layer architecture, but does not address all of the arising problems. What was forgotten?

The user. In the two-layer architecture, indexing and physical storage are coupled together. Disallowing a transfer between archives therefore fragments the index information as well, and the user has to know in which archive/index to search for a specific piece of functionality!

There are 2 solutions to this:

  1. We stay with the two-layer architecture, but have the user interface query all of the archives. My problem with this solution is the higher consumption of network bandwidth.
  2. We change the architecture. This is the path I've decided to follow, and it results in the architecture shown in the picture above.
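
The bandwidth objection to solution 1 is easy to make concrete. A tiny sketch, in Python for brevity, with all names invented for the example: the user interface must fan one query out to every archive, so the number of network requests grows with the number of archives, even for archives holding no matching package.

```python
# Illustrative sketch of solution 1: fan a search out to all archives
# and merge the answers. The dict layout is an assumption.

def query_all(archives, pattern):
    """One (simulated) network round trip per archive."""
    results, requests = [], 0
    for archive in archives:
        requests += 1               # stands in for a network request
        results.extend(n for n in archive["index"] if pattern in n)
    return sorted(set(results)), requests


archives = [{"index": ["Trf", "Trfcrypt"]},
            {"index": ["Trf", "Tktable"]},
            {"index": ["Expect"]}]
hits, requests = query_all(archives, "Trf")
# Three requests for one search; the third archive contributed nothing.
```

With n archives, every single search costs n requests; the three-layer architecture replaces this with one request to an index node.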

Some more remarks:

  • The archive level is still governed by law; this architectural change does not remove the need to implement a framework checking the executed/requested transfers.
  • The index level, on the other hand, is no longer restricted in this way, as the information exchanged between the nodes consists of references to packages, not of the packages themselves. Now the user interface can attach itself to any one of the available index nodes and still see a unified and complete index of all available packages, regardless of their location.
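
The second remark can be sketched as follows. Again in Python for brevity, and again every name and the table layout (package name mapped to a location reference) are assumptions for illustration: index nodes exchange only such references, never package contents, so no export-restricted bits ever cross a border.

```python
# Hypothetical index-level synchronization: merging the reference
# tables of several index nodes. Only references move between nodes.

def merge_indices(*indices):
    """Union of the reference tables of several index nodes."""
    merged = {}
    for index in indices:
        merged.update(index)        # package name -> location reference
    return merged


index_us = {"Trfcrypt": "ftp://eten-us/Trfcrypt"}
index_de = {"Tktable": "ftp://eten-de/Tktable"}

# A user interface attached to either node sees the complete index,
# regardless of where the packages themselves are physically stored.
unified = merge_indices(index_us, index_de)
```

Retrieval of a package found this way still goes through the archive level, where the legal checks apply; only the searching is location-blind.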

Open issues:

  • Synchronization, on the physical level.
  • Synchronization, on the index level.
  • Physical storage.
  • Indexing.
  • The user interface.

-- AK