Open and Distributed Systems

________________________________________________________________________________________________

While the Department has a long history of activity in this area, the nature of that activity has changed significantly in the last couple of years. We have stopped our work on the development of Directory Services, the ISODE OSI stack, and X.400 message services. We consider this activity to be no longer in the research phase, and have transferred our work to the ISODE Consortium and to NEXOR. We have good relations with both companies, and have retained the rights to use their commercial products in our research activities and with our partners for research purposes. We have also halted our work on the development of ODA editors, partly because we were not convinced that the Standard was going to lead to commercial products, and partly because, again, we thought it had gone beyond the research phase. We have concentrated, instead, on more advanced topics arising from our earlier work.

In the Directory area, we continued the PARADISE project for the first half of 1994. In this project, we coordinate the work on piloting distributed X.500 directory services. During the current year, the main aim was to move the activity towards services that would be run either commercially or as normal computer service activities. For this reason, we have concentrated on a number of important improvements which were required in order to run a proper service. It is hoped that this will now be taken up by commercial interests. We have still pursued a number of applications of directories; a promising one is the use of X.500 for document catalogues. This has been found quite feasible, and a pilot system has been set up in collaboration with Brunel University under British Library R&D Department support. Finally, we have taken up a few aspects of Directory algorithms for further research; in particular, we have endeavoured to find out how users normally make queries of directories, and are trying to devise algorithms to provide better searching facilities and response times.

We have continued our work on Open Systems Security. Here again much of the work on the basic toolkit has been transferred to NASA, though we have also agreed with the ISODE Consortium that it will be integrated into future releases of their system. We have been concerned with having the toolkit support smart cards and high integrity CAs. There is a major activity in support of a US Navy testbed of their Defence Message System - which is a variant of X.400 message services, supported by X.500 directories. We have provided a pilot system for the secured directories required; this needed support for the FORTESSA card, for securing specific attributes in the Directory with strong authentication, and for secured versions of our management tool OSIMIS. These advances will have much more general applicability. We have also completed some applications which are considered more user friendly than before; in particular we have concentrated on a version of PEM/MIME and a secured version of ODA. Finally, we continued to pilot OSI security services - in conjunction with our PASSWORD partners in France and Germany. We will continue our work with secured directories next year, and apply the same techniques to secure conferencing.

Another aspect of our OSI work is with the Open Document Architecture (ODA) standard. Again the work on the basic ODA editors and utilities has ended; we have concentrated on ODA applications. The vehicle for this has been, for some years, a large database of the journals of the American Chemical Society. This database is now largely complete, and is available in several forms - including ODA. In the process we have had to develop an SGML-to-ODA converter, because the text was provided in SGML form. During the year, we have both started to provide prototype services to users and developed certain key ideas further. One interesting advance was to build a Hypertext mechanism into the search engine, and then to attempt to guide the user in choosing paths through the system based on the search patterns of previous users. Finally, we felt that a key requirement for larger scale adoption of electronic journals was the inclusion of much more sophisticated auditing and integrity control; we have therefore integrated such techniques into a version of the ACS database. Our approach relies heavily on the securing of ODA, which is a development of the work in the previous paragraph.

We continue to support ULCC and several other organizations (including DRA) in running X.400, X.500, SMTP and other services. This work is now reported under "Services" rather than as part of the research in this chapter.

Directory Services

PARADISE

Paul Barker, David Goodman, Peter Kirstein and Kevin McCarthy

The PARADISE project has provided coordination and central support for world-wide X.500 activities for the past three years. UCL's partners in PARADISE are RARE (Réseaux Associés pour la Recherche Européenne), University of London Computer Centre (ULCC), Nexor and INRIA (Institut National de Recherche en Informatique et en Automatique). One of the goals of the project is to pave the way for a full X.500 service, funded by national networks and service providers.

Until April 1994, when the project ended, we continued to provide a number of core services:

INRIA and Nexor have worked on directory interoperability problems, and INRIA have published several reports describing both details of specific problems, and a general critique of awkward areas for interoperability. In particular they have:

UCL has continued to enhance the user interface DE and the BLT bulk loading tools. There have been two major enhancements to DE. First, the UFN (user-friendly name) search algorithm has been completely overhauled, and is now more successful at finding entries. Second, we have enhanced the QOS feature whereby users are informed when they are searching slow or unreliable parts of the directory. The major change to BLT has been to improve its name matching algorithm, so that less manual intervention is required when updating from large data sets. These changes were made following extensive use of the tools by Brunel University.

UCL and Nexor have also played a leading role, along with EEMA, in establishing the European Directory Services Group (EDSG). David Goodman of UCL was elected chair of this group. The EDSG is a coordinating body for software suppliers, service providers and user organisations to develop full-scale X.500 services.

Use of X.500 for Document Catalogues

Paul Barker and Peter Kirstein

The ABDUX (Accessing Bibliographic Data Using X.500) project, funded by the British Library, continued its investigation into the use of the X.500 Directory for a number of bibliographic purposes. Design work for the project was completed in 1993, and in the past year the emphasis has been on assessing the project designs by building a system with larger amounts of bibliographic data and allowing more users to try the system.

In practice we were unable to build as large a system as we would have liked. Limitations inherent in our implementation of X.500 prevented us from building very large scale databases without incurring very poor performance. However, we succeeded in creating a catalogue of almost 50,000 entries, comprising thirteen collections of computing research papers, and four maths and computing book extracts from London University library catalogues. Recent versions of the X.500 software appear to remedy the problem of scale, but the ABDUX system still needs to be tested on very large amounts of data.

Throughout the project, we sought to demonstrate how X.500 could be used for bibliographic purposes. In practice, solutions to networked information retrieval problems use a variety of tools. ABDUX concluded its work by proposing a possible model for bibliographic information retrieval. The key features of this proposal are as follows. Users would access the system using a World Wide Web client. X.500 would be used as a directory to guide users to the most appropriate resource databases. It would also be used to distribute the queries. For non-X.500 databases, the queries would have to be mapped in gateways onto the local database access protocol, such as SQL or Z39.50. However, these gateways are a small price to pay for such a powerful distributed system.

X.500 White Pages Directory Querying Algorithms

Paul Barker and Graham Knight

The DE white pages directory user interface (DUI) was written during the PARADISE project to provide ordinary users with a simple yet powerful querying tool. The design of this interface was strongly influenced by observing what existing DUIs often did badly: they were often too slow, found poorly matched entries ahead of good matches, and often did not find entries when they should have done. While DE was a considerable improvement over previous DUIs, a number of elements of the design were based on little more than hunches about the way the directory service and its users behaved.

Our research is in two broad areas which, taken together, will allow us to design improved querying algorithms, with a clear view of the trade-offs between querying power, response times and the load on the directory. First, we are examining in detail how users formulate their queries. Of particular interest is how closely the names users type correspond with the names held in the directory. We have found that the match between users' input and directory names is generally quite poor, with any given name component having about a 25% chance of matching the name in the target entry. However, most directory entries also have alternative names, and their presence is effective in increasing this exact match ratio to over 50% for name components. We are also interested in analysing the problems where queries fail to find entries in the directory. We have been able to identify a number of simple transformations of users' input which will reduce the number of query failures.
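The kind of measurement described above can be sketched in a few lines of Python. This is only an illustration, not the analysis code used in the project: the function names, the normalisation rule (lower-casing, dropping punctuation, collapsing spaces) and the sample data are all assumptions made for the example, and the figures it produces say nothing about the real directory.

```python
import re

def normalise(name: str) -> str:
    """Assumed input transformation: lower-case, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())

def component_matches(typed: str, entry_names: list[str]) -> bool:
    """True if the typed name component equals any name held for the target entry."""
    target = normalise(typed)
    return any(target == normalise(held) for held in entry_names)

def match_ratio(queries: list[tuple[str, list[str]]]) -> float:
    """Fraction of (typed component, names in target entry) pairs that match exactly."""
    if not queries:
        return 0.0
    hits = sum(component_matches(typed, names) for typed, names in queries)
    return hits / len(queries)

# Invented sample: the first held name is the primary name, the rest are alternatives.
SAMPLE = [
    ("Comp. Sci.", ["Computer Science", "Comp Sci"]),
    ("UCL", ["University College London", "UCL"]),
    ("Physics Dept", ["Physics"]),
    ("computer science", ["Computer Science"]),
]

primary_only = match_ratio([(typed, names[:1]) for typed, names in SAMPLE])
with_alternatives = match_ratio(SAMPLE)
```

Comparing `primary_only` with `with_alternatives` shows, on the invented sample, how the presence of alternative names raises the exact match ratio.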

Second, we are analysing the response times of the read, list and search operations, in particular the many different variations of the search operation. One example of our work is determining the best policy for evaluating a combination of exact match, substring match and approximate match search filters: is it by a boolean 'or' of the three filter items, a sequence of three searches (where often the second and third searches will not be required), or searching asynchronously with the three filters being evaluated in parallel? We hope to have the results next year.
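The three policies can be contrasted with a small in-memory sketch. It assumes a toy list of names in place of a directory, and it uses `difflib.SequenceMatcher` with an arbitrary threshold as a stand-in for approximate matching, which in X.500 is left to the implementation; none of the function names come from DE or from any X.500 software.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

NAMES = ["Paul Barker", "Graham Knight", "Peter Kirstein", "Kevin McCarthy"]  # toy data

def exact(query: str, name: str) -> bool:
    return query.lower() == name.lower()

def substring(query: str, name: str) -> bool:
    return query.lower() in name.lower()

def approximate(query: str, name: str, threshold: float = 0.8) -> bool:
    # Stand-in for approximate match; the threshold is an arbitrary assumption.
    return SequenceMatcher(None, query.lower(), name.lower()).ratio() >= threshold

FILTERS = [exact, substring, approximate]

def search(query: str, flt) -> list[str]:
    """One 'search operation' over the toy directory with a single filter item."""
    return [name for name in NAMES if flt(query, name)]

def search_or(query: str) -> list[str]:
    """Policy 1: a single search whose filter is the boolean 'or' of the three items."""
    return [name for name in NAMES if any(f(query, name) for f in FILTERS)]

def search_sequential(query: str) -> tuple[list[str], int]:
    """Policy 2: up to three searches, stopping at the first that finds anything.

    Returns the results and the number of searches actually issued.
    """
    for issued, flt in enumerate(FILTERS, start=1):
        results = search(query, flt)
        if results:
            return results, issued
    return [], len(FILTERS)

def search_parallel(query: str) -> list[str]:
    """Policy 3: issue all three searches at once, prefer the strictest non-empty result."""
    with ThreadPoolExecutor(max_workers=len(FILTERS)) as pool:
        per_filter = list(pool.map(lambda flt: search(query, flt), FILTERS))
    for results in per_filter:
        if results:
            return results
    return []
```

The sequential policy issues one search for a correctly typed name and three for a misspelt one, whereas the parallel policy always issues three but never waits for one search before starting the next; that is the trade-off between directory load and response time that the measurements are meant to quantify.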

Open Systems Security

The Security Toolkit

Yoshiki Sameshima, Andrew Young and Peter Kirstein

With Peter Williams moving at the end of 1993 to NASA, much of the development of the OSISEC Toolkit has continued during the year from there - but still in close collaboration with UCL. The toolkit provides support for the generation, distribution and application of keys for use during presentation of security credentials in protocol exchanges. It also provides tools for the setting up and operation of Certification Authorities. Version 2.3 was released at the start of July with improved structuring, hardware signing support and additional applications. Version 3 is currently under development (led by Sterling Software under contract to NASA).

OSISEC v3 has been restructured to allow it to take advantage of Solaris 2.3's kernel support for symmetric multiprocessing. All routines are now reentrant, and threading has replaced the use of global variables. Version 3 also provides added facilities in terms of maturity, with respect to a standard API; and usability, especially regarding handling of the multiple personalities possible with advanced smartcards such as FORTESSA. It has also adopted the Mosaic certificate format.

Versions 2.3 and 3 are currently aligned to the ISODE Consortium release ICR1.1. Work to port them to the latest release ICR2 and beyond is currently in progress. Version 3 has also been ported to the HP platform in order to support its use by the US Navy.

Secured Directories

Yoshiki Sameshima and Andrew Young

During the year, UCL has continued to work alongside the US Navy. In its National Software and Telecommunications Centre in Washington, the Navy is running a pilot project to develop a secured version of a directory service as part of the US DOD's Defence Message System project. This has involved setting up a large pilot installation with over 60,000 entries over several DSAs. UCL is operating a parallel system to provide support for the setting up and maintenance of the system and has provided advice on design issues relating to DUAs. Also, applications have been ported to the latest versions of OSISEC and of ISODE Consortium software, and the impact of these has been assessed.

An important part of the Navy Pilot is its use of the FORTESSA card, which contains the NIST DSA algorithm for public key authentication. The use of this card with OSISEC was one of the motivations for Version 3. Because certain flaws were found in the early versions of the cards (which are the ones UCL has so far), it will be necessary to have further development of OSISEC to accommodate the newer cards as they become available.

The Navy Pilot requires considerable flexibility in the way the Directory provides access to individual attributes. It is also necessary that the Directory System Agent can require strong authentication, and that it is able to sign results of searches if so requested by the User. Further work has been done on the recent versions of the Secure DSA to allow these facilities to be included. Finally, in the Navy Pilot, it is necessary to manage the MTAs of a secured message system. For this reason, a secured version of the OSIMIS management system has been developed (by making calls from it to OSISEC) and has also been provided to the US Navy; this work is discussed in Section 5.2.6.

Piloting Security

Yoshiki Sameshima, Andrew Young, David Goodman and Peter Kirstein

The PASSWORD project finished at the end of June and the pilot sites have been supported, where required, throughout the year. A meeting of all pilot sites was held in London in January, and one of the project deliverables describes their experiences and lessons learnt. The security infrastructure remains in place, and UCL continues to operate the European Top Level CA and the UK Policy CA as well as an internal departmental CA.

The final release of software was made to pilot sites at the beginning of July. This comprised version 2.3 of OSISEC and an improved set of secure applications. This has been made available on request to users in eight countries, and is being used commercially by two companies for product development.

The principal activity on the secured applications during the year has been in the area of usability. The PDOCSEC filters, which apply security to parts of ODA documents, have been integrated into a proprietary compound document editor (the BBN Slate editor), thereby providing for the first time the facility to secure parts of a document differently for different addressees. The Privacy Enhanced Mail (PEM) filters have been built into a custom X Window front end which fully integrates PEM functionality with normal mail user agent functionality. This type of seamless integration into security aware user agents is essential if user reluctance is to be overcome. Some extra functionality has also been added: a PKCS#7 Cryptographic Message Syntax library and an implementation of one form of PEM-MIME integration. Full PEM-MIME integration is expected to be provided in the future. It is still our impression that, while take-up of the toolkit has been encouraging, there has been little serious use of the secured applications. In particular, the lack of an X.400(88) infrastructure and the limited number of PEM certificates make systematic use of secured email impossible. Where it can be used, it often has an unacceptably large runtime overhead. There are also still considerable regulatory and political problems in the export, and sometimes even in the usage, of secured products. Large-scale deployment plans must await the resolution of this type of political quandary.

Commercial interest in the PASSWORD results is evidenced by the partners having obtained significant development and deployment contracts for sizable security pilots, both in the defence and commercial sectors. The major deployments of OSISEC are its use in the US Navy's proof of concept pilot for the Defence Message System (DMS); and use by NASA's Advanced Network Application group, including collaboration plans with the US Postal Service to set up a national Certification Authority for providing the general public with certificates for use in IRS and FAA applications.

Work with the ACS Database

The ACS Database

Peter Kirstein, Michael Lesk and Goli Montasser-Khosari

Having an on-line database of scientific journals offers many advantages over conventional paper-based journals; many of these advantages fall into the areas of search and access. Searching texts electronically is much easier than searching them manually; far more productive searching can be undertaken using a computer system. Electronic access provides additional advantages. First, access is non-exclusive - any number of people can access the same journal simultaneously. Second, access is distributed, so it is not necessary to be in close proximity to the database in order to access its information. Third, access can be integrated with the users' facilities, so that it is possible to extract information for other purposes - always subject, of course, to consideration of copyright. We have set up a document database which can be queried in a convenient manner, and allows the user to browse the results on-screen using a number of different tools. We have provided facilities for end-user chemists to access the database at various locations within the University of London.

The database consists of the contents of the journals of the American Chemical Society (ACS) - including both text and images. The data has been provided by Bellcore (via Mike Lesk, a visiting Professor); the text comes from a processed form of the original typesetting tapes, and the images are derived from scanning the original journals. The text data is provided in SGML, and the images in a form of compressed facsimile. Eventually there will be 10 years of data - producing about 4 GB of SGML text, and about 50 GB of image data scanned at 300 dpi. The UCL project C-ODA, funded by the British Library Research and Development Department, mirrors a similar one called CORE in the US (centred on Cornell University, but with Bellcore, OCLC and the ACS involved). The CORE project has similar aims, but is concentrating on access to the data in bitmap form; it is a hypothesis of UCL that ODA, the ISO Open Document Architecture, is a particularly convenient form for storing and accessing the complementary parts of the data, since ODA is a blind open interchange format for which a number of converters are available. The second difference between the CORE work in the USA and the work at UCL is the network access. The CORE project is concentrating upon high-bandwidth local area networks which can rapidly deliver large amounts of data; there is a significant emphasis upon the bitmap representation of the journals. At UCL-CS we are particularly interested in widening the scope of the project to include remote access to the document database; this involves relatively low-bandwidth communications - for example, Basic-Rate ISDN lines operating at 64 Kbps.

The access methods include the use of WAIS (a text retrieval system based on ANSI Z39.50), Xpixlook (a browser from Bellcore used with image data) and SuperBook (a hypertext browser also from Bellcore). The data is stored in a 90 GB jukebox; the text is fully indexed, and is stored on normal magnetic disc - because of the comparatively slow access time to data on the jukebox (at worst 15s).

With the integrated ODA format, it is necessary to store only one database; bibliographic searches can be made on the ODA DMA attributes, and image files are compatible. We have compared the bitmap and compound document forms, because many of the publisher-driven projects are delivering only the bitmaps. We find that the total ACS database will be about 12.5 GB in compound document form - and about 50 GB at 300 dpi bitmap. Moreover, the users actually prefer the compound document form, because it allows different search patterns - e.g. ones based on viewing small versions of the diagrams or just looking at highlighted text. In addition, while it is easy to browse through bitmaps of documents on a LAN, it is quite impractical to do so over switched WANs like the ISDN. Using the ISDN with the ODA format, we are able to scroll through pages at the rate of 1 - 3 secs/page. By overlapping the transmission with browsing, very acceptable browsing rates can be achieved over the ISDN.
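A rough back-of-envelope check of these figures is sketched below. It rests on several assumptions that are not stated in this paragraph: that the database sizes can be divided by the roughly 500,000 pages quoted for the SGML-ODA conversion work, that gigabytes are decimal, that a single 64 Kbps ISDN channel is used, and that protocol overhead is ignored. It is an estimate of average transfer time per page, not a measurement.

```python
ISDN_BITS_PER_SECOND = 64_000   # one 64 Kbps ISDN channel, as quoted in the text
PAGES = 500_000                 # assumption: page count quoted for the conversion work
GB = 10**9                      # assumption: decimal gigabytes

def seconds_per_page(total_bytes: float,
                     pages: int = PAGES,
                     bits_per_second: int = ISDN_BITS_PER_SECOND) -> float:
    """Average time to transfer one page, ignoring protocol overhead."""
    bytes_per_page = total_bytes / pages
    return bytes_per_page * 8 / bits_per_second

compound_seconds = seconds_per_page(12.5 * GB)  # compound document (ODA) form
bitmap_seconds = seconds_per_page(50 * GB)      # 300 dpi bitmap form
```

Under these assumptions the compound document form averages about 3 seconds per page (25 KB per page), in line with the 1 - 3 secs/page observed, while the bitmap form averages about 12.5 seconds per page (100 KB per page).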

During the last year, we have set up workstations in the Chemistry Department and in a number of libraries in the University - particularly at King's College, but to a lesser extent at UCL and QMW. The chemists have started to access the journals via our workstations, and we have used their feedback to improve our facilities.

We will be completing the provision of the database, and having much more extensive trials with users, over the next year. We will be looking at the commercial WAIS server, which is much better than the public domain version which we have been using. It allows much larger databases, incremental indexing and deletion, and Mosaic access. We will be investigating access to the journals via Mosaic.

SGML-ODA Conversion

Goli Montasser-Khosari

As part of the C-ODA project, we had a requirement to convert a large mass of SGML documents (approximately 100,000 articles representing 500,000 pages of technical papers) into an ODA representation, whose interchange, unlike that of SGML, does not depend on a DTD. All of the ACS SGML documents share the same DTD, and so we had a range of options for writing our SGML to ODA converter.

During the last year, the SGML->ODA converter has been updated for fonts and tables. The new SGML database has been reprocessed to produce a FOD26 database. The raw data has also been reprocessed, to eradicate earlier errors in which figures were incorrectly identified in the processing of the scanned document images. The complete text data for 1980-1994 and image data for 1991-1992 have now been made available by Bellcore - about 4 GB of text and 15 GB of image data. This has been processed into ODA by UCL, and we have some operating experience both of generating the database and of its use.

Enhanced SuperBook: An Intelligent Hypertext System

Michael Hu (supervised by Peter Kirstein)

This research investigates the applications of machine intelligence in information generation, organization, manipulation, search and retrieval.

In order to alleviate and solve some problems in the present information retrieval (IR) community, and to increase the efficiency and effectiveness of information systems, a new data structure is proposed. The conceptual index is external to the collection of information components (documents). It integrates the conventional global index with a special semantic network. As a result, a much richer set of concepts, as well as the relationships between concepts, can be represented in the data structure. It is shown in our system that such a data structure is more suitable for sophisticated IR environments such as hypertext, and could make automatic information generation, self-adjustment and evolvement, inferencing and reasoning possible.

Based on the conceptual index, a soft-link hypertext model is developed and investigated. The soft-link hypertext model covers the Boolean search model, the probability model, the traditional hard-link hypertext model and the soft-link hypertext in one infrastructure. Its main features include automatic generation of the conceptual index, self-adjustment and evolvement, user-centered services for information retrieval and application of machine intelligence in all aspects of the model.

The soft-link hypertext model, including its state-space and all operations occurring in the space, is fully presented and evaluated in our research work. It is implemented in a soft-link hypertext system called the Enhanced SuperBook and assessed in several small-scale controlled IR experiments. Its strengths and weaknesses, its similarities to, and its differences from other information models and systems are also studied in depth.
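The general idea of a conceptual index with self-adjusting soft links can be illustrated with a toy data structure. The sketch below is not the Enhanced SuperBook design: the class, its methods and the simple additive weighting rule are all hypothetical, chosen only to show a conventional term index, a small network of related concepts, and links between documents that strengthen as users follow them.

```python
from collections import defaultdict

class ConceptualIndex:
    """Toy index: term postings, related-term network, and usage-weighted soft links."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)        # term -> document ids (conventional index)
        self.related = defaultdict(set)         # term -> related terms (semantic network)
        self.link_weight = defaultdict(float)   # (from_doc, to_doc) -> soft-link strength

    def add_document(self, doc_id: str, terms: list[str]) -> None:
        for term in terms:
            self.postings[term.lower()].add(doc_id)

    def relate(self, a: str, b: str) -> None:
        """Record a symmetric relationship between two concepts."""
        a, b = a.lower(), b.lower()
        self.related[a].add(b)
        self.related[b].add(a)

    def search(self, term: str) -> set[str]:
        """Documents indexed under the term or under any directly related term."""
        term = term.lower()
        hits = set(self.postings.get(term, ()))
        for other in self.related.get(term, ()):
            hits |= self.postings.get(other, set())
        return hits

    def record_traversal(self, from_doc: str, to_doc: str, reward: float = 1.0) -> None:
        """Self-adjustment: strengthen a soft link each time a user follows it."""
        self.link_weight[(from_doc, to_doc)] += reward

    def suggest(self, from_doc: str, limit: int = 3) -> list[str]:
        """Strongest soft links out of a document, ties broken by document id."""
        candidates = [(weight, dst) for (src, dst), weight in self.link_weight.items()
                      if src == from_doc]
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        return [dst for _, dst in candidates[:limit]]
```

A hard link would be a fixed pointer stored in the document; here the links live outside the documents and change with use, which is the sense in which later users can be guided by the paths taken by earlier ones.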

Based on such an investigation, our project concludes that many aspects of information retrieval should benefit from the extensive application of machine intelligence. Automation in information generation, organization, self-adjustment and evolvement, and assistance for information retrieval can not only increase the efficiency and effectiveness of the information systems, but also represents the essence of, and should have great impact on, future generations of IR systems.

Securing Electronic Databases

Phillip Goudal, Peter Kirstein, Goli Montasser-Khosari and Yoshiki Sameshima

Several years ago, we developed in the department a system called DOCSEC; this provided facilities for securing whole documents. The system provides authentication, key distribution and content integrity by use of public key cryptography; it also uses DES encryption for providing content confidentiality. The mechanism used was consistent with the Security Addendum of the ODA Standard - but with certain simplifications to deal with securing only complete documents.

As part of the C-ODA project, funded by the British Library Research and Development Department (BLRDD), we are providing a large database of ODA documents, with facilities to allow the documents to be searched and browsed by chemists in a convenient manner, using a number of different on-line tools. The complete 1991-1992 collection of ACS journals was used. As part of this project, we are setting up part of the database as a secured ODA document database. The access methods use Wide Area Information Servers (WAIS) techniques (a text retrieval system based on ANSI Z39.50) - but with certain modifications to deal with access control. The system has been integrated with the BBN Slate document editor. The database's access environment is able to enforce personal access rights for read and search operations, founded on strong authentication of the user's identity - using the OSISEC security toolkit. This also allows us to build up a non-repudiable audit trail for later charging if this were desired. The service also enables users to trust the authenticity and integrity of the information retrieved, by providing facilities for the database server to "seal and sign" the data in the database. This allows the database user agents to check the integrity of the information retrieved.
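The "seal and sign" check can be illustrated in outline. The sketch below is a deliberately simplified stand-in, not the OSISEC mechanism: the system described uses public key cryptography, whereas this example uses a SHA-256 digest as the seal and a keyed HMAC from the Python standard library in place of a real digital signature. A shared-key HMAC demonstrates the integrity check only and does not provide non-repudiation. The key and function names are hypothetical.

```python
import hashlib
import hmac

SERVER_KEY = b"demo-key-not-for-real-use"  # hypothetical shared key for the example

def seal_and_sign(document: bytes, key: bytes = SERVER_KEY) -> dict:
    """Server side: compute a seal (digest) of the data and a keyed tag over the seal."""
    seal = hashlib.sha256(document).hexdigest()
    signature = hmac.new(key, seal.encode("ascii"), hashlib.sha256).hexdigest()
    return {"document": document, "seal": seal, "signature": signature}

def verify(record: dict, key: bytes = SERVER_KEY) -> bool:
    """Client side: recompute the seal and tag, and compare them in constant time."""
    expected_seal = hashlib.sha256(record["document"]).hexdigest()
    expected_signature = hmac.new(
        key, expected_seal.encode("ascii"), hashlib.sha256
    ).hexdigest()
    return (hmac.compare_digest(expected_seal, record["seal"])
            and hmac.compare_digest(expected_signature, record["signature"]))
```

A user agent would call `verify` on each retrieved record and reject any record whose content no longer matches its seal, or whose seal was not produced by the holder of the key.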

Using the Public Domain WAIS software, the client functionality has been improved to work with the WAIS server which manages the secure database. The Secure WAIS Database Client is a powerful client, which offers users access to the full range of Secure WAIS Database Server services. From a security point of view, the Client/Server environment covers the following requirements:

There are further facilities which may be invoked optionally (in order to offer reasonable look-up speed):