Search for contacts, projects,
courses and publications

TagMatch: Fast Partial Matching for Content Filtering and Routing

People

Leaders

Carzaniga A.

(Responsible)

Collaborators

Rogora D.

(Collaborator)

Abstract

This proposal is based on the notion of an information centric network in which files, streams, short messages, and even individual data packets are described and thereby addressed through sets of "tags". Routing and forwarding in such a network require an efficient solution of a problem known as partial matching. This problem is also essential to many other applications beyond information centric networking. We propose to revisit the problem from a theoretical perspective and then to study it more extensively from a systems perspective. An information centric network is intended to deliver information of interest to a user--either when the user requests it explicitly (request/reply) or as soon as information becomes available that matches the user's long-term interests (publish/subscribe)--but without requiring that the user specify the location of that information in the network. Thus an information centric network allows users to directly address information rather than network locations. One way to address information is by name, within a flat or hierarchically structured name space. This is the approach taken by many researchers in information centric networking. We propose a more expressive network service in which users describe information. In particular, we propose to use expressive descriptors consisting of sets of tags. Tags are already a very common and intuitive way of annotating and retrieving information in personal databases and on the Web. We propose to adopt them as a form of addresses. For example, if a user is interested in {quantum, physics} and {jazz, music} then the network should forward to that user information tagged as {montreaux, jazz, festival, music, program, 2014}. At the heart of a tag-based information centric network, or in any other information system that selects and disseminates information based on tags or features, is a matching engine that determines which information descriptors match which user requests or interests. We propose to study and develop highly scalable solutions for this matching problem. Our long-term objective is to be able to support tag-based information selection and routing at the scale of the current Internet. This means supporting hundreds of millions of users world-wide at the throughput of current high-end routers, which means hundreds of millions of matching operations per second. Notice, however, that IP forwarding amounts to prefix matching. By contrast, matching tag-based descriptors amounts to subset matching (or partial matching) and is considered a fundamentally more complex problem. Indeed, this complexity is what accounts for the greater expressiveness of tag-based descriptors. Our approach is to first aggregate the sets of tag sets that represent the routing information base, and then to use a staged and highly-parallel architecture for the matcher. Address aggregation is an essential component of IP networking and plays a similar role in tag-based information centric networking. The specific research questions regarding aggregation are whether tag-based descriptors would aggregate well enough to significantly reduce the amount of descriptors that must be processed by the matcher, and then how to effectively perform this aggregation, which again involves partial matching. Therefore, the central contribution of this project will be a highly scalable system for tag-based information centric networking. We propose to integrate general-purpose CPUs with highly-parallel modern GPUs and also FPGAs and TCAMs. The challenge will be to design a distributed matching algorithm across these heterogeneous computing units and then to implement it so as to (1) assign to each unit the most appropriate algorithmic function to best exploit its specific computational capabilities, and to (2) balance the workload induced by the global matching algorithm onto each unit, so as to optimize the overall throughput. This project is an integral part of a broader ongoing effort to realize the innovative idea of an Internet architecture based on the direct addressing of content by both consumers and producers.

Additional information

Start date
01.02.2015
End date
31.01.2018
Duration
36 Months
Funding sources
Status
Ended