Chapter 2 LITERATURE REVIEW
The previous chapter explained the importance of investigating the problem of effectively supporting personal document management. This chapter provides a critical review of the current knowledge, practice and theory in this area.
Section 2.1 defines and describes personal document management, setting out the basic concepts and terminology that will be used throughout this thesis and positioning this research in the context of the field of Human-Computer Interaction.
Section 2.2 reviews empirical studies of personal information management practices, with particular emphasis on studies that have examined document management. The lack of information specifically about document management will be highlighted.
Section 2.3 reviews a number of prototype systems that have been developed to support personal document management, as well as examining the capabilities of a few notable commercial systems.
Section 2.4 discusses theory relevant to personal document management, noting how little of it addresses the topic directly. This section will include theory from related fields such as psychology, library science and information retrieval.
TERMS AND DEFINITIONS
This section defines the key concepts and terms relating to document management that will be used throughout this thesis.
One of the problems with the current state of HCI research into everyday activities is a lack of agreement on common terms and definitions (Whittaker, Terveen, & Nardi, 2000). For this reason, this section will start by reviewing definitions of personal information management (PIM) and from there develop a definition of personal document management as a subset of PIM.
Personal Information Management
Personal Information Management has several definitions. Bellotti, Ducheneaut, Howard, Neuwirth and Smith (2002) emphasise categorisation for later retrieval when they define it as “ordering of information through categorization, placement, or embellishment in a manner that makes it easier to retrieve when it is needed”. Likewise, Lansdale (1988) defines it as “the methods and procedures by which we handle, categorise and retrieve information on a day-to-day basis”, although he notes that retrieval is not the only purpose for which information is handled and categorised.
Barreau (1995) defines a personal information management system as “an information system developed by, or created for, an individual in a work environment”, elaborating on the five functions the system must provide: acquisition, organisation, maintenance, retrieval and presentation. The emphasis in this definition is on the system for supporting PIM, rather than the user or the user’s activities.
Boardman (2004) refines this definition into “the management of personal information as performed by the owning individual”. He elaborates on management as consisting of acquiring, organising, maintaining and retrieving, and explicitly conceptualises PIM as a user activity rather than a system activity. Figure 1 below shows this model of Personal Information Management.
Ownership in this sense implies that the person has control over the document. The owner need not be the author of the document; in fact, many people manage documents that they have acquired from others, or that they work on collaboratively with others. As long as the person has the ability to manipulate their own copy of the document (e.g. the ability to rename it, move it or delete it), they are considered to own that representation of that document.
This definition of PIM will be used as the basis for formulating a definition of personal document management in the following section.
Personal Document Management
This section will construct a definition of personal document management step by step, through examining and then combining the component terms personal, document and management.
Personal
There are two possible interpretations of the word personal in personal information management.
The first interpretation is that personal refers to information about an individual. An example is the information an organisation might store about a person, such as their address, date of birth and credit history. This type of information is often personally identifying and may be sensitive. It is usually not controlled or managed by the person the information is about, and hence is often the subject of privacy concerns.
The second interpretation is that it is information owned by and under the control of the individual. This is the sense we use when we refer to ‘his documents’ or ‘her email’. The individual has the ability to add to, change or delete this information at will.
This second interpretation is the meaning used in the context of personal information management, and therefore is the meaning that will be used in this research.
Document
Defining a document is surprisingly difficult (Buckland, 1997). A typical dictionary definition (Dictionary.com Unabridged, v1.1) is:
- A written or printed paper furnishing information or evidence, as a passport, deed, bill of sale, or bill of lading; a legal or official paper.
- Any written item, as a book, article, or letter, esp. of a factual or informative nature.
- A computer data file.
These definitions do not readily translate into a definition of digital documents. The first definition is problematic in that it focuses on paper as the medium, which would exclude digital documents. The third definition is also problematic, since many computer files represent executable programs or program libraries and components, and therefore would not be considered by most people to be documents. The second definition focuses on writing of a factual or informative nature. While factual and informative are debatable (fiction? propaganda?), the main problem with this definition is its emphasis on writing. This rules out audio books or images being considered documents, and probably also eliminates spreadsheets and presentations.
The International Telecommunication Union has a definition of a document in a digital context in its Open Document Architecture standard: “a structured amount of information intended for human perception, that may be interchanged as a unit between users and/or systems.” This definition makes no mention of the physical format, meaning that a document can be either physical or digital. It also does not constrain the document to be textual, merely requiring that it be a structured amount of information. It also contains the important idea that a document is a unit that is packaged for a human audience, rather than structured for computer access like a database. Documents can be either static (not changed or modified after their initial creation) or dynamic (continuously or regularly changed or updated over a period of time). This definition includes both types of documents, and it is the definition of document that will be used throughout this thesis.
It is important to elaborate on the difference between a document and a file. A document is a logical and human-meaningful package of information. A file in a computer science context is a collection of related data stored as a unit with a single name (The American Heritage® Dictionary of the English Language, 2004). While each document is usually stored in a single file, the two are not synonymous. A user may write a report in Microsoft Word which he saves as a file called report.doc. He may then create an updated version of this and save it as a file called report2.doc, and then save a copy of this report in PDF (Portable Document Format) as finalreport.pdf. These are three separate files but they represent a single logical document. Conversely, a user may split a single logical document such as a long report into multiple smaller files, each representing a section of the report. This means that there is not always a direct 1:1 correspondence between a document and a file, particularly with dynamic documents that are modified over time.
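The distinction between a document and a file can be illustrated with a minimal data-model sketch. The class name LogicalDocument and its methods are hypothetical and introduced only for illustration; the file names are those of the report example above. The sketch assumes that a logical document is simply a titled group of files, which is one possible reading of the distinction rather than a model proposed in the literature.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class LogicalDocument:
    """Hypothetical model: one human-meaningful document, many files."""

    title: str
    files: list[PurePath] = field(default_factory=list)

    def add_file(self, name: str) -> None:
        # Each saved version, exported copy or section is a separate file.
        self.files.append(PurePath(name))

    def formats(self) -> set[str]:
        # The set of file extensions in which this document is stored.
        return {f.suffix.lower() for f in self.files}


# The report example: three files, one logical document.
report = LogicalDocument("Report")
for name in ("report.doc", "report2.doc", "finalreport.pdf"):
    report.add_file(name)
```

Here report.files holds three entries while there is still only one LogicalDocument, which is the one-to-many relationship the paragraph describes.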
Management
Managing refers to directing or controlling the use of something (The American Heritage® Dictionary of the English Language, 2004). In this context, it refers to the activities and actions involved in controlling and using personal information. Following Boardman’s (2004) modification of Barreau’s (1995) management activities, these are the sub-activities that make up management:
Creation and Acquisition. Documents enter a collection either by being created by the user or being acquired from another source. Common sources include receiving documents via email, downloading them from the web and copying them from another computer or memory device.
Retrieval. Documents are retrieved from a collection periodically in order to use them for various purposes, including reading, editing, sending to others, printing or performing organisation and maintenance activities with them. Document editing, printing and use as attachments in an email system are not part of the scope of document management.
Organisation. Organisation refers to the process of arranging and categorising documents, as well as applying metadata to documents. This includes renaming of documents and folders.
Maintenance. Maintenance activities include deleting documents that are no longer needed, making backup copies of documents, and moving documents that are no longer regularly used to an archive storage location.
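The four sub-activities above can be summarised as a small classification sketch. The action names (such as "download" or "archive") and the classify function are hypothetical labels chosen for illustration; they paraphrase the examples given in the descriptions above and are not an established taxonomy. Actions that the text places outside the scope of document management, such as editing and printing, map to no activity.

```python
from __future__ import annotations

from enum import Enum


class Activity(Enum):
    CREATION_ACQUISITION = "creation and acquisition"
    RETRIEVAL = "retrieval"
    ORGANISATION = "organisation"
    MAINTENANCE = "maintenance"


# Hypothetical action labels, grouped by the sub-activity they belong to.
ACTION_TO_ACTIVITY = {
    "create": Activity.CREATION_ACQUISITION,
    "receive_by_email": Activity.CREATION_ACQUISITION,
    "download": Activity.CREATION_ACQUISITION,
    "copy_from_device": Activity.CREATION_ACQUISITION,
    "retrieve": Activity.RETRIEVAL,
    "rename": Activity.ORGANISATION,
    "categorise": Activity.ORGANISATION,
    "apply_metadata": Activity.ORGANISATION,
    "delete": Activity.MAINTENANCE,
    "backup": Activity.MAINTENANCE,
    "archive": Activity.MAINTENANCE,
}


def classify(action: str) -> Activity | None:
    """Return the management sub-activity for an action, or None if the
    action (e.g. editing or printing) is outside document management."""
    return ACTION_TO_ACTIVITY.get(action)
```

For example, classify("rename") yields Activity.ORGANISATION, while classify("print") yields None because printing is a use of a document rather than a management activity.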
While in theory it is possible to ‘manage’ a single document, in practice the subject of the management efforts is a collection of documents. The subject matter of the documents is not important: they may be work-related documents, but they may equally be documents related to a hobby or private information.
Personal Document Management
From the combination of the above definitions, we can reach the following definition of personal document management:
Personal document management is the activity of managing a collection of digital documents performed by the owner of the documents.
The unit of analysis in personal document management is an individual user and the collection of digital documents he or she owns. The process of document management incorporates the creation/acquisition, retrieval, organising and maintenance activities described above, provided they are performed by the document owner. Personal document management is an activity that is performed intermittently, embedded in the daily life of users.
Comparison to related terms
Comparing personal document management with similar terms and concepts helps to clarify its definition.
Personal Information Management
Personal information management is the parent term of personal document management. The primary difference is its broader definition of information, which encompasses any type of digital information (e.g. emails, tasks, calendar items, contacts), in contrast to the focus on documents as a subtype of information.
Information Management
Information management is “the application of management principles to the acquisition, organisation, control, dissemination and use of information relevant to the effective operation of organizations of all kinds” (Wilson, 2003). It is distinguished from personal document management through both the emphasis on the organisation rather than the individual, and through the application to all forms of information, structured and unstructured, rather than only documents.
General Information Management
The term General Information Management has been used to describe the type of information management performed by librarians or other professionals to organise and manage information on behalf of others (Bergman, Beyth-Marom, & Nachmias, 2003). Organising and managing information so that it is applicable for a large number of people with differing requirements is a different problem from managing information for an individual’s own personal use.
Document Management
The term Document Management can be defined as “the process of overseeing an enterprise’s official business transactions, decision-making records, and transitory documents of importance, which are represented in the format of a document” (Sutton, 1996). It usually occurs in an enterprise context, overlapping with the term enterprise content management. It differs from personal document management in that the focus is on an organisation, and that usually the content of a knowledge management system has been structured using a taxonomy or ontology by a librarian or information architect (making it a type of General Information Management). The goal is to find a taxonomy that encompasses all of the content across the organisation and that is generic enough that all employees can use it.
Knowledge Management
Knowledge Management is the process of identifying and leveraging the collective knowledge in an organisation in order to improve the organisation’s competitiveness (Alavi & Leidner, 2001). It is similar to Document Management in that both operate at the level of the organisation. Knowledge management is a broader term and usually includes document management plus measures to capture tacit knowledge, to make the knowledge resources in an organisation visible and to reuse various forms of knowledge.
Information Retrieval
Information Retrieval (IR) deals with the representation and storage of, and access to, information items (Baeza-Yates & Ribeiro-Neto, 1999). The term is usually used in the context of locating information in large information sets such as the internet or digital libraries, and the focus has historically been on classification and indexing (Järvelin, 2003). However, IR is an integral part of document management.