
        IMS Open Corpus Workbench (CWB)
        Release 3.5 BETA

        Installation Guide


This file describes how to build and install the CWB from source code.  Binary
packages for popular platforms are available from the CWB homepage

    http://cwb.sourceforge.net/

together with detailed installation instructions.

The CWB development wiki 

    http://cwb.sslmit.unibo.it/

offers hints for building the CWB on specific platforms and addresses some
common problems.  If you encounter a problem that you cannot solve with the
information provided on the wiki, you should join the CWBdev mailing list

    http://devel.sslmit.unibo.it/mailman/listinfo/cwb

and ask your question there.

        PREREQUISITES

 - any modern Unix flavour (must be POSIX compatible)
 - GCC 3.3 or newer recommended (other ANSI C compilers might also work)
 - the ar and ranlib code archive utilities
 - GNU make or a compatible program
 - GNU bison & flex for updating automatically generated parsers
 - the pkg-config program
 - the ncurses library (or a similar terminal control library)
 - the PCRE library (see notes below)
 - the Glib library (see notes below)


        RECOMMENDED SOFTWARE (not essential)

 - GNU install or a compatible program
 - Perl with pod2man script for rebuilding manual pages
 - GNU less pager for interactive display of query results in CQP


        QUICK INSTALLATION

[There are one-step setup scripts for some operating systems: see 
"AUTO-INSTALL SCRIPTS", below. You will also find specific instructions
for Windows and Mac OS X in sections "BUILDING WINDOWS BINARIES" and
"INSTALLATION GUIDE FOR MAC OS X". Otherwise, follow the instructions here.]

Edit "config.mk", selecting suitable platform and site configuration files
for your system (available options are documented in "config.mk").  You
can also override individual settings manually there.  If you cannot find an
appropriate configuration, see WRITING YOUR OWN CONFIGURATION FILES below.

Now, to compile the Corpus Library, CQP, CQPcl, CQPserver, and the
command-line utilities, type

        make clean
        make depend
        make all

If your default make program is not GNU make, you may have to type "gmake"
instead of "make" (the current Makefile system only works with GNU make and
fully compatible programs).  To install the corpus library, all programs, and
the man pages, type

        make install

Note that you must have write permission in the installation directories in
order to do so (usually the "/usr/local" tree, but site configuration files may
specify a different location with the PREFIX configuration variable).

You are now set to go.  If you are new to the CWB, you should read the 
"Corpus Encoding Tutorial" and "CQP Query Language Tutorial" available from
the CWB homepage.  You may also want to install pre-encoded sample corpora
for your first experiments.

If you want to make sure that all automatically generated files are up to
date, you should type

        make realclean

before starting the build process.  This will update makefile dependencies,
the generated bison/flex parsers and all man pages.  Note that this will only
work if the recommended software is installed (bison, flex and pod2man).


        AUTO-INSTALL SCRIPTS

There are now configuration/installation scripts for common Linux
systems - note that these are single-step ALTERNATIVES to following the
instructions above.

The Linux variants with specific scripts are currently Ubuntu and Fedora.

The Ubuntu script will probably work on other Debian variants, and the
Fedora script will probably work on other RPM-based Linux distros.  

From the main CWB directory (the one containing this INSTALL file), run

        sudo ./install-scripts/cwb-install-fedora
or
        sudo ./install-scripts/cwb-install-ubuntu

These must be run as root (e.g. with sudo as shown above). Here's what these 
scripts do for you:

 - downloads and installs all prerequisite software packages
 - sets up the configuration file ("config.mk")
 - compiles CWB from the source code
 - installs the CWB programs to the "usual" place on your system. 
 
After running these scripts, you are ready to start using CWB.

For recent Mac OS X releases, a semi-automatic install script is provided,
which requires either the HomeBrew or the MacPorts package manager to be
available. Use the package manager to install the prerequisite external 
libraries, then run

         sudo ./install-scripts/cwb-install-mac-osx

This script will check whether all prerequisites are satisfied and prompt
you to install missing libraries; then it will compile and install CWB.

However, it is recommended to follow the step-by-step instructions in
section "INSTALLATION GUIDE FOR MAC OS X" for a trouble-free install process.

If you are on another common system (e.g. SunOS, Cygwin) for which there
isn't yet an auto-install script, you can still take a shortcut by using the 
autoconfigure script:

        ./install-scripts/cwb-config-basic

This removes the need to manually edit "config.mk": you can go straight to
compiling.

Note that the autoconfigure/auto-install scripts may not work if you are
using Linux on an opteron system. The autoconfigure script is also unable
to distinguish most variants of Darwin and will mostly use the 
Darwin-universal configuration even if a more specific configuration file
exists. In this case, manually editing "config.mk" may be better.


        WRITING YOUR OWN CONFIGURATION FILES

If you cannot find a suitable platform and site configuration files, or if 
you need to override some settings and expect to install future CWB releases
on the same system, you can write your own configuration files.

All configuration files can be found in the "config/platform/" and
"config/site/" subdirectories.  A listing of configuration variables with
short usage explanations can be found in the template files (aptly named
"template") in these directories, which provide good starting points for
your own configuration files.  In many cases, the easiest solution is to 
make a copy of a sufficiently similar configuration file and add your own
settings, or to inherit from this configuration with an appropriate 
"include" statement.  The "linux-*" and "darwin-*" configuration files in
the standard distribution are good examples of this strategy.

It is recommended that you store your personal configuration files in a
separate directory outside the CWB tree, so you can easily re-use them with
future versions of the software.  You just have to modify the "include"
statements in "config.mk" to use absolute paths to your configuration files.
If your configuration files inherit from standard configurations, use include
paths of the form "$(TOP)/config/...".


        BUILDING BINARY RELEASES
        
If you want to create a binary package for your platform, type

        make release

This will install the CWB locally in a subdirectory of "build/" and wrap it
in a ".tar.gz" archive for distribution.  The filename of this archive (which
is the same as the installation directory) indicates the CPU architecture and
operating system which the binary package has been compiled for.

It is recommended to select a site configuration named "*-release", which will
build statically linked programs if possible (some operating systems do not
support static linking). Note that individual settings for installation
directories (except for the general PREFIX) and access permissions will be
ignored when building a binary release.


        BUILDING SOURCE RELEASES

In order to "clean up" the source code tree for a standard source distribution,
the recommended command sequence is

        make realclean
        make depend
        make clean

This will remove all automatically generated files, and then recreate the 
makefile dependencies and bison/flex parsers, so that the CWB can be compiled
from source with a minimal set of prerequisites.


        BUILDING RPM PACKAGES FOR LINUX

In order to create a binary Linux distribution in RPM format, edit the file
"rpm-linux.spec" as necessary, then copy the sourcecode archive (whose precise
name must be listed in the "Source:" field of the RPM specification) into
"/usr/src/packages/SOURCES", and run

        rpmbuild -bb --clean --rmsource rpm-linux.spec

The ready-made binary RPM package will then be available in the appropriate
subdirectory of "/usr/src/packages/RPMS/".  It may be necessary to select the
appropriate Linux configuration (e.g. to build a 64-bit version of the CWB) in
"config.mk" and rewrap the source archive before building the RPM package.
Otherwise, the build process will automatically select the generic Linux 
configuration for standard i386-compatible processors.


        INSTALLATION GUIDE FOR MAC OS X

On recent versions of Mac OS X (10.6 "Snow Leopard" and 10.7 "Lion"), it is quite
easy to compile and install CWB release 3.4 and above.

 - Lion:         install XCode from the Mac App store (XCode 4.2.1 or newer)
   Snow Leopard: install XCode from https://developer.apple.com/xcode/ (XCode 3.2.6 or newer)
                 [you will need a free Apple developer account to download XCode]

 - install HomeBrew package manager from http://mxcl.github.com/homebrew/

Make sure that no other package managers (Fink, MacPorts) are installed on your
system, as they might conflict with the build process described here.  Now install
the prerequisite external libraries with the following shell commands:

        brew -v install pkg-config
        brew -v install glib --universal
        brew -v install pcre --universal

Now edit the file "config.mk", setting the platform entry to

        include $(TOP)/config/platform/darwin-universal

and the site configuration as desired.  Then enter the shell commands

        make clean
        make depend
        make all
        make install

as shown in the section "QUICK INSTALLATION" above.

NB: The universal build of glib 2.30.2 seems to be broken at least on OS X 10.7 (Lion)
with XCode 4.2.1; the same problem might apply to other software versions. There are 
two possible ways of dealing with this problem:

1) Omit the flag "--universal" when installing glib and change the platform configuration
   setting from "darwin-universal" to "darwin-64". This will compile the CWB only for the
   64-bit architecture (x86_64), which is the default on Snow Leopard and Lion.

2) Downgrade glib to the previous version 2.28.8 by entering the shell command

    (cd /usr/local; git checkout 4b4760a /usr/local/Library/Formula/glib.rb)

   before installing glib with "brew". See "brew versions glib" for other available
   versions and how to activate them.


        BUILDING WINDOWS BINARIES
        
Currently, it is only possible to create Windows binaries via cross-compilation
on a Unix-like system (if you want to compile on a Windows box your best option
for now is Cygwin). For cross-compilation, you will need to install:

 - the MinGW system including the Windows cross-compiler version of gcc

On Debian/Ubuntu, this can be done by installing the package "mingw32" together
with two other "mingw32-.*" packages on which it depends. There are also 
various "mingw32-.*" packages available for Fedora and other RPM-based Linux 
variants, although cross-compilation probably doesn't require all of them. 

Alternatively the MinGW cross-compiler can be downloaded from the MinGW project
website:

        http://www.mingw.org/wiki/LinuxCrossMinGW

However you install MinGW, make sure the cross-compiler binary file
(i586-mingw32msvc-gcc) and the other executables with this prefix are all 
available via your path.

Then manually edit config.mk to select "mingw" as the configuration file for 
the "platform".

When you have changed the config.mk file, building the CWB binaries is the same
as above:

        make clean
        make depend
        make all

Running "make install" is meaningless in this case, because of course you will
want to install the binaries on a different machine. To create an archive for
easy transport of the Windows .exe files, together with PDFs of the man files,
run

        make release

This will generate a zip file containing the binary release installer, in the
"build" directory. Its name is of the form cwb-$version-windows-i586.zip.

Once again, to speed up these steps there is a script which automates the changes
to config.mk and the various steps of "making":

        ./install-scripts/cwb-build-win32

To actually install, move the zip file to your Windows system, and decompress 
it.

Then run the install-cwb-win batch file. (If it doesn't find the folders it 
expects to find, it may ask you some questions.) This script will

 - create a folder like C:\Program Files\CWB
 - put all the binary files (plus necessary DLLs) in its "bin" subdirectory
 - also copy across include/library files (whether or not the latter will 
   actually work if you try to link against them is untested).

You might also want to
 - add the "bin" sub-folder to your PATH environment variable
 - move the PDF files from the "man" sub-folder to somewhere more convenient

Note that so far, cross-compilation has been tested with the following 
Unix-like starting points:

 - Ubuntu
 - [add others here]

The resulting Windows binaries have been tested on the following Windows 
variants:

 - Windows XP SP3 (32 bit)
 - [add others here: windows server, vista, 7, 64-bit windows?]
 
 
        INSTALLING PREREQUISITE EXTERNAL LIBRARIES

The Glib and PCRE libraries are needed to compile CWB. If you are using a 
Linux flavour such as Debian, Fedora etc. then the easiest way to do this is
via your package repository, which should almost certainly include both. 
In fact, the auto-install scripts will actually check that you have got
the packages in question using the package-management tool. So if you are
using the auto-install scripts, you don't need to worry about it. 

Otherwise, PCRE and Glib are available to download from these addresses:

        http://www.pcre.org/
        http://www.gtk.org/download.html

Use the instructions included in the source code downloads to build the 
libraries.

For PCRE, at least version 7 is recommended. For Glib, at least version 2
is required.

You must use a copy of PCRE which has been compiled with Unicode (UTF-8)
support and Unicode properties support. (You can find out whether this is
the case using the pcretest utility with the -C option.)

Note that the CWB build process makes the following assumptions:

 - that the libpcre.* file is installed in a directory in the linker's
   library search path. If this is NOT the case, edit your configuration
   file to add a -L<directory> command to the LDFLAGS variable to allow
   the linker to find your libpcre.a
 - that pcre.h and other PCRE header files can be found via the compiler's
   include-path. If this is NOT the case, edit your configuration file 
   to add a -I<directory> command to the CFLAGS variable to allow
   the compiler to find your pcre.h
 - that the location of header/library files for Glib can be discovered
   using the pkg-config utility (should be the case if you have used the
   standard installation procedure).

If these assumptions are not true, you may need to hack some of the 
makefile includes a bit to make things work.
 
 
        PREREQUISITE EXTERNAL LIBRARIES FOR WIN32 BUILD

When cross-compiling for Windows, things are a bit different: you need
versions of these libraries that have been compiled *to run under Windows*,
not the usual *nix versions. 

For PCRE, we have found it easiest to download the source of the library 
and use the MinGW cross-compiler to build a copy of the library files. This
then needs to be placed into the right library directory for the MinGW 
cross-compiler to find it. The same applies to the header file. We have 
found that PCRE's "make install" will do this for you, IF it has been
configured with the right options (if you run the ./configure script without
the right options, it will be compiled for *nix and placed in the normal
gcc places, possibly overwriting what you previously had there). 

The "right options" for configuring PCRE are something like the following 
(assuming that the cross-compiler is i586-mingw32msvc-(gcc|ld|...):

CC=i586-mingw32msvc-gcc CC_FOR_BUILD=gcc  ./configure                     \
    --host=i586-mingw32msvc                                               \
    --enable-utf8 --enable-unicode-properties                             \
    --enable-newline-is-any --disable-cpp --enable-static                 \
    --exec-prefix=$A_DUMMY_DIRECTORY_FOR_FILES_YOU_WILL_NOT_NEED          \
    --prefix=$HOME_OF_CROSS_COMPILER_FILES                                \
    --libdir=$HOME_OF_CROSS_COMPILER_LIB_FILES                            \
    --oldincludedir=$HOME_OF_CROSS_COMPILER_INCLUDE_FILES

If you don't use PCRE's "make install", but instead move the library/header 
files  you have built manually to the cross-compiler's directories, 
these are the files you must make sure get copied:

 - Library files:
    - libpcre.a 
    - libpcre.dll.a
    - libpcre.la
    - libpcreposix.a
    - libpcreposix.dll.a
    - libpcreposix.la
 - Header file:
    - pcre.h
 - Dynamic link libraries:
    - libpcre-0.dll
    - libpcreposix-0.dll
 
The directories they need to be inserted into are the MinGW compiler's library 
and include directories. These will usually be something like

        /$libdir/gcc/i586-mingw32msvc/$version
        /$libdir/gcc/i586-mingw32msvc/$version/include
        /$libdir/gcc/i586-mingw32msvc/bin

... for library, header, and DLL files respectively (and indeed these are the 
paths you would also need to specify as --libdir, --prefix, 
and --oldincludedir when  building PCRE), where the default for $libdir is 
/usr/lib .  

But, in general, you should make sure to consult the PCRE documentation on
building for cross-compilation. See http://www.pcre.org/readme.txt .

For Glib, we have had success using pre-built Windows DLL files made available
by the GTK project, available at 

        http://www.gtk.org/download-windows.html

but building Glib from source using the cross-compiler, as with PCRE, might
also be an option. We used the following procedure:

 - download the binary release (glib_$version_win32.zip) from gtk.org
 - download the dev release (glib-dev_$version_win32.zip) from gtk.org
 - extract the list of header, library and DLL files below from the two
   download zips, and place them into the appopriate places in the
   cross-compiler's tree.
 - cross your fingers and build!

The files to extract are listed below. Note this is a maximal, "better safe
than sorry" list, to avoid problems based on intra-Glib dependencies; ypackagesou
might also be able to build with a more limited set of files.

 - Library files (from glib-dev_$version_win32.zip/lib):
    - gio-2.0.lib
    - glib-2.0.lib
    - gmodule-2.0.lib
    - gobject-2.0.lib 
    - gthread-2.0.lib
    - libgio-2.0.dll.a
    - libglib-2.0.dll.a
    - libgmodule-2.0.dll.a
    - libgobject-2.0.dll.a
    - libgthread-2.0.dll.a 
 - Header files (from glib-dev_$version_win32.zip/include/glib-2.0) :
    - gio/*
    - glib/*
    - gobject/*
    - glib-object.h
    - glib.h
    - gmodule.h
 - Dynamic link libraries (from glib_$version_win32.zip/bin):
    - libgio-2.0-0.dll
    - libglib-2.0-0.dll
    - libgmodule-2.0-0.dll
    - libgobject-2.0-0.dll
    - libgthread-2.0-0.dll

The appropriate places for these files will be, in general, the same as for 
the PCRE files; see above.

It is important to note that if you have a *nix system and you have Glib 
installed with pkg-config, then the results you get back from pkg-config WILL 
NOT WORK for the cross-compiler. pkg-config will always tell you about
the files/directories for the *nix system itself, not the differently-
compiled files you need for the cross-compiler.

For both PCRE and Glib, the cross-compiler needs to know where the header and
library files are. Assuming you have followed the procedure above, they will
be in places the cross-compiler checks anyway. But if not, then you may need 
to add extra compiler/linker flags in the config file to inform it about their
location.

The cross-compiler doesn't need to know about the location of the DLLs, 
because it doesn't use them. Instead, they are actually copied into the 
release file as part of the "make release" command. So the location of 
the DLLs needs to be available as part of the makefile system. 

You can specify this yourself by adding the appropriate information to 
your config.mk file (as explained in the comments to that file). You need to
set either LIB_DLL_PATH (path to a directory containing both PCRE and Glib
DLLs) or both LIBPCRE_DLL_PATH and LIBGLIB_DLL_PATH.
 
If you are using the auto-build script (./install-scripts/cwb-build-win32),
this script will make a "guess" as to the location of the DLLs. So in this
case, there is no need to set LIB(PCRE|GLIB|)_DLL_PATH - but if you do, 
whatever you specify will override the "guess".


        PACKAGE CONTENTS

Makefile                top-level makefile
config.mk               makefile configuration
definitions.mk          standard settings and definitions for make system
rpm-linux.spec          configuration file for building binary RPM packages
install.sh              a GNU-compatible install program (shell script)
README, INSTALL, ...    the usual open source "boilerplate"

doc/                    some technical documentation

config/                 platform and site configuration files 
  config/platform/        compiler flags and settings for various platforms
  config/site/            site-specific settings (installation paths etc.)

instutils/              utilities for installing and binary packages

install-scripts         shell scripts that automate building/installing for
                        common systems

cl/                     corpus library (CL) source code
cqp/                    corpus query processor (CQP) source code
CQi/                    cqpserver source code (inc client-server interface CQi)
utils/                  source code of command-line utilities

man/                    manpages for CQP and the command-line utilities

editline/               local copy of the CSTR Editline library (no longer used)

