Monday 13 February 2012

Move your QlikTech ProgramData folder

There are a number of reasons why you would want to move the C:\ProgramData\QlikTech folder to somewhere else.  For example:

-  Disk size issues 

The C: drive too small and the QlikView performance logs, reload logs, etc. are filling it up.

-  Failover

You have a "cold" standby server and it would be much better to have the ProgramData\QlikTech folder on the SAN drive so that when the cold server boots up, it will already have all the settings, schedules, etc. from the old server.

There are ways of moving the ProgramData folder - the supported one from Microsoft (with caveats!) is to do it during the installation of Windows - http://support.microsoft.com/kb/949977

This isn't really going to be an option for you if the server is already built by the nice friendly folks in the IS department.  They might also object to you trying to move the whole ProgramData folder anywhere else, just for QlikView.

I have run into this situation a couple of times now and there is a really neat solution - move the QlikTech folder to its new home and then create a symbolic link in the old ProgramData folder.  Everything will still work perfectly!

To do this:

1.  Stop all of the QlikView services.

2.  Move the QlikTech folder from C:\ProgramData to its new location, for example E:\ProgramData.

3.  Run a Command Prompt in Admin mode.  CD to the C:\ProgramData folder and run the following command:

     MKLINK /D QlikTech E:\ProgramData\QlikTech

4.  Restart all of the QlikView services.

All done.  Do the same on the "Cold" server (you will need to bring down the "hot" server first) and create the same symbolic link to the same SAN location and it will work as if it was the "hot" server and no need to ship settings files.

Hopefully you will find this useful.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Printing Reports to PDF using PDFCreator

PDF Creator is a nice tool for generating PDFs and it comes at a very attractive price.  Like other tools of this ilk, it creates a Printer driver which you can print your documents to.  Unlike other drivers (especially at this price point), it gives you additional control on what is printed using a COM interface.  This makes is quite handy to use from VBScript - either inside or outside QlikView.

I am going to give you some starter code here on how to do this from within a QlikView document using a Macro.  You can also do this from a .vbs file that is external to QlikView.  Just note that if you are using a .vbs then you will still need an appropriately licensed copy of QlikView Desktop to run this.

*** All script here is completely unsupported by myself or anyone at QlikTech.  Use at your own risk ***

If you are going to use a .vbs then you will need to create your own ActiveDocument variable.  This is how I do that:


    Dim QV, ActiveDocument


    set Qv = CreateObject("QlikTech.QlikView")
    

    QV.OpenDoc Document,"",""

    set ActiveDocument = Qv.ActiveDocument


In this case, the variable Document contains the full path to my .qvw.  The 2nd and 3rd parameter are a QVUser and Password - if you have those in Section Access.  Once you have an ActiveDocument object, the code is the same between the Macro and the .vbs.

Usually when I do this, I am going to loop across all the value in a field.  For example, I may have a field called "Seg" and I want to run a report for each value in this field.

I may choose to clear all values first:




    
ActiveDocument.ClearAll


Either way, I can call the GetPossibleValues to get a list of all the values in that field:



    Dim FieldName
    
    FieldName = "Seg"


    set mySelections = ActiveDocument.Fields(FieldName).GetPossibleValues


Now, I can loop through the values, select each one in the document and call a function that I have made called Print_PDF for each value:


    Dim i


    for i = 0 to mySelections.Count - 1


        Dim FieldValue


        FieldValue = mySelections.Item(i).text
        
        ActiveDocument.Fields(FieldName).Select FieldValue


        Print_PDF FieldValue, "My Report", "RP01"
        
    Next


Print_PDF is, essentially, using the reference code from the PDF Creator documentation:


Sub Print_PDF(FieldValue, ReportName, ReportID)


    ' Designed for early bind, set reference to PDFCreator
    Dim pdfjob
    Dim sPDFName
    Dim sPDFPath


    '/// Change the output file name here! ///
    sPDFName = ReportName & " - " & FieldValue
    sPDFPath = "C:\PDFReports"


    Set pdfjob = CreateObject("PDFCreator.clsPDFCreator")


    With pdfjob


      If .cStart("/NoProcessingAtStartup") = False Then
         If .cStart("/NoProcessingAtStartup", True) = False Then

          Exit Sub

         End if
         .cVisible = True
      End If


      .cOption("UseAutosave") = 1
      .cOption("UseAutosaveDirectory") = 1
      .cOption("AutosaveDirectory") = sPDFPath
      .cOption("AutosaveFilename") = sPDFName
      .cOption("AutosaveFormat") = 0 ' 0 = PDF
      .cClearCache


    End With



    ' Print the QlikView Report
    ActiveDocument.PrintReport ReportID, "PDFCreator"



    'Wait until the print job has entered the print queue
    Do Until pdfjob.cCountOfPrintjobs = 1
     ActiveDocument.GetApplication.Sleep 20
        ' in VBScript use WScript.Sleep(20)
    Loop
    pdfjob.cPrinterStop = False


    'Wait until PDF creator is finished then release the objects
    Do Until pdfjob.cCountOfPrintjobs = 0
     ActiveDocument.GetApplication.Sleep 20
        ' in VBScript use WScript.Sleep(20)
    Loop
    pdfjob.cClose
    Set pdfjob = Nothing


End Sub


The only additional piece here from the PDF Creator documentation is the QlikView call to Print the report:


    ActiveDocument.PrintReport ReportID, "PDFCreator"


I pass the field value and a ReportName value into the function so that I can generate a different PDF file name for each field value.

Last thing is that you will need to enable System Access in the Macro settings.  In a way, this is where .vbs has an advantage.  Because the script is already external, there is no issue with enabling System Access.

Enjoy.  Just remember, this is all unsupported so use at your own risk.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Tuesday 6 December 2011

Saving Space by Nulling Zeros

If reducing the memory requirement of your QlikView document is important, this tip might help shave a few bytes that could add up to a significant performance gain.

There are several documents on performance improvement that will suggest that you should shed fields that do not contain data that will be used.  My tip is to look at fields that has data that you do use and shed the data in those that you won't!

One example would be a numeric field that may contain a lot of zeros.  If the only use of this field is in a sum, then there is no different in that zero being there or not.  The only difference would be in a count or avg.  Therefore, if you have a lot of zeros in a field that will only be summed, consider transforming them to null() in your load.

   ...
   if(Field1=0,Null(),Field1) as Field1,
   ...

If you need to use that field in a horizontal calculation, you can use the Alt function:

   Alt(Field1, 0) + Alt(Field2, 0)

Another type field that we are often told to look at getting rid of is key fields.  They are often highly unique so will take up the most space in the dataset.  It is often straightforward to remove them, but what if they are needed?  Perhaps as a key link in a 1:M relationship.

For example, say you had an Account table with a key field called AccountID.  There is also an AccountHistory table which associates on AccountID.  You can't readily join the two tables so you need to keep the key field.  However, if there is not an entry for every Account in the AccountHistory table, you don't need to keep the value for that particular AccountID in the field!  Null it!

I might create a mapping table:

   KeyMap:
   Mapping Load Distinct
      AccountID, AccountID
   Resident AccountHistory;

Then, in my Account table load:

   ...
   ApplyMap('KeyMap', AccountID, Null()) As AccountID,
   ...

Now, this is not going to be significant over a million records.  However, over a hundred million, you will see a difference.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Sunday 4 December 2011

The Right Background

I have talked about color choice recently, but whatever choice you make for the colors you are going to use, the choice of background color is critical.

You may have seen an image like this before:


If you haven't, the question is - which of the 5 squares inside the colored box is the same color as the square outside the box.  The answer is that all of the 5 squares are the same color.  The only difference is the background.

The rule, therefore, is that when choosing a background, it must be a single color.  You can't use any type of "gradient" effect or, even worse, use an image as your background.

So, what is the best color to use for the background?

In the world of software development, most IDEs (Integrated Developments Environments), such as Microsoft Visual Studio, have a default of a white background for the code and darker contrasting text.  There have been many debates on the use of white as a background.  The main one being that the white is wearing on the eye.

Some prefer black or dark blue with lighter contrasting text.  However, there is a visual effect that works against this - well known by photography experts - called Halation.  This is the effect of brighter colors seeming to spread when on darker backgrounds and hence look a little blurred.  This can cause eye strain.

Some suggest that softer grey backgrounds.  However, the issue here is that the softer contrast can have an effect on interpretation of color coding in scripting, and it is difficult to find a right choice of usable colors to offset this.  This is a feature that increases productivity in developers, so any reduction in its efficacy is not a welcome.

So, it appears that white might just be the right background color for developers, but this post is not about the best color to use for developing - it is for use in the display of quantitative data.

In an article called "Choosing Colors for Data Visualization", Maureen Stone, of StoneSoup Consulting and formerly of the Xerox Palo Alto Research Center, tells us that most color palettes are designed to be printed on white paper so using a white background is the right choice in digital form.  If your presentation is going to end up on paper, then designing on a white background is the right choice.

Further, Maureen tells us that there are perceptual advantages in using white as the human visual system is designed to adapt its color perception relative to the local definition of white and giving a white background gives us a stable definition of "white".


As a last word on this subject, I looked at six of the worlds most popular websites:


All use a lighter background and darker foreground  (not necessarily black, blue/purple are popular) and, with the exception of YouTube (which has only changed recently), that color is white.  These companies have the resources to spend a lot of money on choosing the right background colors.

Who am I to disagree with them.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Friday 2 December 2011

Easy Ad-Hoc Analysis in QlikView

I was demonstrating server object collaboration recently (the ability for end-user to create different charts and other objects and then share these with other users) to a group of users.  For me, it is quite straightforward to create a new chart but as their eyes glazed over, I thought to myself, "there must be an easier way."

So I created a view something similar to this one:



What I have here is 2 charts and a few list boxes (main filter list-boxes in a container).  The top two list boxes are for "Measure" and "Sort Order".  These are "Data Island" values loaded like this:


Set HidePrefix='%';


Expressions:
Load 
%MeasureID, 
%Measure, 
Replace(%MeasureExpression, '##', '$') As %MeasureExpression 
Inline [
%MeasureID, %Measure, %MeasureExpression
1, Total Sales, Sum(LineSalesAmount)
2, Cost of Sales, Sum(COGS)
3, Sales Margin, Sum(Margin)
4, Sales Current Year, Sum({} LineSalesAmount)
5, Sales Last Year, Sum({} LineSalesAmount)
];


ExpressionSort:
Load * Inline [
%SortOrder
Y-Value
Default
];


Now, the expressions table here would probably not be inline.  It is more likely to be in an Excel document or in a database and then added to based on user demand.  You will note that I have to use "##" and then replace with the "$" as it doesn't work with the "$" in situ (QlikView thinks it is a dollar-expansion).  You wouldn't have this problem if the source was an external one.

So, this table is a list of the most common expressions that would be used in this particular document.  It could be 2, it could be 100.  There is no limit here.

A couple of things to note.  I have prefixed with "%" and set the HidePrefix value.  This is important because your Expression selection does not then appear in Current Selections - which wouldn't make sense to the user.

The second thing to note is that you need to include the MeasureID field because we will use that later to sort the measures.

My bar chart (although it has a fast-change to line) has a Cyclic group as its dimension so I can choose any of my main dimensions.  It has two expressions:

1:  $(=FirstSortedValue(%MeasureExpression, %MeasureID))

and

2:  if(Count(DISTINCT %Measure)=2,
     $(=FirstSortedValue(%MeasureExpression, -%MeasureID)),
     Null())

This structure allows me to pick up to 2 expressions (I check for this in the calculation condition of the chart).  If only one is selected then the 2nd expression is Null()  (I have an equivalent formula for the expression label) and QlikView treats it as if it isn't there - which is what I want.

For the sort order, I override the default group order and set as a descending expression:

    if(%SortOrder='Y-Value', $(=FirstSortedValue(%MeasureExpression, %MeasureID)), Null())

If all the other sort options (e.g. Text Value) are not checked, the null in this expression will default to the default sort order of the dimension anyway.

The pivot table (which fast-changes to a straight table) is similarly constructed except I use 3 cyclic groups as the dimensions.  This gives the users a lot of options.

Worth having a play with it and seeing what you can come up with.

BTW - this will, of course, work with Small Business edition of QlikView which does not support server objects.  Also, users selections can be stored as bookmarks.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Saturday 26 November 2011

Delivering BI to Customers

It was great to see a project that I have been involved with being mentioned in the press:

   Electricity company pushes out BI tool to customers

It would have been nice to see CapricornVentis being mentioned in the article.

We now have several clients who deliver BI to clients using QlikView.  These clients range from:

Utilities - delivering usage information to end users to encourage behavior change, saving energy.

Banking - delivering information to their customers to allow better decision making.

Global Farm Feed Company - allowing farmers in 25 countries to measure feed efficiency.

While none of these implementations is trivial, it is great to work with a great technology like QlikView as the enabler.  And looking forward to delivering many more - especially on version 11.

If you need to deliver BI to your customers, feel free to give CapricornVentis a call.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Sunday 6 November 2011

Denormalize for Performance

It was gratifying for me to find out, after many QlikView implementations, that most of what I was doing was close to best practice in Dimensional Modelling.  I thought that I was mostly just building key tables to fix data connection issues and the denormalizing to clean the data structure and improve performance.

If you are unfamiliar with Dimensional Modelling, it has been around for a very long time - as long as QlikView!  Ralph Kimball, one of the early data warehouse pioneers, has literally written the book on the subject and has many great articles on his website:

http://www.kimballgroup.com/html/articles.html

If you are not familiar with relational database theory, the idea of normalization is to remove redundancy.  For example, if I have a simple product table like this:

Product Table
ProductIDProductNameCategory
1BingBongBings
2BingBangBings
3BingBungBings


The category name of "Bings" is repeated several times.  In a relational database this value is actually stored several times.  From a storage point of view, it is more efficient to split the product table into a product and a category table like this:

Product Table
ProductIDProductNameCategoryID
1BingBong1
2BingBang1
3BingBung1


Category Table
CategoryIDCategory
1Bings


In this case, the numeric ID "1" is stored several times but that takes up a lot less space than the word "Bings" which is now only stored once.  It also means that if I need to change the name of that category, I only need to change it in one location, thus improving my data consistency.

This system has been implemented widely.  Some architectures that you might come across, such as SAP or Oracle Financials, have thousands of tables, all perfectly crafted to store the data most efficiently in the relational database.  Normalization works great for transactional databases like this.

For designers, the problem occurred when people started reading data out of the transactional system, especially for reporting.  Because the queries now had to hop across several joins, the performance suffers immensely.  So, they started either undoing the normalisation process (denormalising) or, worse, keeping the normalised tables but duplicating the data into other tables so that queries ran quicker - a nightmare for data consistency.  Both approaches lead to an increase in the data storage requirements meaning that databases bloat more an more - but we can just throw hardware at that issue, right?

So, what is the best thing to do in QlikView?  A colleague of mine who was recently trained asked me about this because of the exercise that was to move the category table's data into the product table, hence denormalizing.  "Why?", he asked.  "Doesn't QlikView work just as well either way?".

There are a couple of reasons as to why you might do this.  And a couple of reasons why you might not.

Reasons for:


In one of his articles, Kimball suggests that such "snow-flaking" (leaving the category out of the product table) "compromises cross-attribute browsing performance".  Not only that, it "may interfere with the legibility of the database".

So that gives us 2 reasons:

- Performance
- Improved schema readability

And I will add a 3rd:

- Data size.

In the QlikView underlying data cloud, there will be an association between the product fields and the CategoryID field.  There will also be an association between the CategoryID field and the CategoryName field.  If I move the CategoryName into the product table (either map or join), I can then remove the CategoryID field and hence remove a redundant set of associations.  The CategoryName is not duplicated in QlikView because the columnar database still only stores it once.

Reasons against:

- Performance
- Improved schema readability
- Data size

I'm sure that you will be thinking that there is some sort of error here.  How can I have the same set of reasons "against" as I did "for"?

It depends!  It depends really on the number of records in each table and the number of fields.  For a relationship like product and category, there is probably only a few fields in the category table and it can really make sense to denormalise (it almost always make sense to do this for a table with only 2 fields - using ApplyMap).  However, with a couple of tables like, say, Order and Calendar, where Order may have millions of records and Calendar could have ten or more fields, it makes sense to keep them apart and just have the key association on the DateID field.  If you did join them in, it would mean that you could drop the DateID field but you would then have associations maintained between each field in the Order table and Year, and Month, and Day, and Week, etc., etc.  Many more associations being maintained than before and increasing your data size and possibly reducing your performance.  It may also become difficult to see the date fields in the schema whereas they are plain to see in their own Calendar table.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner