Categories
Notes Old site

Lookups in Excel

The HLOOKUP() and VLOOKUP() functions are two of the most commonly used functions in Excel, but there are a number of pitfalls that should be avoided.

Both functions take either three or four arguments (the fourth is optional):

  1. lookup_value the value that you are trying to match.
  2. table_array the range in which you are trying to look up the value. The function will try to match your lookup_value in the leftmost column (VLOOKUP) or topmost row (HLOOKUP) of this range.
  3. col_index_num (VLOOKUP) or row_index_num (HLOOKUP) the column or row containing the corresponding values you want to find. The numbering starts at 1 for the column or row containing the values to be matched.
  4. range_lookup whether to perform an exact or approximate lookup. If this is TRUE you’ll get an approximate lookup, if it is FALSE you’ll get an exact lookup. This arrgument is optional: if no value is specified, it will be assumed to be TRUE.

When should you use an approximate lookup, and when an exact lookup? It’s easier to decide if you understand what is going on as the lookup is performed.

Approximate lookups

If you are doing an approximate lookup, Excel will take each cell in the leftmost column (VLOOKUP) or topmost row (HLOOKUP) in turn, and see how it compares to the value it is looking for (lookup_value). If the value of the current cell is the same as the lookup_value, it stops, having found the correct row (VLOOKUP) or column (HLOOKUP). If the value of the current cell is less than the lookup_value, it moves on to the next cell. If the value of the current cell is greater than the lookup_value, it stops, and uses the previous row (VLOOKUP) or column (HLOOKUP). In other words, Excel returns the next largest value, as long as the lookup range is sorted in ascending order.

This provides a way of remembering the significance of the optional argument: use TRUE when the lookup range is sorted, and FALSE when it is not sorted.

Using TRUE when the lookup range is not sorted will lead to unpredictable results: for VLOOKUP, the row that will be used to provide the value will be the row above the first row whose leftmost column has a value greater than lookup_value.

If the lookup_value is smaller than the first value in the leftmost column (VLOOKUP) or topmost row (HLOOKUP), Excel will return the #N/A value.

Exact lookups

When Excel performs an exact lookup, it will take each cell in the leftmost column (VLOOKUP) or topmost row (HLOOKUP) in turn, and see how it compares to the value it is looking for (lookup_value). If the value of the current cell is the same as the lookup_value, it stops, having found the correct row (VLOOKUP) or column (HLOOKUP). If it gets to the end of the column (VLOOKUP) or row (HLOOKUP) without finding a match, it will return the #N/A value.

Approximate or exact?

You should always specify the optional fourth argument to these functions, so that it is obvious which sort of lookup you intend.

Whether you use an approximate or exact lookup will, of course, depend on the use to which you are going to put the values that you are looking up. However, there are some situations that you should avoid:

  • Don’t use an approximate lookup if the lookup table isn’t sorted in ascending order by the values in the leftmost column (VLOOKUP) or topmost row (HLOOKUP).
  • Don’t use an exact lookup if your lookup values are floating point numbers (ie, any numbers with decimal points). This is because floating point numbers have to be exactly the same in order to match: 4.9999999999999999 won’t match 5. Remember that even if a number is displayed as 5.0, it may actually be 4.9999999999999999.
  • Be very careful when using floating point numbers even with approximate lookups. 4.9999999999999999 is less than 5, so Excel wouldn’t find the correct row even with an approximate lookup.

Calculation speed

Lookups can take a long time to compute, if the lookup table is large or if many lookups are performed. Approximate lookups are faster, as Excel doesn’t always have to go through every row or column in the lookup table. It is often possible to use an approximate lookup, if the lookup table is sorted and is known to contain all possible values.

Many people use a formula such as =IF(ISNA(VLOOKUP(Currency,Rates,2,FALSE)),0,VLOOKUP(Currency,Rates,2,FALSE)). Because of the way Excel works, this always does both lookups in the formula, thus doubling the calculation time. It is much more efficient to have a column (or row) to contain the raw lookup results, and then test them with the ISNA() separately.

Categories
Notes Old site

Bookmarks in Excel

Wouldn’t it be nice if you could create bookmarks in Excel, in the same way that you can bookmark pages in your browser? The ideal would be to record the exact screen position, so that you could automatically return to the same view at some point in the future.

Well, you can! But they are not called bookmarks. Instead, you can use Custom Views.

Custom Views

To define a Custom View, set things up so that they look the way you want them to look. The following settings are recorded when you define a Custom View:

  • The currently active worksheet
  • The size of the window, zoom percentage, position on screen
  • Headings that have been set using Freeze Panes
  • Hidden sheets

You can choose whether to record the following settings:

  • Print settings
  • Hidden rows, columns and filters

When things look the way you want them to, open the Custom Views dialog from the menu, using View\Custom Views...

Picture of Custom Views dialog

Then click on the Add button, to show the Add View dialog.

Picture of Add View dialog

Select the options you require, type in a name, click on OK, and your new Custom View has been defined.

To view it again, bring up the Custom Views dialog and select its name in the list. You can either double click on the name, or select it and click on Show.

Defined Names

You can also use Names, by defining a Name for a range. You can then go to that range by selecting it in the Name Box.

Picture of Name Box

When you choose a Name from the Name Box, Excel adjusts the view so that the range to which the Name refers is visible on the screen. If the range is too large, as much as possible of the range, starting at the top left hand corner, is shown. In general it appears that Excel changes the view as little as possible, so that often single cell ranges are shown towards the bottom right of the screen.

It is only possible to choose Names that are defined in the current workbook. Names that are local to a worksheet are shown only when you are already viewing that worksheet.

It is somewhat more convenient to define a Name by typing in the Name Box, and then viewing the range by selecting it in the Name Box, but you don’t get the fine control over what you see that you do when you use Custom Views.

Categories
Notes Old site

Statistical functions in Excel 2003

There were errors in the implementations of some of the statistical functions in the Analysis ToolPak in Excel 2002 and earlier, which have been corrected in Excel 2003. This means that a workbook developed in an earlier version and recalculated in Excel 2003 may produce different results.

The article gives the impression that it is only in rare cases that the corrections will make any difference. That may be true, but it’s not very comforting for those people that they do affect. One of the errors concerns the calculation of the standard normal cumulative distribution function; the old implementation is basically wrong out in the tail. Another one comes up in functions that involve sums of squares: the old version is inaccurate if there are many significant digits in the data but very little difference in the values. I don’t find it hard to imagine either of those situations coming up in practice.

If you think your calculations are affected, you then have to decide whether to stick with the old version and go for consistency, or upgrade and cope with the differences.

Resources

The following external links are relevant:

Categories
Notes Old site

RAND() in Excel 2003

It appears that a bug crept into the final release of Excel 2003 that wasn’t present in the betas. The RAND() function in Excel 2003 was apparently upgraded so that it produced a better distribution of pseudo-random numbers between 0 and 1. The trouble is that the numbers it produces may be more random, but aren’t always in the correct range; sometimes they are negative. The problem affects both the RAND() and RANDBETWEEN() functions.

Microsoft have now (January 2004) released a hotfix (their term, not mine) that they claim fixes the problem. They also claim that it fixes several other problems, a number of which they had not previously mentioned. Some of these problems cause Excel to quit unexpectedly; at least it’s obvious to the user when this happens (although you may lose your work). Others are more subtle:

  • Sometimes the cells in a range are not actually updated when the range is recalculated.
  • When you use a VBA macro to calculate your worksheet, a custom function from a different worksheet may appear to run.
  • When you create multilevel subtotals for your data in an Excel 2003 worksheet, the totals may appear staggered incorrectly, and may exclude grand totals for some functions.

The hotfix is not downloadable. You have to contact Microsoft and convince them that you need it. Also, the installation process includes editing the registry by hand.

Resources

The following external links are relevant:

Categories
Notes Old site

Dynamic Ranges

Usually, when you specify a range in a formula, you specify it through its address: for example, $B$5:$C$10. However, you can specify ranges through formulae, effectively making them dynamic.

There are a number of formulae you can use to specify dynamic ranges, including OFFSET and IF.

Using the OFFSET formula to define a dynamic range

The formula that is most often used to define a dynamic range is OFFSET. The result of an OFFSET formula is a range. You have to specify the top left hand corner and the dimensions.

Specifically, there are five arguments:

  • Reference: A range that forms the basis of the new range. This can be a single cell, or a range of adjacent cells.
  • Rows: The number of rows up (negative) or down (positive) from the top of the reference range that you want the new range to start at.
  • Cols: The number of columns left (negative) or right (positive) from the left of the reference range that you want the new range to start at.
  • Height: The height (number of rows) of the new range. This argument is optional. If you omit it, the new range will have the same number of rows as the reference range.
  • Width: The width (number of colums) of the new range. This argument is optional. If you omit it, the new range will have the same number of columns as the reference range.

So, for example, the result of the formula OFFSET($B$5:$C$10, 3, -1) would be the range A8:B13: it starts 3 rows down and one column to the left of B5, and has 6 rows and 2 columns, just like the range $B$5:$C$10.

The real utility of this formula comes when you use formulae for some of the arguments, and define a name to refer to it.

For example, suppose you have monthly sales data in a spreadsheet; you want to plot a chart, but don’t want to edit the chart each time you update the sales data.

sales data in column B

You can define a name SalesData to refer to the following formula:
=OFFSET(chartData!$B$5,0,0,COUNT(chartData!C:C)-COUNT(chartData!$B$1:$B$4),1)
and then set the data series in the chart to refer to SalesData.

You can then update the chart simply by adding a new line of data.

Note that we have to be a bit careful how we specify the number of rows in the dynamic range, so as not to count any values that are not sales data.

Using IF to define a dynamic range

There is an even simpler way to define a dynamic range, which is especially useful for data validation. Suppose you have related sets of options: if you choose a service level of premier, for example, you can choose one of three delivery methods, but if the service level is standard, you only have a choice of two.

sales data in column B

For the cell B5 we can set the data validation type to list and enter the following formula as the range:
=IF($B$4="Premier", $E$3:$E$5,$E$4:$E$5)
The drop down box for cell B5 will then show the valid delivery options for the service level chosen in cell B4.

Categories
Notes Old site

Using Names in charts

There are some ways of using Excel names in charts that can help make your charts easier to set up and maintain. You can use names to define the data series; in conjunction with dynamic ranges, you can then build dynamic charts without the use of macros.

Using names in charts

You can use a name in most of the places that you can use a range, when setting up a chart. However, you must always prefix the name with the worksheet or workbook name and an exclamation mark.

In particular, you can use names when specifying data series.

Chart data series

If you are using a name with workbook visibility (ie not local to a worksheet), you can prefix the name with either the workbook name or the sheet name when you type it in. Excel will automatically change the prefix to the workbook name. If you are using a name that is local to a worksheet, you must prefix the name with the worksheet name.

The useful thing about using a name to specify a data series is that if you change the range the name refers to, the chart will automatically show the new data. You don’t have to edit the chart in any way. As editing charts can be fiddly, this can be very helpful.

You cannot use names when specifying a data range. If you type a name in, Excel will automatically change it to the absolute address.

Chart data range

Categories
Notes Old site

Defining Names

In Excel, you can give a range a name. You can do this in two ways, using the Name Box or by using the Define Name dialog. You can change the range that a name refers to, or delete a name, using the Define Name dialog.

Using the Name Box is quicker and easier for straightforward name definitions, but the Define Name dialog provides a more flexible interface.

Using the Name Box to define names

Picture of Name Box

Select the range that you wish to give a name to, and type the name into the Name Box, which is the box just above the top left of the spreadsheet area. In the example, we are defining the name Total to refer to the single cell
A2. Multi-celled ranges canbe given names in exactly the same way.

The Name Box shows all the names that are currently defined, through the drop-down just to its right. Select any name in the drop down to select that range in the workbook.

If you happen to select a range that has a name defined for it, the name will appear in the Name Box.

Using the Define Name dialog to define names

Open the Define Name dialog from the menu, using Insert\Name\Define... Type the name that you wish to define in the top box (Excel will often suggest one for you) and then use the Refers to box to enter the range. You can type the range in, or select it in the usual way. If you type it in, make sure it is an absolute range (with dollar signs) unless you know what you are doing (see related note on using names for formulae). Finally, use the Add button to add the new name to the list.

To delete a name, select it in the list and use the Delete button.

To change the definition of an existing name, select it in the list and change its definition in the Refers To box, then use the Add button.

Categories
Notes Old site

Names can be local to worksheets

Many names in Excel are global in scope: the same name means the same thing wherever it is used. For example, the name
Inflation might refer to cell B2 on worksheet Costs.

Often, though, you may have many similar sheets; for example, an Costs sheet for each year of a 5 year business plan. In such cases you might want to use a different inflation rate for each year, but you would still want to use a named range. The solution is to use a name that is local to the worksheet in question.

Names local to worksheets

You can define a name that is local to a specific worksheet by prefixing the name with the worksheet name followed by an exclamation mark. So, for example, you might define CostsYear1!Inflation to refer to B2 on sheet CostsYear1, CostsYear2!Inflation to refer to B2 on sheet CostsYear2, and so on.

If you use the name Inflation in a formula on sheet CostsYear1, it will refer to the value in cell B2 on that sheet. Similarly, the name Inflation when used in a formula on sheet CostsYear2 will refer to the value for year 2.

You can use a name that is local to a specific worksheet on another sheet by using the full name with the sheet prefix, for example CostsYear1!Inflation.

If you have a local and a global name with the same name, the global name is not visible on the worksheet in which the local name is defined.

The Define Name dialog shows only global names and those that are local to the currently active worksheet. Similarly, the Name Box also omits names that are local to other worksheets.

Categories
Notes Old site

Using Names for formulae

The usual way to use Names in Excel is to use them to refer to absolute ranges. Typically, the name VATRate might refer to the range $D$2, for example.

However, it is possible to define names to refer to formulae that use relative ranges.

Names that refer to formulae can be useful when they are used to define dynamic ranges. On the whole, however, they should be used with caution: make sure they are documented somewhere, so that a future maintainer will find it easy to understand what is going on. One source of confusion is that their definition may depend on the currently active cell.

Names for formulae

When you define a name using the Define Name dialog, you can actually type any formula you like in the Refers to box. The formula is interpreted as being relative to the currently active cell in the workbook.

For example, if you have cell B9 selected, so that it is currently active, you could define a name AboveAndLeft as referring to =B8+A9. You could then use this name in cell D12, say, to add the
values in D11 and C12.

Picture of formula name