CS614 Final Term Latest Past Papers 2025

Moving Away from the Purist Perspective

Before delving into the topic, it’s essential to clarify what a purist is. A purist is someone who strictly adheres to traditional or idealized methods, insisting on doing things exactly “by the book” or following established, often outdated, principles. This approach lacks flexibility and practicality. Purists seek perfection, and as a result, they often justify inaction by arguing that the world isn’t perfect enough to meet their standards.


In the realm of data warehousing, as these systems and their underlying technologies have become more widespread, certain conventional characteristics have been altered to accommodate the growing and varied demands of users. For instance, it’s now generally accepted that a data warehouse is not a comprehensive store of all organizational data. Other significant adjustments involve the concepts of time variance and non-volatility, which have been somewhat relaxed to fit practical needs.

Yield Management Explained

A good example of adapting traditional approaches can be seen in yield management, especially in industries like aviation. Airlines often use sophisticated pricing strategies in which the price of the same seat can vary widely depending on when the ticket was purchased, how many seats are still available, whether the ticket is for a one-way or round-trip journey, and other considerations. For instance, two passengers sitting next to each other on a flight might have paid different prices for identical seats. This dynamic pricing approach allows airlines to maximize revenue by adjusting prices based on demand and timing.
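To make the idea concrete, here is a small Python sketch of such a pricing rule. The formula, thresholds, and discount are invented purely for illustration and are not taken from any real airline system; the point is only that the same seat can yield very different fares depending on timing and remaining inventory.

def quote_fare(base_fare, days_to_departure, seats_left, round_trip):
    """Toy yield-management rule: the fare rises as departure approaches
    and as remaining seats shrink; round trips get a small assumed discount."""
    urgency = 1.0 + max(0, 30 - days_to_departure) * 0.02   # up to +60% close to departure
    scarcity = 1.0 + max(0, 50 - seats_left) * 0.01          # up to +50% when nearly sold out
    trip_factor = 0.9 if round_trip else 1.0                 # hypothetical round-trip discount
    return round(base_fare * urgency * scarcity * trip_factor, 2)

# Identical seats, very different fares:
print(quote_fare(200, days_to_departure=60, seats_left=120, round_trip=True))   # early booking
print(quote_fare(200, days_to_departure=2, seats_left=5, round_trip=False))     # last-minute booking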

Normalization and Practical Adjustments

Normalization refers to a set of guidelines aimed at organizing database tables to reduce redundancy and improve data integrity. However, it’s crucial to understand that these guidelines are not rigid standards but best practices. In environments like Decision Support Systems (DSS), strict adherence to normalization might be impractical due to performance constraints. Sometimes, deviations from these purist norms, known as denormalization, become necessary to improve query response times or meet specific business needs.

Whenever denormalization is considered, it’s important to weigh the potential trade-offs, such as increased data redundancy or the possibility of inconsistencies, to ensure the benefits outweigh the downsides.
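As a minimal sketch of this trade-off, the following Python snippet (using the standard sqlite3 module, with invented table and column names) copies a customer's city into each order row so that a frequent report no longer needs a join at query time, at the cost of storing the city redundantly.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized design: each customer's city is stored exactly once.
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Ali", "Lahore"), (2, "Sara", "Karachi")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 500.0), (11, 1, 120.0), (12, 2, 900.0)])

# Denormalized copy for reporting: the city is repeated in every order row,
# so the frequent "sales by city" query no longer needs a join.
cur.execute("""CREATE TABLE orders_denorm AS
               SELECT o.order_id, o.cust_id, c.city, o.amount
               FROM orders o JOIN customer c ON o.cust_id = c.cust_id""")
print(cur.execute("SELECT city, SUM(amount) FROM orders_denorm GROUP BY city").fetchall())

The cost of that convenience is visible immediately: if a customer moves to another city, every one of their order rows must be updated, which is exactly the redundancy and inconsistency risk noted above.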

Denormalization Techniques: Collapsing Tables

One common denormalization method is collapsing tables, which involves merging entities that share a one-to-one relationship. If each record in Table A corresponds to exactly one record in Table B, combining them into a single table can simplify queries and improve performance. This technique works well when the two entities are tightly linked and usually accessed together, even if they carry different non-key attributes.
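A minimal sketch of collapsing, again in Python with sqlite3 and invented names: two tables in a one-to-one relationship are merged into a single pre-joined table so that lookups no longer require a join.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two entities in a one-to-one relationship, kept separate in the normalized design.
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE employee_contact (emp_id INTEGER PRIMARY KEY, phone TEXT, email TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ali"), (2, "Sara")])
cur.executemany("INSERT INTO employee_contact VALUES (?, ?, ?)",
                [(1, "0300-1111111", "ali@example.com"),
                 (2, "0301-2222222", "sara@example.com")])

# Collapsing: the one-to-one pair is merged into a single table, so a lookup
# of an employee and their contact details needs no join at query time.
cur.execute("""CREATE TABLE employee_collapsed AS
               SELECT e.emp_id, e.name, c.phone, c.email
               FROM employee e JOIN employee_contact c ON e.emp_id = c.emp_id""")
print(cur.execute("SELECT * FROM employee_collapsed").fetchall())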

Denormalization Techniques: Splitting Tables

In contrast to collapsing tables, denormalization can also involve dividing a single table into several smaller ones. The split can be horizontal (dividing rows) or vertical (dividing columns) and is frequently used in distributed DSS environments, where it allows data to be distributed across sites and queries over large datasets to be answered more efficiently.
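The sketch below illustrates both kinds of split on an invented sales table: a horizontal split by region, as a distributed DSS might place each region's rows at its own site, and a vertical split that separates frequently used columns from a rarely used one.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL, notes TEXT)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [(1, "North", 100.0, "promo"),
                 (2, "South", 250.0, "repeat buyer"),
                 (3, "North", 75.0, "walk-in")])

# Horizontal split: each piece holds a subset of the rows (here, one region each),
# so a distributed DSS can keep each region's data at its own site.
cur.execute("CREATE TABLE sales_north AS SELECT * FROM sales WHERE region = 'North'")
cur.execute("CREATE TABLE sales_south AS SELECT * FROM sales WHERE region = 'South'")

# Vertical split: frequently queried columns are separated from a rarely used one,
# with the key repeated so the pieces can still be rejoined when necessary.
cur.execute("CREATE TABLE sales_core AS SELECT sale_id, region, amount FROM sales")
cur.execute("CREATE TABLE sales_notes AS SELECT sale_id, notes FROM sales")
print(cur.execute("SELECT * FROM sales_north").fetchall())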

Use of Derived Attributes in Data Warehousing

Another practical approach in data warehousing is the inclusion of derived attributes—data that is calculated from existing information rather than directly collected. Adding derived attributes makes sense when these calculations are done often and the resulting values remain stable. This practice helps reduce the computation needed during queries because the derived data is pre-calculated and stored.

Derived attributes enhance system performance by minimizing the time spent on runtime calculations. Additionally, once these derived values are correctly computed and validated, their accuracy is generally reliable, reducing the chances of errors in subsequent use. This, in turn, increases the trustworthiness of the data stored in the warehouse.
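A small sketch of the idea, with an invented order_line table: the line total is computed once when the data is loaded and stored as a derived column, so reports simply sum the stored values instead of recomputing quantity times price on every query.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE order_line (
                   line_id INTEGER PRIMARY KEY,
                   quantity INTEGER,
                   unit_price REAL,
                   line_total REAL)""")  # line_total is the derived attribute

# The derived value is computed once at load time and stored alongside the
# base columns, so later queries never recompute quantity * unit_price.
raw_rows = [(1, 3, 20.0), (2, 5, 7.5)]
cur.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                [(i, q, p, q * p) for (i, q, p) in raw_rows])
print(cur.execute("SELECT SUM(line_total) FROM order_line").fetchone())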

The Balance Between Storage Space and Performance

Theoretically, two extremes exist in managing data warehouses: unlimited storage with maximum pre-computation versus unlimited processing power with no pre-computation.

In the first extreme, if storage capacity were unlimited and free, every possible aggregation or data summary (such as data cubes combining various dimensions) could be pre-calculated and stored. This would ensure the fastest possible query responses because all answers are already computed in advance. However, this approach is impractical due to the enormous storage costs and the time required to build these pre-aggregates.

At the other end, if processing power were infinite and instant, there would be no need to store pre-computed summaries because queries could be calculated on the fly without delay. While this might save storage space, it would demand extremely powerful and costly hardware, and in reality, such performance is unattainable.

Most practical systems must find a balance between these two extremes—deciding how much to pre-calculate and store versus how much to compute dynamically based on available resources and performance needs.
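The toy example below contrasts the two extremes on an invented sales table: a pre-computed summary table that answers the aggregate query from stored totals, and the equivalent on-the-fly aggregation that scans the detail rows each time it runs.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sale (product TEXT, month TEXT, amount REAL)")
cur.executemany("INSERT INTO sale VALUES (?, ?, ?)",
                [("pen", "Jan", 10.0), ("pen", "Jan", 15.0), ("book", "Feb", 40.0)])

# Extreme 1: pre-compute the aggregate once and store it (spends storage,
# answers the summary query instantly from the stored totals).
cur.execute("""CREATE TABLE sale_by_product_month AS
               SELECT product, month, SUM(amount) AS total
               FROM sale GROUP BY product, month""")
print(cur.execute("SELECT * FROM sale_by_product_month").fetchall())

# Extreme 2: compute the same answer on the fly (spends CPU on every query,
# stores nothing beyond the detail rows).
print(cur.execute("SELECT product, month, SUM(amount) FROM sale GROUP BY product, month").fetchall())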

Conclusion

In summary, the rigid, purist approach to data warehousing and database design is often impractical in today’s fast-evolving technological landscape. To meet real-world demands, adaptations such as denormalization and the inclusion of derived attributes become necessary, much as yield management has reshaped pricing in industries like aviation. These adaptations allow systems to be more flexible, efficient, and responsive to user needs, even if they stray from traditional ideals.

Ultimately, successful data warehouse design is about striking the right balance between theoretical perfection and practical usability. By carefully considering trade-offs in performance, storage, and data integrity, organizations can build systems that are both robust and capable of handling complex, dynamic workloads effectively.