Tuesday, August 26, 2014

Start BIDS Visual Studio From "Run" Command Prompt

Many developers run into a delay when opening the Visual Studio IDE. Here is a small trick to work around it.

Open the Run prompt (Start -> Programs -> Run, or Windows logo key + R) and type one of the commands below.

a) devenv : opens the Visual Studio IDE with the splash screen.
b) devenv /nosplash : opens the Visual Studio IDE without the splash screen.
 

Monday, August 11, 2014

Microsoft Parallel Data Warehouse : Building a data warehouse in SQL Server: Tips to get started

When building a data warehouse using SQL Server technologies, there are some challenges you might face. I'll offer you some suggestions for overcoming them. Naturally, each environment is likely to add its own twist to the challenges of building a data warehouse, but most of these issues are generic enough to apply to many companies embarking on architecting business intelligence (BI) applications.

1.  Management support. This is the first and foremost challenge you must overcome. It is annoying to technical people (including yours truly) because it has to do with politics and not software. Let's keep in mind who signs your paycheck, however. If management isn't convinced they need a data warehouse, all your technical skills will be in vain.

Unfortunately, upper management often thinks of a data warehouse as yet another system they have to invest in without any immediate return. Your job is to convince the powers that be they're not paying for bells and whistles – a data warehouse will help them make better, more informed decisions by correlating data scattered across the enterprise. It is not uncommon for an organization to change the way it operates entirely after a data warehouse application is implemented. Sometimes it helps to build a proof-of-concept (POC) project to demonstrate the power of BI. A POC data warehouse will only contain a small portion of the data and will only be exposed to a few users for testing. Even if you don't have approval for a POC, you should demonstrate parts of the warehouse to your users as they become available; doing so keeps them interested and assured that the system will help them be more effective in their work.

2.  Data availability. A data warehouse usually combines data from multiple data sources, which aren't necessarily obvious or easily accessible. Depending on the company's size, you might have to travel to multiple offices and speak to numerous employees to figure out where you can get the data necessary to build the analytical views your users want. Again, depending on the political climate of your enterprise, you might find it difficult to get your hands on each necessary data element. The data you work with might contain confidential and highly sensitive details. Furthermore, some data might only be accessible through reports provided by an outside entity that will not grant you access to its data source.

How do you overcome this challenge? First, be sure to convince everyone that you're not taking their job. A data warehouse is supposed to supplement, not replace, existing systems. If you have management on your side, you shouldn't have too many issues with internal employees. But if you cannot get access to a data source – for whatever reason – you may have to get creative. If data is only available on report screens or on paper, you might have to find a way to capture it, perhaps by scraping report screens or scanning documents.

3.  Complexity of data sources. Sometimes you get lucky and all the data elements you need reside in the database management system (DBMS) of your choice. More often, data is dispersed across multiple DBMSs, spreadsheets, email systems, electronic documents and even paper. Yes, we're in the 21st century, but take my word for it – there are still companies out there that keep certain information only on paper. It's your job to figure out how to get data into your warehouse from all these disparate sources and give it a common shape.

4.  Data quality. Many transaction processing applications are thrown together using rapid development techniques, often by people who have very limited knowledge of or experience with the tools they're using. This is not a derogatory statement; one must start somewhere, and junior programmers are often the most affordable resources a company can use. The trouble is that if the application does not validate data, you're likely to encounter string values that are abbreviated, misspelled or omitted entirely. For transaction-level reports this might not be a huge deal, but when you're trying to group data and empower your users to make decisions, data quality is essential.

For example, take a look at the following customer names:

a. ACME Boot Outlet
b. ACME bt outlt
c. A.C.M.E
d. Boots (ACME)
e. A c m e boot store (outlet)


Human eyes can easily spot that each of the above refers to the same business entity, but to a computer program each of these values represents a separate customer. Unfortunately, there is no easy way to correct all bad data using Integration Services (or Data Transformation Services) packages. SQL Server 2005 Integration Services offers the Fuzzy Lookup transformation, which may greatly simplify your data cleansing efforts. Even so, you might also have to enlist the help of clerical workers to correct the data.
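
To make the problem concrete, here is a small, hypothetical T-SQL sketch (the table variable and column name are made up) that scores the example names against the clean value using the built-in SOUNDEX/DIFFERENCE functions. Phonetic matching like this is far cruder than the SSIS Fuzzy Lookup transformation and will miss many variants, but it shows why approximate matching – rather than simple equality – is needed during cleansing.

```sql
-- Hypothetical example: score raw customer names against a known clean value.
DECLARE @RawCustomers TABLE (CustomerName VARCHAR(100));

INSERT INTO @RawCustomers (CustomerName)
VALUES ('ACME Boot Outlet'),
       ('ACME bt outlt'),
       ('A.C.M.E'),
       ('Boots (ACME)'),
       ('A c m e boot store (outlet)');

-- DIFFERENCE returns 0-4; higher means the SOUNDEX codes are more similar.
SELECT CustomerName,
       SOUNDEX(CustomerName)                        AS SoundexCode,
       DIFFERENCE(CustomerName, 'ACME Boot Outlet') AS SimilarityScore
FROM   @RawCustomers
ORDER  BY SimilarityScore DESC;
```

Rows with a high score become candidates for manual review rather than automatic correction.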

5.  Multitude and technical savvy of your users. Helping a handful of users get to the data they need in order to make decisions is no small feat. Historically, the data warehouse was reserved for upper-level management's use. However, data warehousing has slowly made its way to the masses. It's not uncommon to see data warehousing and BI applications used by people who don't necessarily make strategic decisions.

Once your users see the power of data warehousing, they'll want to use it for everything from data entry to retrieving transaction-level reports. The larger your user community, the tougher it becomes to keep everyone happy and to educate them on appropriate uses of your application. BI applications perform well when used appropriately, but they are not suitable for every business requirement. Often, you must be the bearer of the bad news that the data warehouse is not the tool of choice for the functionality your users desire.

6.  Slowly changing dimensions. Your application must reflect data changes that occur in data sources populating the warehouse. Data within dimension tables is particularly prone to change. But how do you maintain the history of such changes?

    a: The first and easiest method is to overwrite the existing record without tracking changes. Fortunately, this method is acceptable for many dimensions. For example, if a department name changes from "finance" to "finance and accounting" you probably won't be required to maintain the history of such a change. However, in a customer or student dimension it is often necessary to keep track of changing names, marital status, education level, and other attributes - your application must be able to retrieve current as well as historical values.

    b: The second method for managing slowly changing dimensions is to create a new record when the value changes and tag the older record as obsolete (a sketch of this approach follows the list below).

    c: The third and final method is to maintain the historical value(s) of the changing attribute in separate columns of the same row within the dimension table.
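
As a rough illustration of the second method, the load process can expire the current row and insert a new one whenever a tracked attribute changes. The table and column names below are hypothetical (an IDENTITY surrogate key and a business key from the source system are assumed), not taken from any particular warehouse.

```sql
-- Hypothetical Type 2 handling for a customer dimension.
-- Assumes CustomerSK is an IDENTITY surrogate key and CustomerID is the source business key.
DECLARE @CustomerID       INT         = 1001,
        @NewMaritalStatus VARCHAR(20) = 'Married',
        @Today            DATE        = GETDATE();

-- 1) Tag the existing "current" record as obsolete.
UPDATE dbo.DimCustomer
SET    IsCurrent = 0,
       EffectiveEndDate = @Today
WHERE  CustomerID = @CustomerID
  AND  IsCurrent = 1;

-- 2) Insert a fresh record that becomes the current version of this customer.
INSERT INTO dbo.DimCustomer
        (CustomerID, CustomerName, MaritalStatus, EffectiveStartDate, EffectiveEndDate, IsCurrent)
SELECT  CustomerID, CustomerName, @NewMaritalStatus, @Today, NULL, 1
FROM    dbo.DimCustomer
WHERE   CustomerID = @CustomerID
  AND   EffectiveEndDate = @Today;   -- the row expired in step 1
```

Queries that need the current view simply filter on IsCurrent = 1, while historical reports join on the effective date range.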

So far we've covered the handling of changing values in data warehouse dimension tables. But what about Analysis Services dimensions? To reflect changes in attribute values you must re-process dimensions. But if you process dimensions fully, you'd also have to reprocess cubes, which might be impractical if your cubes contain large volumes of data. Fortunately, you can often use the incremental update (ProcessUpdate) option to avoid reprocessing cubes.

7.  Changing facts. Normally one would think that records in the fact tables are static – once the record gets to the warehouse your job is finished, right? Unfortunately the answer is "it depends." In some cases like in a data warehouse tracking patients' hospital stays, all records are generally static. If your hospital stay lasted from January 1 to January 5, that record isn't likely to change.

But consider the retail industry; all sales aren't final – I'm sure you know people who often take the goods they purchased back to the store for various reasons. Some companies would manage such transactions as a series of credits and debits that balance each other out. But in other cases you must either update or delete fact table records, even after they're added to the data warehouse. For example, if a stock trade is recorded incorrectly, balancing it with a corresponding negative trade might not be acceptable. Here's another thing to consider: you might not want your customers to know about problems in your transactional systems. Rather, you want them to see the data only after it is corrected.

One method for dealing with changing facts is to keep the records in the staging area until the data is examined for quality and then migrate it to the warehouse. Even the most thorough testing won't catch all errors in the data sources, however. Occasionally, you might have to refresh cubes by processing those partitions that contain incorrect data. This is why it's essential to keep your Analysis Services partitions small so that processing will complete relatively quickly.
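
As a sketch of that staging pattern (all object and column names here are hypothetical), corrected rows can be merged from a staging table into the fact table so that users only ever see the fixed values:

```sql
-- Hypothetical correction pass: staging.FactSales holds re-extracted, corrected rows;
-- dbo.FactSales is the warehouse fact table keyed by TransactionID.
MERGE dbo.FactSales AS target
USING staging.FactSales AS source
      ON target.TransactionID = source.TransactionID
WHEN MATCHED AND (target.Quantity   <> source.Quantity
               OR target.SaleAmount <> source.SaleAmount) THEN
    -- Overwrite facts that were loaded with incorrect values.
    UPDATE SET target.Quantity   = source.Quantity,
               target.SaleAmount = source.SaleAmount
WHEN NOT MATCHED BY TARGET THEN
    -- Bring over rows that passed the quality checks for the first time.
    INSERT (TransactionID, CustomerKey, DateKey, Quantity, SaleAmount)
    VALUES (source.TransactionID, source.CustomerKey, source.DateKey,
            source.Quantity, source.SaleAmount);
```

After such a correction run, only the Analysis Services partitions covering the affected rows need to be reprocessed.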

Another way to deal with this challenge is through a write-back partition. With cube write-back, you're not really changing data in the underlying relational warehouse; instead, you're adding a record to a separate partition. When a user queries a particular measure group, Analysis Services combines data from the read-only partitions with the data in the write-back partition and exposes the result. Not surprisingly, performing such calculations at query execution time imposes additional overhead on Analysis Services and can cause performance degradation.

8.  Implementing security. Much like with any other business application, it's imperative to secure access to your data warehouse and cubes. Each user might only be permitted to see a small portion of the warehouse, depending on their job title or area of responsibility. Analysis Services implements security through Windows accounts which you can associate with Analysis Services roles. You can secure data at a very granular level, all the way down to individual cube cells.

Unfortunately, creating a separate role for each user is not only tedious and cumbersome but can also cause performance problems – this is particularly true of Analysis Services versions prior to 2005. The advice is to group your users into a handful of roles, if possible. If every single user must have her own set of permissions, you'll have to implement a security dimension. Once you have such a dimension, every MDX query will have to filter the output based on the identity of the current user.

Naturally I can't cover every data warehousing challenge in a single tip. Nor can any one person claim to be familiar with every difficulty involved in building data warehousing solutions. However, if you're contemplating an attempt to implement such solutions, you should carefully consider what you're up against. Do your homework and be prepared for a battle.



Microsoft Parallel Data Warehouse : SQL Server 2012 Parallel Data Warehouse (PDW) – What’s new?

I want to share some insights about the new SQL Server 2012 version of Parallel Data Warehouse. In this post I will talk about some new features as well as architecture changes.

Architecture Update

The SQL Server 2012 version of PDW introduces some major architecture changes:
  • The Landing Zone and the Backup node have been removed from the appliance, and for good reason: the standard sizing of these components didn't meet most customers' requirements, and it was very hard to find a configuration that fit 80% of PDW customers. The decision was therefore made to remove these components, and customers now have more flexibility to size the ETL and backup servers to their needs.

  • Expensive storage components have been replaced by economical, high-density Direct Attached Storage
  • Virtualization of the Software Components based on Windows Server 2012 and Hyper-V
  • Modular Design and Smarter Scale Out: SQL Server 2012 PDW has reshaped the hardware specifications required of an appliance through software innovations, to deliver optimal value to customers. The new version introduces a modular design.

The smallest PDW appliance consists of a Base Scale Unit (with a Passive Scale Unit for HA) and can be extended by a number of Capacity Units. If you reach the physical space limit of the rack, you can add up to 6 more racks.
Depending on the vendor, the units are shipped differently: HP offers a Base Scale Unit with 2 compute nodes, and every Capacity Unit is also shipped with 2 compute nodes. Dell offers the Base Scale Unit and the Capacity Unit with 3 compute nodes each.
Looking at capacity, the smallest PDW appliance from HP (a Quarter Rack with only the Base Scale Unit) offers a capacity of 53 – 227 TB (depending on the compression rates) and raw disk space of 45 TB (with 3 TB disks). The Quarter Rack from Dell provides a capacity of 79 – 3470 TB (depending on the compression rates) and raw disk space of 68 TB (with 3 TB disks).
  • Further Scale: Based on the new modular hardware design and a hardware refresh, PDW now offers scale-out capacity of up to 5 petabytes.

Software Architecture Overview

The new version of PDW also provides some major software updates. As the product name states, PDW now runs on SQL Server 2012. The operating system on all hosts is Windows Server 2012 Standard edition. All fabric and workload activity happens in Hyper-V virtual machines, which also run on Windows Server 2012 Standard edition. A PDW agent runs on all hosts and all VMs and collects appliance health data on fabric and workload.


A special PDW version of SQL Server 2012 Enterprise Edition is used on the Control node and on all Compute nodes to provide high scale database capabilities.

New Features

In addition to the new hardware and software architecture, I want to highlight some more very interesting features for customers:
  • Columnar Storage: Microsoft continues the rollout of xVelocity and, with SQL Server 2012 PDW, provides a new primary storage type for databases. Customers can now choose between a row store and a new updatable version of the xVelocity memory-optimized columnstore as the table storage format. This means we can define a writable clustered columnstore index on a table (updates and bulk load are fully supported) so that the whole table is stored in the column-oriented format and benefits from much higher compression (a table defined this way appears in the sketch after this feature list).
  • Visual Studio 2012 Integration: As you probably know, the tool used for database administration and development in PDW Version 1 was Nexus. With the new version of PDW we now have full support for SQL Server Data Tools for Visual Studio 2012.
The project types for SSIS, SSRS and SSAS are also fully supported in Visual Studio 2012.
  • Monitoring Enhancements: The monitoring of sessions, queries, loads, appliance health status and performance information has been completely redesigned. Microsoft did a very good job here, and the new web user interface resembles the Azure Portal. For customers this also means that monitoring and operations look the same on-premises as in the cloud on Windows Azure.
  • Polybase & Hadoop Integration: Today almost every big database vendor also provides a “Big Data” solution based on Hadoop, whether it's a vendor-specific distribution like IBM's BigInsights, Microsoft's implementation of the Hortonworks Hadoop distribution called HDInsight, or a full appliance that comes pre-installed with Hadoop. No matter which Hadoop platform you choose, you still face the challenge of integration. Some types of data and analytics will happen on Hadoop (such as text, log or sensor analysis), but we will still use databases for BI and reporting.
While this might be something you have decided to do, you will realize that there is a big learning curve when your IT department needs to re-orient itself around HDFS, MapReduce, Hive, HBase, etc. rather than T-SQL and standard RDBMS design. It requires significant re-training around Hadoop and its ecosystem, as well as a major effort to integrate the Hadoop implementation with the data warehouse.
A modern data platform must provide insights to your end users without requiring yet another third-party tool or yet another expensive appliance to purchase.
The unfortunate reality is that no single vendor can deliver all the options you need at a cost you want to pay. They either have a data warehouse solution but no BI, or provide BI but no data warehousing. Some vendors provide a Big Data solution, but it is disconnected from their data warehouse solution. Finally, some vendors might have a solution for each workload and will happily charge you millions of euros or dollars for them all.
Microsoft goes a different way and, with SQL Server 2012 PDW, delivers an integrated query layer called “Polybase” which enables queries across Hadoop and SQL Server. Data structures stored in Hadoop are described by tables in PDW, so customers can consume this data with standard T-SQL and can also join those tables with normal relational ones. To end users who only consume this data it is totally transparent where the data comes from, and IT departments can use their normal database skill set to work with Hadoop. What might such a table look like, and how is it queried and joined? A sketch follows below.
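
The exact DDL varies between PDW appliance updates and the later SQL Server 2016 Polybase release, so treat the following as an illustrative sketch rather than PDW's precise syntax; all object names, the HDFS location and the column layout are made up. The idea is the same everywhere: an external table describes the files sitting in Hadoop, and a normal distributed table – here defined with a clustered columnstore index, as mentioned in the columnar storage feature above – can be joined to it with plain T-SQL.

```sql
-- A distributed warehouse table stored as a clustered columnstore index (PDW-style DDL).
CREATE TABLE dbo.FactSales
(
    SaleId    BIGINT,
    UserId    INT,
    SaleDate  DATE,
    Amount    MONEY
)
WITH (DISTRIBUTION = HASH(UserId), CLUSTERED COLUMNSTORE INDEX);

-- An external table describing pipe-delimited click-stream files that live in Hadoop
-- (SQL Server 2016-style Polybase objects; the PDW appliance syntax differs slightly).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.0.0.10:8020');

CREATE EXTERNAL FILE FORMAT PipeDelimited
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

CREATE EXTERNAL TABLE dbo.WebClickStream
(
    UserId     INT,
    ClickDate  DATE,
    Url        VARCHAR(1000)
)
WITH (LOCATION = '/clickstream/2014/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = PipeDelimited);

-- End users query and join the Hadoop data like any other table.
SELECT   f.UserId,
         SUM(f.Amount) AS TotalSales,
         COUNT(w.Url)  AS PageViews
FROM     dbo.FactSales      AS f
JOIN     dbo.WebClickStream AS w ON w.UserId = f.UserId
GROUP BY f.UserId;
```

To the user running the final SELECT it makes no difference that one table lives in the appliance and the other in HDFS.
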
Summary
As you can see from this post, Microsoft has invested heavily in its modern Big Data platform and gives customers the possibility to build high-scale solutions on SQL Server 2012 PDW as well as on HDInsight for Hadoop-based workloads. With Polybase, customers get a Hadoop query engine fully integrated into the data warehouse layer that can easily be consumed with T-SQL knowledge.
The new architecture and the modular design of the appliance let customers start with a small investment and scale cost-efficiently on demand.
In my next post I will talk in more detail about a POC I did with the new version, including the results and lessons learned.
Source: Microsoft slide decks of SQL Server 2012 Parallel Data Warehouse

