1629078780

**Hash functions** are used to map large data sets of elements of an arbitrary length (*the keys*) to smaller data sets of elements of a fixed length (*the fingerprints*).

The basic application of hashing is efficient testing of equality of keys by comparing their fingerprints.

A *collision* happens when two different keys have the same fingerprint. The way in which collisions are handled is crucial in most applications of hashing. Hashing is particularly useful in construction of efficient practical algorithms.

A **rolling hash** (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.

A few hash functions allow a rolling hash to be computed very quickly — the new hash value is rapidly calculated given only the following data:

- old hash value,
- the old value removed from the window,
- and the new value added to the window.
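The idea is easiest to see with a toy additive checksum (a simplified stand-in for illustration, not one of the hash functions discussed here): the window hash is just the sum of the values in it, so sliding the window needs exactly the three pieces of data above.

```javascript
// Toy rolling checksum: the hash of a window is the sum of its values.
// Sliding the window by one position needs only the old hash,
// the outgoing value, and the incoming value.
function initialHash(values, windowSize) {
  let h = 0;
  for (let i = 0; i < windowSize; i += 1) h += values[i];
  return h;
}

function roll(oldHash, outgoing, incoming) {
  return oldHash - outgoing + incoming;
}

const data = [3, 1, 4, 1, 5, 9];
let h = initialHash(data, 3);   // hash of [3, 1, 4] = 8
h = roll(h, data[0], data[3]);  // hash of [1, 4, 1] = 6
```

Recomputing the sum from scratch would cost O(window size) per step; the rolling update is O(1).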

An ideal hash function for strings should depend both on the *multiset* of the symbols present in the key and on the *order* of the symbols. The most common family of such hash functions treats the symbols of a string as coefficients of a *polynomial* with an integer variable p and computes its value modulo an integer constant M.

The *Rabin–Karp string search algorithm* is often explained using a very simple rolling hash function that uses only multiplications and additions, the **polynomial rolling hash**:

H(s0, s1, ..., sk-1) = s0 * p^(k-1) + s1 * p^(k-2) + ... + sk-1 * p^0

where p is a constant and *(s0, s1, ..., sk-1)* are the input characters of a string of length k.

For example, we can convert short strings to key numbers by multiplying letter codes by powers of a constant. With a = 1, c = 3, and e = 5, the three-letter word *ace* turns into a number by calculating:

key = 1 * 26^2 + 3 * 26^1 + 5 * 26^0 = 676 + 78 + 5 = 759
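As a quick check, here is a minimal sketch of this scheme in JavaScript, assuming lowercase letters are mapped to the codes 1–26 (a = 1, b = 2, ...); the function name is illustrative:

```javascript
// Convert a lowercase word to a key number in base 26,
// using letter codes a=1 ... z=26.
function wordToKey(word) {
  const base = 26;
  let key = 0;
  for (const ch of word) {
    const code = ch.charCodeAt(0) - 'a'.charCodeAt(0) + 1;
    key = key * base + code; // same as summing code * base^position
  }
  return key;
}

console.log(wordToKey('ace')); // 1*676 + 3*26 + 5 = 759
```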

In order to avoid manipulating huge H values, all math is done modulo M.

H(s0, s1, ..., sk-1) = (s0 * p^(k-1) + s1 * p^(k-2) + ... + sk-1 * p^0) mod M

A careful choice of the parameters M and p is important to obtain “good” properties of the hash function, i.e., a low collision rate.

This approach has the desirable attribute of involving all the characters in the input string. The calculated key value can then be hashed into an array index in the usual way:

```javascript
function hash(key, arraySize) {
  const base = 13;
  let hash = 0;
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex);
    hash += charCode * (base ** (key.length - charIndex - 1));
  }
  return hash % arraySize;
}
```

The hash() method is not as efficient as it might be. Apart from the character conversion, the loop performs an exponentiation, a multiplication, and an addition. We can eliminate the exponentiation by using **Horner's method**:

a4 * x^4 + a3 * x^3 + a2 * x^2 + a1 * x + a0 = (((a4 * x + a3) * x + a2) * x + a1) * x + a0
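A quick sanity check of the identity, with arbitrary example coefficients (the values here are illustrative):

```javascript
// Evaluate a polynomial both ways and confirm the results match.
const coeffs = [2, 0, 3, 1, 7]; // a4=2, a3=0, a2=3, a1=1, a0=7
const x = 5;

// Naive: sum of a_i * x^i (coefficients listed from highest degree down).
const naive = coeffs.reduce(
  (sum, a, i) => sum + a * x ** (coeffs.length - 1 - i),
  0
);

// Horner: one multiplication and one addition per coefficient.
const horner = coeffs.reduce((acc, a) => acc * x + a, 0);

console.log(naive === horner); // true
```

Horner's scheme does k multiplications for a degree-k polynomial, versus roughly k^2/2 for the naive expansion.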

In other words:

H_i = (p * H_(i-1) + s_i) mod M

As written, hash() cannot handle long strings because the accumulated hash value exceeds Number.MAX_SAFE_INTEGER and loses precision. Notice that the key always ends up being less than the array size. With Horner's method we can apply the modulo (%) operator at each step of the calculation. This gives the same result as applying the modulo operator once at the end, but avoids the overflow.

```javascript
function hash(key, arraySize) {
  const base = 13;
  let hash = 0;
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex);
    hash = (hash * base + charCode) % arraySize;
  }
  return hash;
}
```

Polynomial hashing has a rolling property: the fingerprints can be updated efficiently when symbols are added or removed at the ends of the string (provided that an array of powers of p modulo M of sufficient length is stored). The popular Rabin–Karp pattern matching algorithm is based on this property.
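A minimal sketch of that rolling update, assuming small illustrative parameters p = 13 and M = 101 (not production values): sliding the window drops the leading symbol's term, shifts the rest by one power of p, and appends the new symbol.

```javascript
const p = 13;   // polynomial base (illustrative)
const M = 101;  // modulus (illustrative)

// Polynomial hash of a string, computed from scratch via Horner's method.
function polyHash(s) {
  let h = 0;
  for (const ch of s) h = (h * p + ch.charCodeAt(0)) % M;
  return h;
}

// Slide a window of length k: remove `outgoing` from the front,
// append `incoming` at the back. pkm1 is p^(k-1) mod M.
function roll(oldHash, outgoing, incoming, pkm1) {
  const withoutFirst =
    (oldHash - (outgoing.charCodeAt(0) * pkm1) % M + M * M) % M; // stay non-negative
  return (withoutFirst * p + incoming.charCodeAt(0)) % M;
}

const k = 3;
const pkm1 = p ** (k - 1) % M;      // 169 mod 101 = 68
const h1 = polyHash('abc');
const h2 = roll(h1, 'a', 'd', pkm1); // window slides from "abc" to "bcd"
console.log(h2 === polyHash('bcd')); // true
```

Adding M * M before the final reduction keeps the intermediate difference non-negative, since JavaScript's % operator can return negative results for negative operands.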

- Where to Use Polynomial String Hashing
- Hashing on uTexas
- Hash Function on Wikipedia
- Rolling Hash on Wikipedia

The Original Article can be found on https://github.com

#javascript #algorithms #datastructures #cryptography

1620466520

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

1620629020

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.

As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).

*This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.*

#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management

1621986060

If I ask you what your morning routine is, what will you answer? Let me answer it for you. You wake up in the morning, freshen up, go for some exercise, come back, bathe, have breakfast, and then get ready for the rest of your day.

If you observe closely, these are a set of rules that you follow daily to get ready for your work or classes. If you skip even one step, you will not accomplish your task, which is getting ready for the day.

These steps do not contain details like what time you wake up, which toothpaste you use, whether you went for a walk or to the gym, or what you had for breakfast. All they contain are the basic, fundamental steps that you need to execute to perform some task. This is a very basic example of an algorithm: an algorithm for your everyday morning.

In this article, we will be learning about algorithms, their characteristics, types of algorithms, and, most importantly, the complexity of algorithms.

Algorithms are finite sets of rules that must be followed for problem-solving operations. An algorithm is a step-by-step guide to how a process or a program is executed on a machine to get the expected output.

- Algorithms do not contain complete programs or implementation details. They are just logical solutions to a problem.
- Algorithms can be expressed in simple language or as a flowchart.

No one would follow written instructions for a daily morning routine. Similarly, you cannot take just anything available in writing and consider it an algorithm. To be considered an algorithm, instructions must have some specific characteristics:

**1. Input:** An algorithm, if required, should have very well-defined inputs. An algorithm can have zero or more inputs.

**2. Output:** Every algorithm should have one or more very well-defined outputs. Without an output, the algorithm fails to give the result of the tasks performed.

**3. Unambiguous:** The algorithm should be unambiguous and leave no room for confusion under any circumstances. All the sentences and steps should be clear and must have only one meaning.

**4. Finiteness:** The steps in the algorithm must be finite and there should be no infinite loops or steps in the algorithm. In simple words, an algorithm should always end.

**5. Effectiveness:** An algorithm should be simple, practically possible, and easy to understand for all users. It should be executable upon the available resources and should not contain any kind of futuristic technology or imagination.

**6. Language independent:** An algorithm must be expressed in plain language so that it can be easily implemented in any computer language, and the output should be the same as expected regardless of the language.

**1. Problem:** To write a solution you need to first identify the problem. The problem can be an example of the real-world for which we need to create a set of instructions to solve it.

**2. Algorithm:** Design a step-by-step procedure for the above problem and this procedure, after satisfying all the characteristics mentioned above, is an algorithm.

**3. Input:** After creating the algorithm, we need to give the required input. There can be zero or more inputs in an algorithm.

**4. Processing unit:** The input is now forwarded to the processing unit and this processing unit will produce the desired result according to the algorithm.

**5. Output:** The desired or expected output of the program according to the algorithm.

Suppose you want to cook chole (chickpeas) for lunch. You cannot just go to the kitchen, set a utensil on the gas, and start cooking them. You must have soaked them for at least 12 hours before cooking; then you chop the desired vegetables and follow many steps after that to get the delicious taste, texture, and nutrition.

This is the need for algorithms. To get the desired output, you need to follow a specific set of rules. As in the above example, these rules do not contain details like which masala you are using, which salt you are using, or how many chickpeas you are soaking. All these rules contain is a basic step-by-step guide for the best results.

We need algorithms for the following two reasons :

**1. Performance:** The result should be as expected. You can break a large problem into smaller problems and solve each one of them to get the desired result. This also shows that the problem is feasible.

**2. Scalability:** When you have a big problem or a similar kind of smaller problem, the algorithm should work and give the desired output for both problems. In our example, no matter how many people you have for lunch the same algorithm of cooking chickpeas will work every single time if followed correctly.

Let us try to write an algorithm for our lunch problem :

1. Soak the chickpeas overnight so that they are ready by the next afternoon.

2. Chop some vegetables that you like.

3. Set up a utensil on gas and saute the chopped vegetables.

4. Add water and wait for it to boil.

5. Add chickpeas and wait until you get the desired texture.

6. Chickpeas are now ready for your lunch.

The real-world example that we just discussed is a very close example of the algorithm. You cannot just start with step 3 and start cooking. You will not get the desired result. To get the desired result, you need to follow the specific order of rules. Also, each instruction should be clear in an algorithm as we can see in the above example.

#algorithms in data structure #data structure algorithms #algorithms

1617959340

Companies across every industry rely on big data to make strategic decisions about their business, which is why data analyst roles are constantly in demand. Even as we transition to more automated data collection systems, data analysts remain a crucial piece in the data puzzle. Not only do they build the systems that extract and organize data, but they also make sense of it, identifying patterns and trends and formulating actionable insights.

If you think that an entry-level data analyst role might be right for you, you might be wondering what to focus on in the first 90 days on the job. What skills should you have going in and what should you focus on developing in order to advance in this career path?

Let’s take a look at the most important things you need to know.

#data #data-analytics #data-science #data-analysis #big-data-analytics #data-privacy #data-structures #good-company