国产99久久精品_欧美日本韩国一区二区_激情小说综合网_欧美一级二级视频_午夜av电影_日本久久精品视频

最新文章專題視頻專題問答1問答10問答100問答1000問答2000關(guān)鍵字專題1關(guān)鍵字專題50關(guān)鍵字專題500關(guān)鍵字專題1500TAG最新視頻文章推薦1 推薦3 推薦5 推薦7 推薦9 推薦11 推薦13 推薦15 推薦17 推薦19 推薦21 推薦23 推薦25 推薦27 推薦29 推薦31 推薦33 推薦35 推薦37視頻文章20視頻文章30視頻文章40視頻文章50視頻文章60 視頻文章70視頻文章80視頻文章90視頻文章100視頻文章120視頻文章140 視頻2關(guān)鍵字專題關(guān)鍵字專題tag2tag3文章專題文章專題2文章索引1文章索引2文章索引3文章索引4文章索引5123456789101112131415文章專題3
問答文章1 問答文章501 問答文章1001 問答文章1501 問答文章2001 問答文章2501 問答文章3001 問答文章3501 問答文章4001 問答文章4501 問答文章5001 問答文章5501 問答文章6001 問答文章6501 問答文章7001 問答文章7501 問答文章8001 問答文章8501 問答文章9001 問答文章9501
當(dāng)前位置: 首頁 - 科技 - 知識百科 - 正文

NewHash-basedShardingFeatureinMongoDB2.4

來源:懂視網(wǎng) 責(zé)編:小采 時間:2020-11-09 13:22:57
文檔

NewHash-basedShardingFeatureinMongoDB2.4

NewHash-basedShardingFeatureinMongoDB2.4:Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collections documents. For certain common workloads though, like key/value lookup, using
推薦度:
導(dǎo)讀NewHash-basedShardingFeatureinMongoDB2.4:Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collections documents. For certain common workloads though, like key/value lookup, using

Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collection’s documents. For certain common workloads though, like key/value lookup, using the natural choice of _id as a shard key isn’t optimal bec

Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collection’s documents. For certain common workloads though, like key/value lookup, using the natural choice of _id as a shard key isn’t optimal because default ObjectId’s are ascending, resulting in poor write distribution. ?Creating randomized _ids or choosing another well-distributed field is always possible, but this adds complexity to an app and is another place where something could go wrong.

To help keep these simple workloads simple, in 2.4 MongoDB added the new Hash-based shard key feature. ?The idea behind Hash-based shard keys is that MongoDB will do the work to randomize data distribution for you, based on whatever kind of document identifier you like. ?So long as the identifier has a high cardinality, the documents in your collection will be spread evenly across the shards of your cluster. ?For heavy workloads with lots of individual document writes or reads (e.g. key/value), this is usually the best choice. ?For workloads where getting ranges of documents is more important (i.e. find recent documents from all users), other choices of shard key may be better suited.

Hash-based sharding in an existing collection

To start off with Hash-based sharding, you need the name of the collection you’d like to shard and the name of the hashed “identifier" field for the documents in the collection. ?For example, we might want to create a sharded “mydb.webcrawler" collection, where each document is usually found by a “url" field. ?We can populate the collection with sample data using:

shell$ wget http://en.wikipedia.org/wiki/Web_crawler -O web_crawler.html
shell$ mongo 
connecting to: /test
> use mydb
switched to db mydb
> cat("web_crawler.html").split("\n").forEach( function(line){
... var regex = /a /; if (regex.test(line)) { db.webcrawler.insert({ "url" : regex.exec(line)[1] }); }})
> db.webcrawler.find()
...
{ "_id" : ObjectId("5162fba3ad5a8e56b7b36020"), "url" : "/wiki/OWASP" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603d"), "url" : "/wiki/Image_retrieval" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603e"), "url" : "/wiki/Video_search_engine" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603f"), "url" : "/wiki/Enterprise_search" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b36040"), "url" : "/wiki/Semantic_search" }
...

Just for this example, we multiply this data ~x2000 (otherwise we won’t get any pre-splitting in the collection because it’s too small):

> for (var i = 0; i < 12; i++) { db.webcrawler.find().toArray().forEach( function(doc) { db.webcrawler.insert({ url : doc.url }) }) }

Next, we create a hashed index on this field:

> db.webcrawler.ensureIndex({ url : "hashed" })

As usual, the creation of the hashed index doesn’t prevent other types of indices from being created as well.

Then we shard the “mydb.webcrawler" collection using the same field as a Hash-based shard key:

> db.printShardingStatus(true)
--- Sharding Status ---
sharding version: {
 "_id" : 1,
 "version" : 3,
 "minCompatibleVersion" : 3,
 "currentVersion" : 4,
 "clusterId" : ObjectId("5163032a622c051263c7b8ce")
}
shards:
 { "_id" : "test-rs0", "host" : "test-rs0/nuwen:31100,nuwen:31101" }
 { "_id" : "test-rs1", "host" : "test-rs1/nuwen:31200,nuwen:31201" }
 { "_id" : "test-rs2", "host" : "test-rs2/nuwen:31300,nuwen:31301" }
 { "_id" : "test-rs3", "host" : "test-rs3/nuwen:31400,nuwen:31401" }
databases:
 { "_id" : "admin", "partitioned" : false, "primary" : "config" }
 { "_id" : "mydb", "partitioned" : true, "primary" : "test-rs0" }
 mydb.webcrawler
 shard key: { "url" : "hashed" }
 chunks:
 test-rs0 4
 { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-4837773290201122847") } on : test-rs0 { "t" : 1, "i" : 3 }
 { "url" : NumberLong("-4837773290201122847") } -->> { "url" : NumberLong("-2329535691089872938") } on : test-rs0 { "t" : 1, "i" : 4 }
 { "url" : NumberLong("-2329535691089872938") } -->> { "url" : NumberLong("3244151849123193853") } on : test-rs0 { "t" : 1, "i" : 1 }
 { "url" : NumberLong("3244151849123193853") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs0 { "t" : 1, "i" : 2 }

you can see that the chunk boundaries are 64-bit integers (generated by hashing the “url" field). ?When inserts or queries target particular urls, the query can get routed using the url hash to the correct chunk.

Sharding a new collection

Above we’ve sharded an existing collection, which will result in all the chunks of a collection initially living on the same shard. ?The balancer takes care of moving the chunks around, as usual, until we get an even distribution of data.

Much of the time though, it’s better to shard the collection before we add our data - this way MongoDB doesn’t have to worry about moving around existing data. ?Users of sharded collections are familiar with pre-splitting - where empty chunks can be quickly balanced around a cluster before data is added. ?When sharding a new collection using Hash-based shard keys, MongoDB will take care of the presplitting for you. Similarly sized ranges of the Hash-based key are distributed to each existing shard, which means that no initial balancing is needed (unless of course new shards are added).

Let’s see what happens when we shard a new collection webcrawler_empty the same way:

> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
> db.webcrawler_empty.ensureIndex({ url : "hashed" })
> sh.shardCollection("mydb.webcrawler_empty", { url : "hashed" })
{ "collectionsharded" : "mydb.webcrawler_empty", "ok" : 1 }
> db.printShardingStatus(true)
--- Sharding Status ---
...
 mydb.webcrawler_empty
 shard key: { "url" : "hashed" }
 chunks:
 test-rs0 2
 test-rs1 2
 test-rs2 2
 test-rs3 2
 { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-6917529027641081850") } on : test-rs0 { "t" : 4, "i" : 2 }
 { "url" : NumberLong("-6917529027641081850") } -->> { "url" : NumberLong("-4611686018427387900") } on : test-rs0 { "t" : 4, "i" : 3 }
 { "url" : NumberLong("-4611686018427387900") } -->> { "url" : NumberLong("-2305843009213693950") } on : test-rs1 { "t" : 4, "i" : 4 }
 { "url" : NumberLong("-2305843009213693950") } -->> { "url" : NumberLong(0) } on : test-rs1 { "t" : 4, "i" : 5 }
 { "url" : NumberLong(0) } -->> { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
 { "url" : NumberLong("2305843009213693950") } -->> { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
 { "url" : NumberLong("4611686018427387900") } -->> { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
 { "url" : NumberLong("6917529027641081850") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }

As you can see, the new empty collection is already well-distributed and ready to use. ?Be aware though - any balancing currently in progress can interfere with moving the empty initial chunks off the initial shard, balancing will take priority (hence the initial stopBalancer step). Like before, eventually the balancer will distribute all empty chunks anyway, but if you are preparing for a immediate data load it’s probably best to stop the balancer beforehand.

That’s it - you now have a pre-split collection on four shards using Hash-based shard keys. ?Queries and updates on exact urls go to randomized shards and are balanced across the cluster:

> db.webcrawler_empty.find({ url: "/wiki/OWASP" }).explain()
{
 "clusteredType" : "ParallelSort",
 "shards" : {
 "test-rs2/nuwen:31300,nuwen:31301" : [ ... ]
...

However, the trade-off with Hash-based shard keys is that ranged queries and multi-updates must hit all shards:

> db.webcrawler_empty.find({ url: /^\/wiki\/OWASP/ }).explain()
{
 "clusteredType" : "ParallelSort",
 "shards" : {
 "test-rs0/nuwen:31100,nuwen:31101" : [ ... ],
 "test-rs1/nuwen:31200,nuwen:31201" : [ ... ],
 "test-rs2/nuwen:31300,nuwen:31301" : [ ... ],
 "test-rs3/nuwen:31400,nuwen:31401" : [ ... ]
...

Manual chunk assignment and other caveats

The core benefits of the new Hash-based shard keys are:

  • Easy setup of randomized shard key

  • Automated pre-splitting of empty collections

  • Better distribution of chunks on shards for isolated document writes and reads

  • The standard split and moveChunk functions do work with Hash-based shard keys, so it’s still possible to balance your collection’s chunks in any way you like. ?However, the usual “find” mechanism used to select chunks can behave a bit unexpectedly since the specifier is a document which is hashed to get the containing chunk. ?To keep things simple, just use the new “bounds” parameter when manually manipulating chunks of hashed collections (or all collections, if you prefer):

    > use admin
    > db.runCommand({ split : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }] })
    > db.runCommand({ moveChunk : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }], to : "test-rs3" })
    

    There are a few other caveats as well - in particular with tag-aware sharding. ?Tag-aware sharding is a feature we released in MongoDB 2.2, which allows you to attach labels to a subset of shards in a cluster. This is valuable for “pinning" collection data to particular shards (which might be hosted on more powerful hardware, for example). ?You can also tag ranges of a collection differently, such that a collection sharded by { “countryCode" : 1 } would have chunks only on servers in that country.

    Hash-based shard keys are compatible with tag-aware sharding. ?As in any sharded collection, you may assign chunks to specific shards, but since the chunk ranges are based on the value of the randomized hash of the shard key instead of the shard key itself, this is usually only useful for tagging the whole range to a specific set of shards:

    > sh.addShardTag("test-rs2", "DC1")
    sh.addShardTag("test-rs3", "DC1")

    The above commands assign a hypothetical data center tag “DC1” to shards -rs2 and -rs3, which could indicate that -rs2 and -rs3 are in a particular location. ?Then, by running:

    > sh.addTagRange("mydb.webcrawler_empty", { url : MinKey }, { url : MaxKey }, "DC1" )

    we indicate to the cluster that the mydb.webcrawler_empty collection should only be stored on “DC1” shards. ?After letting the balancer work:

    > db.printShardingStatus(true)
    --- Sharding Status ---
    ...
     mydb.webcrawler_empty
     shard key: { "url" : "hashed" }
     chunks:
     test-rs2 4
     test-rs3 4
     { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-6917529027641081850") } on : test-rs2 { "t" : 5, "i" : 0 }
     { "url" : NumberLong("-6917529027641081850") } -->> { "url" : NumberLong("-4611686018427387900") } on : test-rs3 { "t" : 6, "i" : 0 }
     { "url" : NumberLong("-4611686018427387900") } -->> { "url" : NumberLong("-2305843009213693950") } on : test-rs2 { "t" : 7, "i" : 0 }
     { "url" : NumberLong("-2305843009213693950") } -->> { "url" : NumberLong(0) } on : test-rs3 { "t" : 8, "i" : 0 }
     { "url" : NumberLong(0) } -->> { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
     { "url" : NumberLong("2305843009213693950") } -->> { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
     { "url" : NumberLong("4611686018427387900") } -->> { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
     { "url" : NumberLong("6917529027641081850") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }
     tag: DC1 { "url" : { "$minKey" : 1 } } -->> { "url" : { "$maxKey" : 1 } }
    

    Again, it doesn’t usually make a lot of sense to tag anything other than the full hashed shard key collection to particular shards - by design, there’s no real way to know or control what data is in what range.

    Finally, remember that Hash-based shard keys can (right now) only distribute documents based on the value of a single field. ?So, continuing the example above, it isn’t directly possible to use “url" + “timestamp" as a Hash-based shard key without storing the combination in a single field in your application, for example:

    url_and_ts : { url : , timestamp :  }

    The sub-document will be hashed as a unit.

    If you’re interested in learning more about Hash-based sharding, register for the Hash-based sharding feature demo on May 2.

    聲明:本網(wǎng)頁內(nèi)容旨在傳播知識,若有侵權(quán)等問題請及時與本網(wǎng)聯(lián)系,我們將在第一時間刪除處理。TEL:177 7030 7066 E-MAIL:11247931@qq.com

    文檔

    NewHash-basedShardingFeatureinMongoDB2.4

    NewHash-basedShardingFeatureinMongoDB2.4:Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collections documents. For certain common workloads though, like key/value lookup, using
    推薦度:
    標(biāo)簽: in 2.4 new
    • 熱門焦點

    最新推薦

    猜你喜歡

    熱門推薦

    專題
    Top
    主站蜘蛛池模板: 久久精品a国产一级 | 国产成人乱码一区二区三区在线 | 国产国拍亚洲精品永久不卡 | 国产伦精品一区二区三区在线观看 | 国产精选在线 | 高清精品一区二区三区一区 | 亚欧美色 | 国产精品免费观看 | 成人免费久久精品国产片久久影院 | 国产精品一区二区手机在线观看 | 亚洲视频久久 | 美国一级大黄大色毛片 | 亚洲wuma| 亚洲一区在线播放 | 久久久久久久国产精品 | 精品一区二区三 | 亚洲精品美女久久久aaa | 久久精品一区二区三区不卡牛牛 | 亚洲精品在线免费观看视频 | 国产不卡网| 日日草视频 | 98成人网| 亚洲欧美综合 | 久久久这里有精品999 | 另类日韩 | 亚洲欧美日本另类激情 | 亚洲欧美一区二区三区久久 | 免费一看一级毛片 | 欧美日韩精品在线观看 | 欧美 亚洲 另类 热图 | 国产全黄a一级毛片 | 亚洲欧美第一 | 国产高清美女一级a毛片久久 | 激情欧美日韩一区二区 | 99久久精品国内 | 国产成人无精品久久久久国语 | 国产观看在线 | 欧美日韩在线一区 | zozozo欧美人禽交另类视频 | 亚洲色图欧美色 | 久久久久成人精品一区二区 |