Using Lustre file systems — performance hints 6–11
The parameters that control client operation interact as shown in the following example. In this example, the
configuration is as follows:
  • There are 30 OST services, one on each server in the HP SFS system.
  • All client nodes and servers are connected to a single switch with an overall throughput of 1Gb/sec.
  • The max_dirty_mb parameter on the client node is 32MB for each OST service that the client node is communicating with.
  • The value of the Lustre timeout attribute for the file system is 100 seconds (the default).
  • The timeout period for I/O transactions is 50 seconds (that is, half of the value of the Lustre timeout attribute).
In such a configuration, there could be a maximum of 960MB of data to be sent if a client node were to
access all OST services. If ten client nodes all flushed data at the same time, it would take under 90 seconds
to flush all the data.
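The figures in this example can be reproduced with a short calculation. The following sketch uses the configuration values above; the variable names are illustrative and not part of any Lustre interface:

```python
# Worked numbers from the example configuration (a sketch, not a Lustre API).
NUM_OSTS = 30            # one OST service per server in the HP SFS system
MAX_DIRTY_MB = 32        # max_dirty_mb per OST service on each client node
NUM_CLIENTS = 10
LINK_MB_PER_SEC = 125    # 1 Gb/sec switch throughput is roughly 125 MB/sec

# Maximum dirty data a single client node can hold if it accesses every OST:
dirty_per_client_mb = NUM_OSTS * MAX_DIRTY_MB          # 30 * 32 = 960 MB

# Time for ten client nodes to flush everything through the shared link:
flush_seconds = NUM_CLIENTS * dirty_per_client_mb / LINK_MB_PER_SEC

print(dirty_per_client_mb)      # 960
print(round(flush_seconds, 1))  # 76.8 -- "under 90 seconds"
```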
If there were an imbalance in the network processing, it is possible that an individual RPC could be delayed
beyond the I/O transaction timeout limit. If this happens, the server evicts the client node for non-
responsiveness.
For write operations, one way to avoid such problems is to set the max_dirty_mb parameter to a lower
value. However, this solution has the disadvantage of making the Lustre pipeline less deep and can impact
overall throughput. It also has no impact on read operation traffic.
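The trade-off of lowering max_dirty_mb can be illustrated with the same arithmetic. This sketch reuses the example's figures; the 8MB value is an arbitrary illustration, not a recommendation:

```python
# Effect of lowering max_dirty_mb from 32MB to a hypothetical 8MB per OST.
NUM_OSTS = 30
NUM_CLIENTS = 10
LINK_MB_PER_SEC = 125  # 1 Gb/sec shared link, roughly 125 MB/sec

def worst_case_flush_seconds(max_dirty_mb):
    """Seconds for all client nodes to flush their maximum dirty data."""
    return NUM_CLIENTS * NUM_OSTS * max_dirty_mb / LINK_MB_PER_SEC

print(worst_case_flush_seconds(32))  # 76.8 -- a delayed RPC can push past
                                     # the I/O transaction timeout
print(worst_case_flush_seconds(8))   # 19.2 -- a much smaller burst, but the
                                     # shallower pipeline can cost throughput
```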
The best way to avoid transaction timeouts is to combine the following actions:
Segment traffic so that (under most situations) a single client node accesses a limited number of OST
services.
Set an appropriate value for the Lustre timeout attribute (see Section 6.3.4.1).
CAUTION: In the /proc/sys/lustre directory, there is a configuration variable called
ldlm_timeout. This variable is for Lustre internal use on servers only; the LDLM lock manager
uses it to detect and evict failed client nodes that have not already been evicted, when they have been
inactive for more than 2.25 times the period specified by the Lustre timeout file system attribute. Do
not change the value of the ldlm_timeout variable.
6.3.4.1 Changing the Lustre timeout attribute
The value of the Lustre timeout attribute on the file system can be changed using the modify
filesystem command in the HP SFS system. Refer to Chapter 5 of the HP StorageWorks Scalable File
Share System User Guide for more information.
NOTE: Before you change the Lustre timeout attribute, you must first unmount the file system on all
client nodes. When you have changed the attribute, the client nodes can remount the file system.
Note that the Lustre timeout attribute is also used by Lustre in a recovery scenario where an Object
Storage Server or MDS server is disconnected, or fails, or reboots. In this case, the timeout period used for
client nodes to reconnect to the server is 1.5 times the value of the Lustre timeout attribute. If you
increase the value of the Lustre timeout attribute, when a server boots it will wait longer for client nodes
to reconnect before giving up on them. This can impact overall file system startup time.
Keep the following formula in mind when changing the value of the Lustre timeout attribute:
Lustre timeout attribute value / 2 >= (number of clients * max_dirty_mb) / (bandwidth to each host)
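This rule of thumb can be checked mechanically. The following sketch folds in the per-OST nature of max_dirty_mb from the example above; the function and parameter names are illustrative, not part of Lustre:

```python
def timeout_is_adequate(lustre_timeout_s, num_clients, max_dirty_mb_per_ost,
                        num_osts, bandwidth_mb_per_s):
    """Check: Lustre timeout / 2 >= worst-case dirty data / bandwidth."""
    worst_case_flush_s = (num_clients * max_dirty_mb_per_ost * num_osts
                          / bandwidth_mb_per_s)
    return lustre_timeout_s / 2 >= worst_case_flush_s

# The example configuration: 100-second timeout, 10 client nodes, 32MB per
# OST, 30 OSTs, 1 Gb/sec (~125 MB/sec) shared bandwidth.
print(timeout_is_adequate(100, 10, 32, 30, 125))  # False: 50 < 76.8
print(timeout_is_adequate(200, 10, 32, 30, 125))  # True: 100 >= 76.8
```

A False result suggests either segmenting traffic across fewer OST services per client node, lowering max_dirty_mb, or raising the Lustre timeout attribute.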